Learning Caffe (3): Layer Catalogue

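The layer is the basic unit of modeling and computation.

Caffe's catalogue includes layers for a variety of state-of-the-art models.

To create a Caffe model, you define the model architecture in a protocol buffer definition file (prototxt). Caffe layers and their parameters are defined in caffe.proto.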

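Vision Layers:

Header: ./include/caffe/vision_layers.hpp

Vision layers usually take images as input and produce other images as output. A typical real-world image may have one color channel (c = 1), as in a grayscale image, or three (c = 3), as in an RGB image. In this context, however, the distinguishing feature of an image is its spatial structure: an image usually has height h > 1 and width w > 1. This 2D geometry naturally determines how the input is processed. In particular, most vision layers apply a particular operation to some region of the input to produce the corresponding region of the output. In contrast, other layers (with few exceptions) ignore the spatial structure of the input and treat it as one big vector of dimension chw.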

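Convolution layer:

Layer type: Convolution
CPU implementation: ./src/caffe/layers/conv_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/conv_layer.cu
Parameters (ConvolutionParameter convolution_param)
- Required
  - num_output (c_o): the number of filters
  - kernel_size (or kernel_h and kernel_w): the height and width of each filter
- Strongly recommended
  - weight_filler [default type: 'constant' value: 0]
- Optional
  - bias_term [default true]: whether to learn and apply a set of additive biases to the filter outputs
  - pad (or pad_h and pad_w) [default 0]: the number of pixels to (implicitly) add to each side of the input
  - stride (or stride_h and stride_w) [default 1]: the interval at which the filters are applied to the input
  - group (g) [default 1]: if g > 1, each filter is restricted to a subset of the input. Specifically, the input and output channels are split into g groups, and the i-th output group is connected only to the i-th input group.
Input: n * c_i * h_i * w_i
Output: n * c_o * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1 (integer division), and w_o likewise.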
Example:

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # learning rate and decay multipliers for the filters
  param { lr_mult: 1 decay_mult: 1 }
  # learning rate and decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96     # learn 96 filters
    kernel_size: 11    # each filter is 11x11
    stride: 4          # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01        # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}

The Convolution layer convolves the input image with a set of learnable filters, each producing one feature map in the output image.
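As a quick check of the output-size formula for this layer, here is a minimal Python sketch. The helper name `conv_output_size` and the 227x227 input size are assumptions made for illustration; they do not come from Caffe itself.

```python
def conv_output_size(in_size, kernel, pad=0, stride=1):
    """Spatial output size: (in + 2 * pad - kernel) / stride + 1, with integer division."""
    return (in_size + 2 * pad - kernel) // stride + 1


# conv1 from the example: 11x11 kernel, stride 4, no padding.
# The 227-pixel input size is an assumed value.
h_o = conv_output_size(227, kernel=11, pad=0, stride=4)
print(h_o)  # 55
```

With 96 filters, such an input would give an output blob of shape n * 96 * 55 * 55.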

Pooling layers:

Layer type: Pooling
CPU implementation: ./src/caffe/layers/pooling_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/pooling_layer.cu
Parameters (PoolingParameter pooling_param)
- Required
  - kernel_size (or kernel_h and kernel_w): the height and width of each pooling window
- Optional
  - pool [default MAX]: the pooling method; one of MAX, AVE, or STOCHASTIC
  - pad (or pad_h and pad_w) [default 0]: the number of pixels to (implicitly) add to each side of the input
  - stride (or stride_h and stride_w) [default 1]: the interval at which the pooling window is applied to the input
Input: n * c * h_i * w_i
Output: n * c * h_o * w_o, where h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1, and w_o likewise.

Example:

layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3 # pool over a 3x3 region
    stride: 2      # step two pixels (in the bottom blob) between pooling regions
  }
}
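To make the MAX pooling operation concrete, here is a small pure-Python sketch for a single channel without padding. The function `max_pool_2d` and the 5x5 toy image are hypothetical and only illustrate the idea; this is not how Caffe implements it.

```python
def max_pool_2d(image, kernel, stride):
    """Max-pool a 2D list of numbers with a square window and no padding."""
    h, w = len(image), len(image[0])
    h_o = (h - kernel) // stride + 1
    w_o = (w - kernel) // stride + 1
    out = []
    for i in range(h_o):
        row = []
        for j in range(w_o):
            # Collect the kernel x kernel window whose top-left corner is (i*stride, j*stride).
            window = [image[i * stride + di][j * stride + dj]
                      for di in range(kernel) for dj in range(kernel)]
            row.append(max(window))
        out.append(row)
    return out


# Toy 5x5 image holding the values 0..24 in row-major order.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
pooled = max_pool_2d(image, kernel=3, stride=2)
print(pooled)  # [[12, 14], [22, 24]]
```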

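Local Response Normalization (LRN):

Layer type: LRN
CPU implementation: ./src/caffe/layers/lrn_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/lrn_layer.cu
Parameters (LRNParameter lrn_param)

The local response normalization layer performs a kind of lateral inhibition by normalizing over local input regions. In ACROSS_CHANNELS mode, the local regions extend across nearby channels but have no spatial extent (their shape is local_size * 1 * 1). In WITHIN_CHANNEL mode, the local regions extend spatially but stay within a single channel (their shape is 1 * local_size * local_size). Each input value is divided by (1 + (alpha / n) * sum_i x_i^2)^beta, where n is the size of each local region and the sum is taken over the region centered at that value.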
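The ACROSS_CHANNELS normalization can be sketched in a few lines of Python for the channel values at one spatial position. This assumes the divisor (1 + (alpha / n) * sum_i x_i^2)^beta with n = local_size; the function name, the truncated windows at the first and last channels, and the parameter values are illustrative choices.

```python
def lrn_across_channels(x, local_size, alpha, beta):
    """Normalize a list of channel values taken at one spatial position."""
    half = local_size // 2
    out = []
    for c in range(len(x)):
        # Window of neighbouring channels centered at c, truncated at the borders.
        lo, hi = max(0, c - half), min(len(x), c + half + 1)
        sq_sum = sum(v * v for v in x[lo:hi])
        out.append(x[c] / (1.0 + (alpha / local_size) * sq_sum) ** beta)
    return out


# Three channels, window of 3 channels; alpha and beta are chosen for easy arithmetic.
normalized = lrn_across_channels([1.0, 2.0, 3.0], local_size=3, alpha=3.0, beta=1.0)
```

Here the middle channel is divided by 1 + 1 * (1 + 4 + 9) = 15, so a strong response in a neighbouring channel suppresses it.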

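im2col: a helper that transforms image patches into column vectors; you do not need to know its details. Caffe's original convolution uses it to lay out all patches in one matrix so that the convolution becomes a matrix multiplication.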

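Loss Layers:

The loss drives learning: it compares the output to the target and assigns a cost to be minimized. The loss itself is computed in the forward pass, and the gradient with respect to the loss is computed in the backward pass.

Softmax: type: SoftmaxWithLoss

The softmax loss layer computes the multinomial logistic loss of the softmax of its inputs. It is conceptually identical to a softmax layer followed by a multinomial logistic loss layer, but it provides a more numerically stable gradient.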
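The following Python sketch shows, for a single sample, what "softmax followed by multinomial logistic loss" computes. Subtracting the maximum score is a common stabilization trick and is an assumption here, not a description of Caffe's internals; the function name is hypothetical.

```python
import math


def softmax_with_loss(scores, label):
    """Return (probabilities, loss) for one sample; loss = -log(softmax(scores)[label])."""
    m = max(scores)                       # shift scores so that exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Compute the log-probability directly instead of taking log(probs[label]).
    log_prob = (scores[label] - m) - math.log(total)
    return probs, -log_prob


probs, loss = softmax_with_loss([1000.0, 1000.0], label=0)
```

With two equal scores the probabilities are 0.5 each and the loss is log(2), even though exp(1000) would overflow without the shift.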

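Sum-of-Squares / Euclidean: type: EuclideanLoss

The Euclidean loss layer computes the sum of squares of the differences between its two inputs: 1/(2N) * sum_{i=1..N} ||x1_i - x2_i||_2^2.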
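A minimal Python sketch of this loss, assuming it is the sum of squared differences scaled by 1/(2N), where N is the number of samples in the batch. The function name and the toy inputs are illustrative.

```python
def euclidean_loss(pred, target):
    """1/(2N) * sum over samples of the squared L2 distance between pred and target."""
    n = len(pred)
    total = 0.0
    for p_row, t_row in zip(pred, target):
        total += sum((p - t) ** 2 for p, t in zip(p_row, t_row))
    return total / (2.0 * n)


# Two samples with two values each: (1 + 4 + 9 + 16) / (2 * 2) = 7.5
loss = euclidean_loss([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]])
print(loss)  # 7.5
```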

Hinge / Margin:

Layer type: HingeLoss
CPU implementation: ./src/caffe/layers/hinge_loss_layer.cpp
CUDA GPU implementation: none yet
Parameters (HingeLossParameter hinge_loss_param)
- Optional
  - norm [default L1]: the norm used; currently L1 or L2
Inputs:
- n * c * h * w Predictions
- n * 1 * 1 * 1 Labels
Output: 1 * 1 * 1 * 1 Computed loss

Examples:

# L1 Norm
layer {
  name: "loss"
  type: "HingeLoss"
  bottom: "pred"
  bottom: "label"
}

# L2 Norm
layer {
  name: "loss"
  type: "HingeLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
  hinge_loss_param {
    norm: L2
  }
}
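To show the difference between the two norms, here is a Python sketch of a one-vs-all hinge loss. It assumes that each class score contributes max(0, 1 - t * s), with t = +1 for the labeled class and t = -1 otherwise, and that the sum is divided by the batch size. Treat this as an illustrative formulation; the function name is hypothetical.

```python
def hinge_loss(scores, labels, norm="L1"):
    """One-vs-all hinge loss averaged over the batch; norm is "L1" or "L2"."""
    n = len(scores)
    total = 0.0
    for row, label in zip(scores, labels):
        for k, s in enumerate(row):
            sign = 1.0 if k == label else -1.0
            margin = max(0.0, 1.0 - sign * s)
            # L1 sums the margins, L2 sums their squares.
            total += margin if norm == "L1" else margin * margin
    return total / n


l1 = hinge_loss([[2.0, -0.5]], [0], norm="L1")  # 0.5
l2 = hinge_loss([[2.0, -0.5]], [0], norm="L2")  # 0.25
```

The L2 variant penalizes large margin violations more strongly and small ones less.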

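Sigmoid Cross-Entropy: type: SigmoidCrossEntropyLoss. Cross-entropy loss, used for multi-label classification.

Infogain: type: InfogainLoss. Information gain loss.

Accuracy and Top-K: Accuracy scores the output against the target. It is not actually a loss and has no backward step.

Activation / Neuron Layers:

In general, these layers are element-wise operators that take one bottom blob and produce one top blob of the same size. In the layer descriptions below, the input and output sizes are omitted because they are identical: n * c * h * w.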

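ReLU / Rectified-Linear and Leaky-ReLU:

Layer type: ReLU
CPU implementation: ./src/caffe/layers/relu_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/relu_layer.cu
Parameters (ReLUParameter relu_param)
- Optional
  - negative_slope [default 0]: specifies whether to "leak" the negative part by multiplying it by this slope, rather than setting it to 0.
Example:
```
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
```
Given an input value x, the ReLU layer outputs x if x > 0 and negative_slope * x otherwise. When negative_slope is not set, this is equivalent to the standard ReLU function max(x, 0). The layer also supports in-place computation, meaning that the bottom and top blobs can be the same, which reduces memory consumption.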
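The element-wise rule is simple enough to sketch directly in Python. The function name `relu` and the sample values are illustrative.

```python
def relu(x, negative_slope=0.0):
    """Return x for positive inputs and negative_slope * x otherwise."""
    return x if x > 0 else negative_slope * x


inputs = [-2.0, 0.0, 3.0]
standard = [relu(v) for v in inputs]                    # negative inputs become zero
leaky = [relu(v, negative_slope=0.1) for v in inputs]   # negative inputs are scaled by 0.1
```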

Sigmoid:

Layer type: Sigmoid
CPU implementation: ./src/caffe/layers/sigmoid_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/sigmoid_layer.cu

Example:

layer {
  name: "encode1neuron"
  bottom: "encode1"
  top: "encode1neuron"
  type: "Sigmoid"
}

TanH / Hyperbolic Tangent

Layer type: TanH
CPU implementation: ./src/caffe/layers/tanh_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/tanh_layer.cu

Example:

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "TanH"
}

Absolute Value

Layer type: AbsVal
CPU implementation: ./src/caffe/layers/absval_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/absval_layer.cu

Example:

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "AbsVal"
}

Power

Layer type: Power
CPU implementation: ./src/caffe/layers/power_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/power_layer.cu
Parameters (PowerParameter power_param)
- Optional
  - power [default 1]
  - scale [default 1]
  - shift [default 0]

Example:

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "Power"
  power_param {
    power: 1
    scale: 1
    shift: 0
  }
}

For each input x, the Power layer outputs (shift + scale * x)^power.
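As a worked instance of this formula, a short Python sketch follows. The function name `power_layer` is hypothetical; its defaults mirror the parameter defaults listed above.

```python
def power_layer(x, power=1.0, scale=1.0, shift=0.0):
    """Compute (shift + scale * x) ** power for a single value."""
    return (shift + scale * x) ** power


identity = power_layer(3.0)                                 # defaults leave x unchanged: 3.0
squared = power_layer(3.0, power=2.0, scale=2.0, shift=1.0)  # (1 + 2 * 3) ** 2 = 49.0
```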

BNLL (Binomial Normal Log Likelihood)

Layer type: BNLL
CPU implementation: ./src/caffe/layers/bnll_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/bnll_layer.cu

Example:

layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "BNLL"
}

For each input x, the BNLL layer outputs log(1 + exp(x)).
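Evaluating log(1 + exp(x)) naively overflows for large x. The Python sketch below uses a common rearrangement to avoid that; the rearrangement and the function name are assumptions for illustration, not a statement about Caffe's code.

```python
import math


def bnll(x):
    """Compute log(1 + exp(x)) without overflowing for large positive x."""
    if x > 0:
        # log(1 + exp(x)) = x + log(1 + exp(-x)), and exp(-x) <= 1 here.
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


at_zero = bnll(0.0)      # log(2)
large = bnll(1000.0)     # approximately x itself for large x
```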

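Data Layers:

Data enters Caffe through data layers, which sit at the bottom of the net. Data can come from efficient databases (LevelDB or LMDB), directly from memory, or from files on disk in HDF5 or common image formats.

Common input preprocessing (mean subtraction, scaling, random cropping, and mirroring) can be specified through TransformationParameters.

Database: data from LevelDB or LMDB

Layer type: Data
Parameters:
- Required
  - source: the name of the directory containing the database
  - batch_size: the number of inputs to process at one time
- Optional
  - rand_skip: skip up to this number of inputs at the beginning; useful for asynchronous SGD
  - backend [default LEVELDB]: whether to use LEVELDB or LMDB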

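In-Memory: data from memory

Layer type: MemoryData
Parameters:
- Required
  - batch_size, channels, height, width: the size of the input chunks to read from memory

The memory data layer reads data directly from memory, without copying it. To use it, call MemoryDataLayer::Reset (from C++) or Net.set_input_arrays (from Python) to specify a source of contiguous data (for example, a 4D row-major array), which is read one batch-sized chunk at a time.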

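HDF5 Input:

Layer type: HDF5Data
Parameters:
- Required
  - source: the name of the file to read from
  - batch_size

HDF5 Output:

Layer type: HDF5Output
Parameters:
- Required
  - filename: the name of the file to write to

Images:

Layer type: ImageData
Parameters:
- Required
  - source: the name of a text file in which each line gives an image filename and a label
  - batch_size: the number of images to batch together
- Optional
  - rand_skip
  - shuffle [default false]: whether to shuffle the order
  - new_height, new_width: if provided, resize all images to this size

Windows: type: WindowData

Dummy

DummyData is for development and debugging. See DummyDataParameter.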

Common Layers:

Inner Product

Layer type: InnerProduct
CPU implementation: ./src/caffe/layers/inner_product_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/inner_product_layer.cu
Parameters (InnerProductParameter inner_product_param)
- Required
  - num_output (c_o): the number of filters
- Strongly recommended
  - weight_filler [default type: 'constant' value: 0]
- Optional
  - bias_filler [default type: 'constant' value: 0]
  - bias_term [default true]: whether to learn and apply a set of additive biases to the filter outputs
Input: n * c_i * h_i * w_i
Output: n * c_o * 1 * 1

Example:

layer {
  name: "fc8"
  type: "InnerProduct"
  # learning rate and decay multipliers for the weights
  param { lr_mult: 1 decay_mult: 1 }
  # learning rate and decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "fc7"
  top: "fc8"
}

The inner product layer (usually called the fully connected layer) treats the input as a simple vector and produces an output in the form of a single vector (with the blob's height and width set to 1).
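For a single flattened input vector, the computation amounts to a matrix-vector product plus an optional bias. A pure-Python sketch, with a hypothetical function name and toy weights:

```python
def inner_product(x, weights, bias=None):
    """Compute weights * x (+ bias); weights holds one row per output unit."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


# Three inputs (c_i * h_i * w_i = 3) mapped to two outputs (num_output = 2).
y = inner_product([1.0, 2.0, 3.0],
                  weights=[[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
                  bias=[0.5, -1.0])
print(y)  # [1.5, 5.0]
```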

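Splitting:

The split layer is a utility layer that splits an input blob into multiple output blobs. It is used when a blob is fed into multiple output layers.

Flattening:

The flatten layer is also a utility layer. It flattens an input blob of shape n * c * h * w into simple vectors, giving an output of shape n * (c * h * w). Each sample is flattened separately, so the result is n vectors of dimension c * h * w.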

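Reshape:

Layer type: Reshape
CPU implementation: ./src/caffe/layers/reshape_layer.cpp
Parameters (ReshapeParameter reshape_param)
- Optional
  - shape
Input: a single blob with arbitrary dimensions
Output: the same blob, with its dimensions modified as specified by reshape_param
Example:
```
layer {
  name: "reshape"
  type: "Reshape"
  bottom: "input"
  top: "output"
  reshape_param {
    shape {
      dim: 0  # copy the dimension from below
      dim: 2
      dim: 3
      dim: -1 # infer it from the other dimensions
    }
  }
}
```
The reshape layer changes the dimensions of its input without changing its data. Just like the flatten layer, only the dimensions change; no data is copied in the process.

The output dimensions are specified by reshape_param. Positive numbers are used directly, setting the corresponding dimension of the output blob. In addition, two special values are accepted for any of the target dimension values:
- 0: copy the corresponding dimension from the bottom layer. If dim: 0 is given as the first dimension and the bottom has 2 as its first dimension, the top will also have 2 as its first dimension, so the original dimension is unchanged.
- -1: infer this dimension from the other dimensions. This behavior is similar to that of -1 in numpy's reshape or [ ] in MATLAB's reshape. The dimension is computed so that the total output element count stays the same as that of the bottom layer. At most one -1 can be used in a reshape operation.

As another example, specifying reshape_param { shape { dim: 0 dim: -1 } } makes the layer behave exactly like the Flatten layer: it flattens each input into a vector.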
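The rules for 0 and -1 can be sketched as a small shape-inference function in Python. The name `infer_reshape` and the sample shapes are hypothetical; the function only computes the resulting shape and touches no data.

```python
def infer_reshape(bottom_shape, dims):
    """Resolve 0 (copy from bottom) and -1 (infer) entries in a target shape."""
    count = 1
    for d in bottom_shape:
        count *= d
    # 0 copies the dimension at the same index from the bottom shape.
    top = [bottom_shape[i] if d == 0 else d for i, d in enumerate(dims)]
    if top.count(-1) > 1:
        raise ValueError("at most one -1 is allowed")
    if -1 in top:
        known = 1
        for d in top:
            if d != -1:
                known *= d
        if count % known != 0:
            raise ValueError("cannot infer a whole-number dimension")
        top[top.index(-1)] = count // known
    return top


flat = infer_reshape([2, 3, 4, 5], [0, -1])             # same effect as Flatten
mixed = infer_reshape([64, 3, 28, 28], [0, 2, 3, -1])
print(flat, mixed)  # [2, 60] [64, 2, 3, 392]
```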

Concatenation:

The concat layer is a utility layer that concatenates multiple input blobs into a single output blob.

Layer type: Concat
CPU implementation: ./src/caffe/layers/concat_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/concat_layer.cu
Parameters (ConcatParameter concat_param)
- Optional
  - axis [default 1]: 0 concatenates along num, 1 along channels.
Input: n_i * c_i * h * w for each of the K input blobs
Output:
- if axis = 0: (n_1 + n_2 + ... + n_K) * c_1 * h * w, and all input c_i should be the same;
- if axis = 1: n_1 * (c_1 + c_2 + ... + c_K) * h * w, and all input n_i should be the same.

Example:

layer {
  name: "concat"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  type: "Concat"
  concat_param {
    axis: 1
  }
}
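The output-shape rule for concatenation can be checked with a short Python sketch. The helper `concat_shape` and the blob shapes are illustrative assumptions.

```python
def concat_shape(shapes, axis=1):
    """Shape of the blob obtained by concatenating blobs of the given shapes along axis."""
    out = list(shapes[0])
    for s in shapes[1:]:
        # Every dimension except the concatenation axis must match.
        same_rank = len(s) == len(out)
        if not same_rank or any(a != b for i, (a, b) in enumerate(zip(out, s)) if i != axis):
            raise ValueError("shapes differ outside the concat axis")
        out[axis] += s[axis]
    return out


by_channel = concat_shape([[2, 3, 4, 4], [2, 5, 4, 4]], axis=1)  # [2, 8, 4, 4]
by_num = concat_shape([[2, 3, 4, 4], [6, 3, 4, 4]], axis=0)      # [8, 3, 4, 4]
```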

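Slicing:

The slice layer is also a utility layer. It slices an input layer into multiple output layers along a given dimension (currently only num or channel).

Example:

layer {
  name: "slicer_label"
  type: "Slice"
  bottom: "label"
  ## Example of label with a shape N x 3 x 1 x 1
  top: "label1"
  top: "label2"
  top: "label3"
  slice_param {
    axis: 1
    slice_point: 1
    slice_point: 2
  }
}

axis is the target axis along which to slice. slice_point gives the indices in the selected dimension at which to cut; the number of indices must equal the number of top blobs minus one.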
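To see why K top blobs need K - 1 slice points, here is a small Python sketch that computes the size of each slice along the chosen axis. The helper name `slice_sizes` is hypothetical.

```python
def slice_sizes(axis_len, slice_points):
    """Sizes of the pieces obtained by cutting an axis of length axis_len at slice_points."""
    bounds = [0] + list(slice_points) + [axis_len]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


# An axis of length 3 cut at indices 1 and 2 yields three pieces of size 1.
print(slice_sizes(3, [1, 2]))   # [1, 1, 1]
print(slice_sizes(10, [2, 7]))  # [2, 5, 3]
```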

Elementwise Operations: type: Eltwise

Argmax: type: ArgMax

Softmax: type: Softmax

Mean-Variance Normalization: type: MVN