SGD and activation functions in Caffe

The form of the activation function used in Caffe directly affects training speed and how the SGD solver behaves.

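In Caffe, each activation function has its own gradient, so the SGD updates differ from one activation to another. This is why the type of the activation layer is specified in the configuration file. The activation function used most often in Caffe at the moment is ReLU.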

The activation functions currently implemented in Caffe are:

absval, bnll, power, relu, sigmoid, and tanh. Each one has its own layer.

I won't work through the formulas myself here; the relevant parts of the Caffe tutorial follow.

ReLU / Rectified-Linear and Leaky-ReLU

LayerType: RELU
CPU implementation: ./src/caffe/layers/relu_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/relu_layer.cu
Parameters (ReLUParameter relu_param)
- Optional
  - negative_slope [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.

Sample (as seen in ./examples/imagenet/imagenet_train_val.prototxt)

layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}

Given an input value x, the RELU layer computes the output as x if x > 0 and negative_slope * x if x <= 0. When the negative_slope parameter is not set, this is equivalent to the standard ReLU function max(x, 0). The layer also supports in-place computation, meaning that the bottom and the top blob can be the same blob, which reduces memory consumption.
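As a rough illustration of the rule above, here is a minimal pure-Python sketch of the forward and backward passes. The names relu_forward and relu_backward are invented for this example and are not Caffe's API; top_diff stands for the gradient arriving from the layer above.

```python
def relu_forward(x, negative_slope=0.0):
    """Leaky ReLU over a list of floats: x if x > 0, else negative_slope * x."""
    # max(v, 0) keeps the positive part, min(v, 0) keeps the (scaled) negative part.
    return [max(v, 0.0) + negative_slope * min(v, 0.0) for v in x]


def relu_backward(x, top_diff, negative_slope=0.0):
    """Gradient w.r.t. the input: 1 where x > 0, negative_slope where x <= 0."""
    return [d * (1.0 if v > 0 else negative_slope) for v, d in zip(x, top_diff)]


x = [-2.0, 0.0, 3.0]
print(relu_forward(x))                       # [0.0, 0.0, 3.0]
print(relu_backward(x, [1.0, 1.0, 1.0]))     # [0.0, 0.0, 1.0]
```

With negative_slope left at 0, every non-positive input passes no gradient back, which is the behaviour the negative_slope parameter is meant to soften.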

Sigmoid

LayerType: SIGMOID
CPU implementation: ./src/caffe/layers/sigmoid_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/sigmoid_layer.cu

Sample (as seen in ./examples/mnist/mnist_autoencoder.prototxt)

layers {
  name: "encode1neuron"
  bottom: "encode1"
  top: "encode1neuron"
  type: SIGMOID
}

The SIGMOID layer computes the output as sigmoid(x) for each input element x.
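For reference, a small pure-Python sketch of sigmoid(x) = 1 / (1 + exp(-x)) and its gradient. The function names are illustrative only, and splitting on the sign of the input is a choice made here to avoid overflow, not a claim about how Caffe implements the layer.

```python
import math


def sigmoid(v):
    """Logistic function 1 / (1 + exp(-v)), written to avoid overflow in exp."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)  # v < 0, so exp(v) cannot overflow
    return e / (1.0 + e)


def sigmoid_backward(y, top_diff):
    """Gradient w.r.t. the input, expressed through the output y = sigmoid(x)."""
    return top_diff * y * (1.0 - y)


print(sigmoid(0.0))                 # 0.5
print(sigmoid_backward(0.5, 1.0))   # 0.25
```

The gradient never exceeds 0.25 and shrinks towards 0 for large |x|, which is one reason sigmoid networks tend to train more slowly under SGD than ReLU networks.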

TanH / Hyperbolic Tangent

LayerType: TANH
CPU implementation: ./src/caffe/layers/tanh_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/tanh_layer.cu

Sample

layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: TANH
}

The TANH layer computes the output as tanh(x) for each input element x.
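The same element-wise pattern can be sketched in pure Python. The names tanh_forward and tanh_backward are hypothetical, and the gradient uses the identity d/dx tanh(x) = 1 - tanh(x)^2.

```python
import math


def tanh_forward(x):
    """Apply tanh to each element of a list of floats."""
    return [math.tanh(v) for v in x]


def tanh_backward(y, top_diff):
    """Gradient w.r.t. the input, using the output y = tanh(x): 1 - y^2."""
    return [d * (1.0 - t * t) for t, d in zip(y, top_diff)]


y = tanh_forward([0.0, 1.0])
print(y)
print(tanh_backward(y, [1.0, 1.0]))
```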

Absolute Value

LayerType: ABSVAL
CPU implementation: ./src/caffe/layers/absval_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/absval_layer.cu

Sample

layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: ABSVAL
}

The ABSVAL layer computes the output as abs(x) for each input element x.
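A minimal sketch of the absolute-value forward pass and its gradient, sign(x). The helper names are made up for illustration, and taking the gradient at x = 0 to be 0 is an assumption of this sketch.

```python
def absval_forward(x):
    """Apply abs to each element of a list of floats."""
    return [abs(v) for v in x]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def absval_backward(x, top_diff):
    """Gradient w.r.t. the input: top_diff scaled by sign(x)."""
    return [d * sign(v) for v, d in zip(x, top_diff)]


x = [-2.0, 0.0, 3.0]
print(absval_forward(x))                    # [2.0, 0.0, 3.0]
print(absval_backward(x, [1.0, 1.0, 1.0]))  # [-1.0, 0.0, 1.0]
```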

Power

LayerType: POWER
CPU implementation: ./src/caffe/layers/power_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/power_layer.cu
Parameters (PowerParameter power_param)
- Optional
  - power [default 1]
  - scale [default 1]
  - shift [default 0]

Sample

layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: POWER
  power_param {
    power: 1
    scale: 1
    shift: 0
  }
}

The POWER layer computes the output as (shift + scale * x) ^ power for each input element x.
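To make the role of the three parameters concrete, here is a small pure-Python sketch of the formula and its derivative. The function names are hypothetical; the keyword arguments reuse the parameter names and defaults listed above.

```python
def power_forward(x, power=1.0, scale=1.0, shift=0.0):
    """Compute (shift + scale * x) ** power for each element of x."""
    return [(shift + scale * v) ** power for v in x]


def power_backward(x, top_diff, power=1.0, scale=1.0, shift=0.0):
    """Gradient w.r.t. the input: power * scale * (shift + scale * x) ** (power - 1)."""
    return [d * power * scale * (shift + scale * v) ** (power - 1.0)
            for v, d in zip(x, top_diff)]


x = [1.0, 2.0]
print(power_forward(x, power=2.0, scale=3.0, shift=1.0))               # [16.0, 49.0]
print(power_backward(x, [1.0, 1.0], power=2.0, scale=3.0, shift=1.0))  # [24.0, 42.0]
```

With the defaults (power 1, scale 1, shift 0) the layer is the identity. Note that in plain Python a negative base with a fractional power yields a complex number, so this sketch is only meant for inputs where shift + scale * x stays non-negative or power is an integer.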

BNLL

LayerType: BNLL
CPU implementation: ./src/caffe/layers/bnll_layer.cpp
CUDA GPU implementation: ./src/caffe/layers/bnll_layer.cu

Sample

layers {
  name: "layer"
  bottom: "in"
  top: "out"
  type: BNLL
}

The BNLL (binomial normal log likelihood) layer computes the output as log(1 + exp(x)) for each input element x.
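Evaluating log(1 + exp(x)) directly overflows for large x, so a sketch of the formula needs a little care. The version below is an illustrative pure-Python rewrite (the names bnll and bnll_backward are not Caffe's API) that uses the identity log(1 + exp(x)) = x + log(1 + exp(-x)) for positive inputs.

```python
import math


def bnll(v):
    """Compute log(1 + exp(v)) without overflowing for large positive v."""
    if v > 0:
        # exp(-v) is at most 1 here, so it cannot overflow.
        return v + math.log1p(math.exp(-v))
    return math.log1p(math.exp(v))


def bnll_backward(v, top_diff):
    """Gradient w.r.t. the input: d/dv log(1 + exp(v)) = 1 / (1 + exp(-v))."""
    if v >= 0:
        return top_diff / (1.0 + math.exp(-v))
    e = math.exp(v)
    return top_diff * e / (1.0 + e)


print(bnll(0.0))       # log(2)
print(bnll(1000.0))    # 1000.0
```

The output behaves like a smooth version of ReLU: close to 0 for very negative inputs and close to x for very positive ones.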