While profiling the per-layer times of a neural network, I ran into a very strange problem: switching between Caffe's own GPU implementation and the cuDNN implementation made a huge difference for the convolution layers, but essentially none for the pooling layers. After finding time to dig through the code, I traced the cause to the layer_factory pattern. This post walks through the following:

1. The factory pattern

2. layer_factory in detail

3. The pitfall in layer_factory

4. Impact analysis

1. The factory pattern

The factory pattern is one of the classic design patterns. It targets situations where, at coding time, you cannot foresee which concrete class will need to be instantiated, and where the rest of the system should not depend on how its products are created, composed, or represented. Its drawback is the extra machinery it introduces, so it is a poor fit for projects with little need for extension.

The factory pattern involves three roles (a minimal C++ sketch follows the list):

Factory: produces concrete product instances according to the given logic.

Abstract product: the common parent class of the concrete products, typically realized as an interface in Java or an abstract class in C++.

Concrete product: the product instances that are actually created.
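
To make the three roles concrete, here is a minimal, self-contained C++ sketch (a generic illustration, not Caffe code; Shape, Circle, Square, and MakeShape are made-up names):

#include <iostream>
#include <memory>
#include <string>

// Abstract product: the common parent of all concrete products.
struct Shape {
  virtual ~Shape() {}
  virtual void Draw() const = 0;
};

// Concrete products: the instances the factory hands out.
struct Circle : Shape {
  void Draw() const override { std::cout << "circle\n"; }
};
struct Square : Shape {
  void Draw() const override { std::cout << "square\n"; }
};

// Factory: turns a type string into a product, so callers never name a
// concrete class; this is the same job Caffe's LayerRegistry does for layers.
std::unique_ptr<Shape> MakeShape(const std::string& type) {
  if (type == "circle") return std::unique_ptr<Shape>(new Circle());
  if (type == "square") return std::unique_ptr<Shape>(new Square());
  return nullptr;  // unknown type
}

int main() {
  MakeShape("circle")->Draw();  // prints "circle"
  return 0;
}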

2. layer_factory in detail

As is well known, Caffe 1.0 currently ships three families of operators: CPU versions, Caffe's own CUDA versions, and cuDNN versions. The layer_factory files are responsible for assembling Caffe's operators; in factory-pattern terms, the appropriate version of each operator is selected at runtime according to the user's settings.
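
Concretely, the user's setting is the engine field in the network prototxt. A minimal sketch (the layer and blob names here are made up):

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 64
    kernel_size: 3
    engine: CUDNN  # or CAFFE; DEFAULT lets the factory decide
  }
}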

The walkthrough below draws on http://zhuanlan.zhihu.com/hacker-and-painter/20456649.

layer_factory.hpp is the header of layer_factory:

/**
 * @brief A layer factory that allows one to register layers.
 * During runtime, registered layers could be called by passing a LayerParameter
 * protobuffer to the CreateLayer function:
 *
 *     LayerRegistry<Dtype>::CreateLayer(param);
 *
 * There are two ways to register a layer. Assuming that we have a layer like:
 *
 *   template <typename Dtype>
 *   class MyAwesomeLayer : public Layer<Dtype> {
 *     // your implementations
 *   };
 *
 * and its type is its C++ class name, but without the "Layer" at the end
 * ("MyAwesomeLayer" -> "MyAwesome").
 *
 * If the layer is going to be created simply by its constructor, in your c++
 * file, add the following line:
 *
 *    REGISTER_LAYER_CLASS(MyAwesome);
 *
 * Or, if the layer is going to be created by another creator function, in the
 * format of:
 *
 *    template <typename Dtype>
 *    Layer<Dtype*> GetMyAwesomeLayer(const LayerParameter& param) {
 *      // your implementation
 *    }
 *
 * (for example, when your layer has multiple backends, see GetConvolutionLayer
 * for a use case), then you can register the creator function instead, like
 *
 * REGISTER_LAYER_CREATOR(MyAwesome, GetMyAwesomeLayer)
 *
 * Note that each layer type should only be registered once.
 */

#ifndef CAFFE_LAYER_FACTORY_H_
#define CAFFE_LAYER_FACTORY_H_

#include <map>
#include <string>

#include "caffe/common.hpp"
#include "caffe/proto/caffe.pb.h"

namespace caffe {

template <typename Dtype>
class Layer;

// LayerRegistry is simple: it puts each class and its type string into a map
// so that layers can be looked up and created by name. In essence it
// implements class registration.
template <typename Dtype>
class LayerRegistry {
 public:
  // Creator is a function pointer that returns a shared_ptr to Layer<Dtype>.
  typedef shared_ptr<Layer<Dtype> > (*Creator)(const LayerParameter&);
  // CreatorRegistry maps a type string to its Creator.
  typedef std::map<string, Creator> CreatorRegistry;

  static CreatorRegistry& Registry() {
    static CreatorRegistry* g_registry_ = new CreatorRegistry();
    return *g_registry_;
  }

  // Adds a creator.
  // Inserts the (type string, function pointer) pair into the table.
  static void AddCreator(const string& type, Creator creator) {
    CreatorRegistry& registry = Registry();
    CHECK_EQ(registry.count(type), 0)
        << "Layer type " << type << " already registered.";
    registry[type] = creator;
  }

  // Get a layer using a LayerParameter.
  // Given the layer type, create the layer.
  static shared_ptr<Layer<Dtype> > CreateLayer(const LayerParameter& param) {
    LOG(INFO) << "Creating layer " << param.name();
    // Get the type string from the parameter.
    const string& type = param.type();
    // Check that a Creator is registered for the given type.
    CreatorRegistry& registry = Registry();
    CHECK_EQ(registry.count(type), 1) << "Unknown layer type: " << type
        << " (known types: " << LayerTypeList() << ")";
    // Invoke the Creator function of the corresponding layer.
    return registry[type](param);
  }

 private:
  // Layer registry should never be instantiated - everything is done with its
  // static variables.
  // Instantiation is forbidden: the class only has static functions, so the
  // constructor is private.
  LayerRegistry() {}

  // Returns the list of registered layer types.
  static string LayerTypeList() {
    // Get the registry.
    CreatorRegistry& registry = Registry();
    string layer_types;
    // Walk the registry and append each type to the layer_types string.
    for (typename CreatorRegistry::iterator iter = registry.begin();
         iter != registry.end(); ++iter) {
      if (iter != registry.begin()) {
        layer_types += ", ";
      }
      layer_types += iter->first;
    }
    return layer_types;
  }
};

// LayerRegisterer
// The registerer for user-defined layers,
// used by the macros below.
template <typename Dtype>
class LayerRegisterer {
 public:
  // Constructor of the layer registerer.
  LayerRegisterer(const string& type,
      shared_ptr<Layer<Dtype> > (*creator)(const LayerParameter&)) {
    // LOG(INFO) << "Registering layer type: " << type;
    // Simply forwards to AddCreator on the layer registry.
    LayerRegistry<Dtype>::AddCreator(type, creator);
  }
};

// For convenience the author also provides macros for registering your own
// layer class.
// Generates the two static registerer objects g_creator_f_type and
// g_creator_d_type (float and double versions).
#define REGISTER_LAYER_CREATOR(type, creator)                                  \
  static LayerRegisterer<float> g_creator_f_##type(#type, creator<float>);     \
  static LayerRegisterer<double> g_creator_d_##type(#type, creator<double>)    \

/* Registers your own class, whose type name is `type`.
   Suppose type = bias; then the macro generates the following.
   A creator function that directly calls your class's constructor and
   returns an instance:
     Creator_biasLayer(const LayerParameter& param)
   A static variable g_creator_f_bias of type LayerRegisterer<float>
   (float version; in effect it binds your class's type string to its
   creator in the registry):
     static LayerRegisterer<float> g_creator_f_bias("bias", Creator_biasLayer)
   And a static variable g_creator_d_bias of type LayerRegisterer<double>
   (double version, binding the type string to the creator likewise):
     static LayerRegisterer<double> g_creator_d_bias("bias", Creator_biasLayer)
*/
#define REGISTER_LAYER_CLASS(type)                                             \
  template <typename Dtype>                                                    \
  shared_ptr<Layer<Dtype> > Creator_##type##Layer(const LayerParameter& param) \
  {                                                                            \
    return shared_ptr<Layer<Dtype> >(new type##Layer<Dtype>(param));           \
  }                                                                            \
  REGISTER_LAYER_CREATOR(type, Creator_##type##Layer)

}  // namespace caffe

#endif  // CAFFE_LAYER_FACTORY_H_
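
To make the two registration paths concrete, here is a minimal sketch of how a custom layer would be registered (MyAwesomeLayer is the placeholder name from the header comment above, not a layer that exists in Caffe; the macros must be invoked inside namespace caffe):

// In my_awesome_layer.cpp.
// Option 1: the layer is created by its plain constructor. This expands into
// Creator_MyAwesomeLayer plus the float/double LayerRegisterer statics, so a
// prototxt line `type: "MyAwesome"` reaches MyAwesomeLayer via CreateLayer.
REGISTER_LAYER_CLASS(MyAwesome);

// Option 2 (instead of, never in addition to, option 1, since each type may
// be registered only once): a hand-written creator, useful when one type
// name must fan out to several backends, as GetConvolutionLayer does below.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetMyAwesomeLayer(const LayerParameter& param) {
  return shared_ptr<Layer<Dtype> >(new MyAwesomeLayer<Dtype>(param));
}
REGISTER_LAYER_CREATOR(MyAwesome, GetMyAwesomeLayer);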

With the header covered, here is the implementation part (it differs somewhat from the 1.0 release, but not in any way that matters for this discussion).

layer_factory.cpp:

// Make sure we include Python.h before any system header
// to avoid _POSIX_C_SOURCE redefinition
#ifdef WITH_PYTHON_LAYER
#include <boost/python.hpp>
#endif
#include <string>

#include "caffe/layer.hpp"
#include "caffe/layer_factory.hpp"
#include "caffe/proto/caffe.pb.h"
#include "caffe/vision_layers.hpp"

#ifdef WITH_PYTHON_LAYER
#include "caffe/python_layer.hpp"
#endif

namespace caffe {

// A function that returns a convolution layer instance.
// Get convolution layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetConvolutionLayer(
    const LayerParameter& param) {
  // Read from the parameter which engine to use: CUDNN, CAFFE, or DEFAULT.
  // As caffe.proto shows, engine is an enum type.
  ConvolutionParameter_Engine engine = param.convolution_param().engine();
  if (engine == ConvolutionParameter_Engine_DEFAULT) {
    engine = ConvolutionParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = ConvolutionParameter_Engine_CUDNN;
#endif
  }
  if (engine == ConvolutionParameter_Engine_CAFFE) {
    // Directly construct Caffe's own convolution layer.
    return shared_ptr<Layer<Dtype> >(new ConvolutionLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == ConvolutionParameter_Engine_CUDNN) {
    // Construct the cuDNN convolution layer.
    return shared_ptr<Layer<Dtype> >(new CuDNNConvolutionLayer<Dtype>(param));
#endif
  } else {
    // Anything else is an error.
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

// Register the convolution layer: type name "Convolution", instances created
// by GetConvolutionLayer.
REGISTER_LAYER_CREATOR(Convolution, GetConvolutionLayer);

// Returns a pooling layer instance; same logic as the convolution layer.
// Get pooling layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetPoolingLayer(const LayerParameter& param) {
  PoolingParameter_Engine engine = param.pooling_param().engine();
  if (engine == PoolingParameter_Engine_DEFAULT) {
    engine = PoolingParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = PoolingParameter_Engine_CUDNN;
#endif
  }
  if (engine == PoolingParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == PoolingParameter_Engine_CUDNN) {
    PoolingParameter p_param = param.pooling_param();
    if (p_param.pad() || p_param.pad_h() || p_param.pad_w() ||
        param.top_size() > 1) {
      LOG(INFO) << "CUDNN does not support padding or multiple tops. "
                << "Using Caffe's own pooling layer.";
      return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
    }
    return shared_ptr<Layer<Dtype> >(new CuDNNPoolingLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

// Register the pooling layer.
REGISTER_LAYER_CREATOR(Pooling, GetPoolingLayer);

// Register the ReLU layer.
// Get relu layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetReLULayer(const LayerParameter& param) {
  ReLUParameter_Engine engine = param.relu_param().engine();
  if (engine == ReLUParameter_Engine_DEFAULT) {
    engine = ReLUParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = ReLUParameter_Engine_CUDNN;
#endif
  }
  if (engine == ReLUParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new ReLULayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == ReLUParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNReLULayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(ReLU, GetReLULayer);

// Register the sigmoid layer.
// Get sigmoid layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetSigmoidLayer(const LayerParameter& param) {
  SigmoidParameter_Engine engine = param.sigmoid_param().engine();
  if (engine == SigmoidParameter_Engine_DEFAULT) {
    engine = SigmoidParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = SigmoidParameter_Engine_CUDNN;
#endif
  }
  if (engine == SigmoidParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new SigmoidLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == SigmoidParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNSigmoidLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(Sigmoid, GetSigmoidLayer);

// Register the softmax layer.
// Get softmax layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetSoftmaxLayer(const LayerParameter& param) {
  SoftmaxParameter_Engine engine = param.softmax_param().engine();
  if (engine == SoftmaxParameter_Engine_DEFAULT) {
    engine = SoftmaxParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = SoftmaxParameter_Engine_CUDNN;
#endif
  }
  if (engine == SoftmaxParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new SoftmaxLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == SoftmaxParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNSoftmaxLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(Softmax, GetSoftmaxLayer);

// Register the tanh layer.
// Get tanh layer according to engine.
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetTanHLayer(const LayerParameter& param) {
  TanHParameter_Engine engine = param.tanh_param().engine();
  if (engine == TanHParameter_Engine_DEFAULT) {
    engine = TanHParameter_Engine_CAFFE;
#ifdef USE_CUDNN
    engine = TanHParameter_Engine_CUDNN;
#endif
  }
  if (engine == TanHParameter_Engine_CAFFE) {
    return shared_ptr<Layer<Dtype> >(new TanHLayer<Dtype>(param));
#ifdef USE_CUDNN
  } else if (engine == TanHParameter_Engine_CUDNN) {
    return shared_ptr<Layer<Dtype> >(new CuDNNTanHLayer<Dtype>(param));
#endif
  } else {
    LOG(FATAL) << "Layer " << param.name() << " has unknown engine.";
  }
}

REGISTER_LAYER_CREATOR(TanH, GetTanHLayer);

// Register the Python layer.
#ifdef WITH_PYTHON_LAYER
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetPythonLayer(const LayerParameter& param) {
  Py_Initialize();
  try {
    bp::object module = bp::import(param.python_param().module().c_str());
    bp::object layer = module.attr(param.python_param().layer().c_str())(param);
    return bp::extract<shared_ptr<PythonLayer<Dtype> > >(layer)();
  } catch (bp::error_already_set) {
    PyErr_Print();
    throw;
  }
}

REGISTER_LAYER_CREATOR(Python, GetPythonLayer);
#endif

// Layers that use their constructor as their default creator should be
// registered in their corresponding cpp files. Do not register them here.
}  // namespace caffe

3. The pitfall in layer_factory

In the code as it stands today, the Pooling layer's registration additionally contains this snippet:

// CuDNN assumes layers are not being modified in place, thus
// breaking our index tracking for updates in some cases in Caffe.
// Until there is a workaround in Caffe (index management) or
// cuDNN, use Caffe layer to max pooling, or don't use in place
// layers after max pooling layers
if (param.pooling_param().pool() == PoolingParameter_PoolMethod_MAX) {
  return shared_ptr<Layer<Dtype> >(new PoolingLayer<Dtype>(param));
} else {
  return shared_ptr<Layer<Dtype> >(new CuDNNPoolingLayer<Dtype>(param));
}

As a direct consequence, as long as you use max pooling you always get Caffe's own .cu implementation and can never reach the cuDNN version. That explains why the MaxPool timings in our earlier tests never changed.
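
A quick way to confirm this from the outside is to load a net and dynamic_cast each pooling layer, since both backends report the same type() string "Pooling". This is a sketch of my own, assuming a Caffe 1.0 build with USE_CUDNN and the caffe/layers/*.hpp header layout:

#include <string>

#include "caffe/net.hpp"
#include "caffe/layers/cudnn_pooling_layer.hpp"

// Prints which backend each Pooling layer of a loaded net was actually
// instantiated with.
template <typename Dtype>
void PrintPoolingBackends(const caffe::Net<Dtype>& net) {
  for (size_t i = 0; i < net.layers().size(); ++i) {
    caffe::Layer<Dtype>* layer = net.layers()[i].get();
    if (std::string(layer->type()) != "Pooling") continue;
    // type() is "Pooling" for both backends, so test the concrete class.
    const bool is_cudnn =
        dynamic_cast<caffe::CuDNNPoolingLayer<Dtype>*>(layer) != NULL;
    LOG(INFO) << net.layer_names()[i] << ": "
              << (is_cudnn ? "CuDNNPoolingLayer" : "PoolingLayer (Caffe)");
  }
}

Run over a net whose pooling layers request engine: CUDNN, this prints PoolingLayer (Caffe) for every MAX pooling layer, which is exactly the symptom we measured.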

4. Impact analysis

But why doesn't the Caffe author use cuDNN's max pooling? Consulting the NVIDIA cuDNN User Manual, we find the following:

4.144. cudnnPoolingForward

cudnnStatus_t cudnnPoolingForward(
cudnnHandle_t handle,
const cudnnPoolingDescriptor_t poolingDesc,
const void *alpha,
const cudnnTensorDescriptor_t xDesc,
const void *x,
const void *beta,
const cudnnTensorDescriptor_t yDesc,
void *y)

This function computes pooling of input values (i.e., the maximum or average of several adjacent values) to produce an output with smaller height and/or width.

Note: All tensor formats are supported, best performance is expected when using HW-packed tensors. Only 2 and 3 spatial dimensions are allowed.
Note: The dimensions of the output tensor yDesc can be smaller or bigger than the dimensions advised by the routine cudnnGetPooling2dForwardOutputDim or cudnnGetPoolingNdForwardOutputDim.

Parameters

handle

Input. Handle to a previously created cuDNN context.

poolingDesc

Input. Handle to a previously initialized pooling descriptor.

alpha, beta

Input. Pointers to scaling factors (in host memory) used to blend the computation result with prior value in the output layer as follows: dstValue = alpha[0]*result + beta[0]*priorDstValue. Refer to this section for additional details.

xDesc

Input. Handle to the previously initialized input tensor descriptor. Must be of type FLOAT, or DOUBLE, or HALF, or INT8. See cudnnDataType_t.

x

Input. Data pointer to GPU memory associated with the tensor descriptor xDesc.

yDesc

Input. Handle to the previously initialized output tensor descriptor. Must be of type FLOAT, or DOUBLE, or HALF, or INT8. See cudnnDataType_t.

y

Output. Data pointer to GPU memory associated with the output tensor descriptor yDesc.

The possible error values returned by this function and their meanings are listed below.

Returns

CUDNN_STATUS_SUCCESS

The function launched successfully.

CUDNN_STATUS_BAD_PARAM

At least one of the following conditions are met:

  • The dimensions n, c of the input tensor and output tensors differ.
  • The datatype of the input tensor and output tensors differs.
CUDNN_STATUS_NOT_SUPPORTED

The function does not support the provided configuration. See the following for some examples of non-supported configurations:

  • The wStride of input tensor or output tensor is not 1.
CUDNN_STATUS_EXECUTION_FAILED

The function failed to launch on the GPU.

The curious thing here is that the forward call takes only the input and output tensors (plus the alpha/beta scaling factors); it hands back no argmax indices, so there is no way to update the mask that Caffe's max-pooling backward pass depends on. I don't quite follow the cuDNN designers' reasoning here, but as things stand, preserving correctness means Caffe cannot use cuDNN's PoolingForward for max pooling for now. A sketch of the mask bookkeeping in question follows.
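
For contrast, here is a minimal 1-D sketch (my own illustration, not Caffe's actual kernel) of why max pooling wants a mask: the forward pass records where each maximum came from, and the backward pass routes gradients through exactly those positions, information cudnnPoolingForward never exposes.

#include <cfloat>
#include <vector>

// Forward: 1-D max pooling (kernel k, stride s) that records, for every
// output element, the input index its maximum came from, i.e. the "mask".
// Caffe's own PoolingLayer keeps the same bookkeeping in its max_idx_ blob.
void MaxPoolForward(const std::vector<float>& x, int k, int s,
                    std::vector<float>* y, std::vector<int>* mask) {
  const int out = (static_cast<int>(x.size()) - k) / s + 1;
  y->assign(out, -FLT_MAX);
  mask->assign(out, -1);
  for (int o = 0; o < out; ++o) {
    for (int i = o * s; i < o * s + k; ++i) {
      if (x[i] > (*y)[o]) { (*y)[o] = x[i]; (*mask)[o] = i; }
    }
  }
}

// Backward: each output's gradient flows only to the recorded argmax
// position; without the mask this routing cannot be reproduced.
void MaxPoolBackward(const std::vector<float>& dy,
                     const std::vector<int>& mask, int input_size,
                     std::vector<float>* dx) {
  dx->assign(input_size, 0.0f);
  for (size_t o = 0; o < dy.size(); ++o) {
    (*dx)[mask[o]] += dy[o];
  }
}

cudnnPoolingForward returns only y; as far as I can tell, the matching cudnnPoolingBackward instead takes x, y, and dy and recomputes the argmax internally, so the indices never surface to the caller. That also fits the in-place caveat in the Caffe comment quoted above: if a later in-place layer overwrites those tensors, the recomputation would no longer be trustworthy.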
