Faster R-CNN Code Reading, Part 2

II. Training

Back at line 160 of train.py, training starts with a call to the sw.train_model method:

    def train_model(self, max_iters):
        """Network training loop."""
        last_snapshot_iter = -1
        timer = Timer()
        model_paths = []
        while self.solver.iter < max_iters:
            # Make one SGD update
            timer.tic()
            self.solver.step(1)
            timer.toc()
            if self.solver.iter % (10 * self.solver_param.display) == 0:
                print 'speed: {:.3f}s / iter'.format(timer.average_time)
            if self.solver.iter % cfg.TRAIN.SNAPSHOT_ITERS == 0:
                last_snapshot_iter = self.solver.iter
                model_paths.append(self.snapshot())
        if last_snapshot_iter != self.solver.iter:
            model_paths.append(self.snapshot())
        return model_paths

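The call self.solver.step(1) in this method runs one forward pass and one backward pass through the network, followed by a weight update. In the forward pass, data flows from the first layer to the last, where the loss is computed; in the backward pass, the gradient of the loss with respect to each layer's input is computed from the last layer back to the first. The sections below walk through Faster R-CNN layer by layer.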
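To make the forward/backward description concrete, here is a minimal sketch of one SGD update on a toy two-layer scalar network. It is not Caffe code: the function name sgd_step, the scalar weights, the input/target values and the learning rate are all assumptions chosen for illustration.

```python
def sgd_step(w1, w2, x, y, lr=0.1):
    """One forward pass, one backward pass and one SGD update.

    Hypothetical toy network: pred = w2 * (w1 * x), squared-error loss.
    """
    # Forward: data flows from the first layer to the last, then to the loss.
    h = w1 * x                        # first layer output
    pred = w2 * h                     # last layer output
    loss = 0.5 * (pred - y) ** 2

    # Backward: gradients flow from the loss back to the first layer.
    d_pred = pred - y                 # dloss / dpred
    d_w2 = d_pred * h                 # gradient for the last layer's weight
    d_h = d_pred * w2                 # gradient w.r.t. the last layer's input
    d_w1 = d_h * x                    # gradient for the first layer's weight

    # SGD update.
    return w1 - lr * d_w1, w2 - lr * d_w2, loss


w1, w2 = 0.5, 0.5
losses = []
for _ in range(20):
    w1, w2, loss = sgd_step(w1, w2, x=1.0, y=2.0)
    losses.append(loss)
```

Each call plays the role of one solver.step(1): the loss recorded in losses shrinks as the loop repeats.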

2.1 input-data layer

The first layer is implemented in Python. Its prototxt definition is:

layer {
  name: 'input-data'
  type: 'Python'
  top: 'data'
  top: 'im_info'
  top: 'gt_boxes'
  python_param {
    module: 'roi_data_layer.layer'
    layer: 'RoIDataLayer'
    param_str: "'num_classes': 2"
  }
}

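As the definition shows, the input-data layer has three outputs (data, im_info and gt_boxes) and is implemented by the RoIDataLayer class. Its preprocessing resizes the image while preserving the aspect ratio so that the short side becomes 600 pixels; if the long side would then exceed 1000 pixels, the image is instead scaled so that the long side is 1000 and the short side shrinks proportionally. The three outputs are:

1. data: shape (1, 3, h, w) (a batch holds only one image)

2. im_info: im_info[0], im_info[1] and im_info[2] are h, w and target_size/im_origin_size (the scale factor), respectively

3. gt_boxes: (x1, y1, x2, y2, cls), one row per ground-truth box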
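The resizing rule described above can be sketched as a small function. This is a simplified stand-alone version, not the repository's code; the name compute_im_scale is an assumption, and the defaults mirror TRAIN.SCALES = [600] and TRAIN.MAX_SIZE = 1000.

```python
def compute_im_scale(height, width, target_size=600, max_size=1000):
    """Return the resize factor for an image of the given size.

    Hypothetical helper: scale the short side to target_size, unless that
    would push the long side past max_size, in which case scale the long
    side to max_size instead.
    """
    short_side = min(height, width)
    long_side = max(height, width)
    im_scale = float(target_size) / short_side
    # Fall back to the long side when it would exceed the cap.
    if round(im_scale * long_side) > max_size:
        im_scale = float(max_size) / long_side
    return im_scale


scale_voc = compute_im_scale(375, 500)     # short side decides
scale_wide = compute_im_scale(500, 1500)   # long side decides
```

For a 375x500 image the factor is 1.6, giving 600x800. For a 500x1500 image the long side would reach 1800, so the factor falls back to 1000/1500.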

The preprocessing involves the functions _get_next_minibatch, get_minibatch, _get_image_blob, prep_im_for_blob and im_list_to_blob.
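As a rough illustration of what these functions hand to the network, the sketch below derives the three outputs for one image. It only tracks shapes and box coordinates, not pixel data; the helper name build_outputs and the sample box are assumptions, and the real code works on NumPy arrays.

```python
def build_outputs(im_height, im_width, boxes, classes, im_scale):
    """Return (data_shape, im_info, gt_boxes) for a single image.

    Hypothetical helper for illustration. boxes holds (x1, y1, x2, y2) in
    original image coordinates; classes holds the matching class indices.
    """
    scaled_h = int(round(im_height * im_scale))
    scaled_w = int(round(im_width * im_scale))
    # data: one image per batch, 3 channels, resized height and width.
    data_shape = (1, 3, scaled_h, scaled_w)
    # im_info: resized height, resized width, scale factor.
    im_info = [scaled_h, scaled_w, im_scale]
    # gt_boxes: coordinates move into the resized image, class is appended.
    gt_boxes = [[coord * im_scale for coord in box] + [cls]
                for box, cls in zip(boxes, classes)]
    return data_shape, im_info, gt_boxes


data_shape, im_info, gt_boxes = build_outputs(
    375, 500, boxes=[[10, 20, 110, 220]], classes=[1], im_scale=1.6)
```

Note that the ground-truth boxes are multiplied by the same scale factor as the image, so they stay aligned with the resized pixels.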

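When the network is constructed (i.e. self.solver = caffe.SGDSolver(solver_prototxt)), the setup method of this class is called. The relevant configuration defaults (the __C.TRAIN entries) are listed first, followed by the method: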

    __C.TRAIN.IMS_PER_BATCH = 1
    __C.TRAIN.SCALES = [600]
    __C.TRAIN.MAX_SIZE = 1000
    __C.TRAIN.HAS_RPN = True
    __C.TRAIN.BBOX_REG = True

    def setup(self, bottom, top):
        """Setup the RoIDataLayer."""
        # parse the layer parameter string, which must be valid YAML
        layer_params = yaml.load(self.param_str_)
        self._num_classes = layer_params['num_classes']
        self._name_to_top_map = {}
        # data blob: holds a batch of N images, each with 3 channels
        idx = 0
        top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3,
            max(cfg.TRAIN.SCALES), cfg.TRAIN.MAX_SIZE)
        self._name_to_top_map['data'] = idx
        idx += 1
        if cfg.TRAIN.HAS_RPN:
            top[idx].reshape(1, 3)
            self._name_to_top_map['im_info'] = idx
            idx += 1
            top[idx].reshape(1, 4)
            self._name_to_top_map['gt_boxes'] = idx
            idx += 1
        else: # not using RPN
            # rois blob: holds R regions of interest, each is a 5-tuple
            # (n, x1, y1, x2, y2) specifying an image batch index n and a
            # rectangle (x1, y1, x2, y2)
            top[idx].reshape(1, 5)
            self._name_to_top_map['rois'] = idx
            idx += 1
            # labels blob: R categorical labels in [0, ..., K] for K foreground
            # classes plus background
            top[idx].reshape(1)
            self._name_to_top_map['labels'] = idx
            idx += 1
            if cfg.TRAIN.BBOX_REG:
                # bbox_targets blob: R bounding-box regression targets with 4
                # targets per class
                top[idx].reshape(1, self._num_classes * 4)
                self._name_to_top_map['bbox_targets'] = idx
                idx += 1
                # bbox_inside_weights blob: At most 4 targets per roi are active;
                # this binary vector specifies the subset of active targets
                top[idx].reshape(1, self._num_classes * 4)
                self._name_to_top_map['bbox_inside_weights'] = idx
                idx += 1
                top[idx].reshape(1, self._num_classes * 4)
                self._name_to_top_map['bbox_outside_weights'] = idx
                idx += 1
        print 'RoiDataLayer: name_to_top:', self._name_to_top_map
        assert len(top) == len(self._name_to_top_map)

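The setup method mainly defines the shapes of the output blobs. Note that during the forward pass the shape of each top is defined again, and the shapes set there usually differ from the ones set in setup.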
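The difference between the two sets of shapes can be sketched with a stand-in blob class. FakeBlob is a hypothetical substitute for a Caffe top blob that only records its shape, and the minibatch shapes used for the forward pass are made-up example values (a 600x800 image with two ground-truth boxes).

```python
class FakeBlob(object):
    """Hypothetical stand-in for a Caffe blob; it only tracks its shape."""

    def __init__(self):
        self.shape = ()

    def reshape(self, *dims):
        self.shape = tuple(dims)


name_to_top = {'data': 0, 'im_info': 1, 'gt_boxes': 2}
top = [FakeBlob() for _ in name_to_top]

# setup(): placeholder shapes taken from the config defaults.
top[name_to_top['data']].reshape(1, 3, 600, 1000)
top[name_to_top['im_info']].reshape(1, 3)
top[name_to_top['gt_boxes']].reshape(1, 4)
setup_shapes = [blob.shape for blob in top]

# forward(): every top is reshaped to match the actual minibatch.
minibatch_shapes = {
    'data': (1, 3, 600, 800),   # assumed resized image size
    'im_info': (1, 3),
    'gt_boxes': (2, 5),         # assumed two boxes, each (x1, y1, x2, y2, cls)
}
for name, shape in minibatch_shapes.items():
    top[name_to_top[name]].reshape(*shape)
forward_shapes = [blob.shape for blob in top]
```

Here the data and gt_boxes shapes change between setup and forward, while im_info happens to stay the same.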
