RoIPooling


Code:

template <typename Dtype>

void ROIPoolingLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,

      const vector<Blob<Dtype>*>& top) {

  // The input has two parts: data (bottom[0]) and rois (bottom[1])

  const Dtype* bottom_data = bottom[0]->cpu_data();

  const Dtype* bottom_rois = bottom[1]->cpu_data();

  // Number of ROIs

  int num_rois = bottom[1]->num();

  int batch_size = bottom[0]->num();

  int top_count = top[0]->count();

  Dtype* top_data = top[0]->mutable_cpu_data();

  caffe_set(top_count, Dtype(-FLT_MAX), top_data);

  int* argmax_data = max_idx_.mutable_cpu_data();

  caffe_set(top_count, -1, argmax_data);

  // For each ROI R = [batch_index x1 y1 x2 y2]: max pool over R

  for (int n = 0; n < num_rois; ++n) {

    int roi_batch_ind = bottom_rois[0];

    // Map the ROI coordinates from the original image onto the feature map

    int roi_start_w = round(bottom_rois[1] * spatial_scale_);

    int roi_start_h = round(bottom_rois[2] * spatial_scale_);

    int roi_end_w = round(bottom_rois[3] * spatial_scale_);

    int roi_end_h = round(bottom_rois[4] * spatial_scale_);

    // Compute the size of this ROI on the feature map

    int roi_height = max(roi_end_h - roi_start_h + 1, 1);

    int roi_width = max(roi_end_w - roi_start_w + 1, 1);

    // Size of the region on the input feature map that corresponds to one value of the pooled output

    // Note: ROIs differ in size, so this has to be recomputed for every ROI

    const Dtype bin_size_h = static_cast<Dtype>(roi_height)

                             / static_cast<Dtype>(pooled_height_);

    const Dtype bin_size_w = static_cast<Dtype>(roi_width)

                             / static_cast<Dtype>(pooled_width_);

    // Locate the feature map this ROI belongs to; if the input data has a batch size of 1,

    // then roi_batch_ind = 0

    const Dtype* batch_data = bottom_data + bottom[0]->offset(roi_batch_ind);

    // Pooling is done independently for each channel, so loop over the channels

    for (int c = 0; c < channels_; ++c) {

      // Visit every position of the output and compute its value

      for (int ph = 0; ph < pooled_height_; ++ph) {

        for (int pw = 0; pw < pooled_width_; ++pw) {

          // Compute pooling region for this output unit:

          //  start (included) = floor(ph * roi_height / pooled_height_)

          //  end (excluded) = ceil((ph + 1) * roi_height / pooled_height_)

          // Compute the input region [hstart, wstart, hend, wend] that corresponds to one output point (still relative to the ROI origin)

          int hstart = static_cast<int>(floor(static_cast<Dtype>(ph)

                                              * bin_size_h));

          int hend = static_cast<int>(ceil(static_cast<Dtype>(ph + 1)

                                           * bin_size_h));

          int wstart = static_cast<int>(floor(static_cast<Dtype>(pw)

                                              * bin_size_w));

          int wend = static_cast<int>(ceil(static_cast<Dtype>(pw + 1)

                                           * bin_size_w));

          // Shift the region [hstart, wstart, hend, wend] to its position on the feature map and clamp it to the feature-map bounds

          hstart = min(max(hstart + roi_start_h, 0), height_);

          hend = min(max(hend + roi_start_h, 0), height_);

          wstart = min(max(wstart + roi_start_w, 0), width_);

          wend = min(max(wend + roi_start_w, 0), width_);

          // The mapped rectangle is invalid (empty) if it has no height or no width

          bool is_empty = (hend <= hstart) || (wend <= wstart);

          // pool_index is the position in the output of the value currently being computed

          const int pool_index = ph * pooled_width_ + pw;

          // For an empty rectangle, set the output value to 0 and the argmax index to -1

          if (is_empty) {

            top_data[pool_index] = 0;

            argmax_data[pool_index] = -1;

          }

          // Iterate over the input region that corresponds to this output value

          for (int h = hstart; h < hend; ++h) {

            for (int w = wstart; w < wend; ++w) {

              // Flat index of this position in the input feature map

              const int index = h * width_ + w;

              // Track the maximum of the region and store it at the corresponding output position,

              // and record the index of that maximum

              if (batch_data[index] > top_data[pool_index]) {

                top_data[pool_index] = batch_data[index];

                argmax_data[pool_index] = index;

              }

            }

          }

        }

      }

      // Increment all data pointers by one channel

      batch_data += bottom[0]->offset(0, 1);

      top_data += top[0]->offset(0, 1);

      argmax_data += max_idx_.offset(0, 1);

    }

    // Increment ROI data pointer

    bottom_rois += bottom[1]->offset(1);

  }

}
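
To make the floor/ceil bin arithmetic above easier to follow, here is a minimal standalone sketch of the same computation for a single axis. The names BinRange and ComputeBin are hypothetical helpers introduced only for illustration; they are not part of Caffe. The parameter limit plays the role of height_ or width_.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type (not part of Caffe): the half-open range
// [start, end) of feature-map rows or columns covered by one pooled bin.
struct BinRange {
  int start;
  int end;
};

// Repeats the steps of Forward_cpu for one axis:
// floor/ceil of the bin edges, shift by the ROI start, clamp to [0, limit].
BinRange ComputeBin(int p, int roi_start, int roi_size, int pooled_size,
                    int limit) {
  assert(pooled_size > 0);
  const float bin_size =
      static_cast<float>(roi_size) / static_cast<float>(pooled_size);
  int start = static_cast<int>(std::floor(static_cast<float>(p) * bin_size));
  int end = static_cast<int>(std::ceil(static_cast<float>(p + 1) * bin_size));
  start = std::min(std::max(start + roi_start, 0), limit);
  end = std::min(std::max(end + roi_start, 0), limit);
  return {start, end};
}
```

For example, with roi_start = 2, roi_size = 7, pooled_size = 2 and limit = 20, bin_size is 3.5, so bin 0 covers rows [2, 6) and bin 1 covers rows [5, 9). Row 5 belongs to both bins: because the start is rounded down and the end is rounded up, neighbouring bins may overlap whenever the ROI size is not a multiple of the pooled size.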

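The per-channel pooling loop can also be tried outside Caffe. The sketch below is a simplified, assumption-laden version: it handles one ROI on a single-channel, row-major feature map stored in a std::vector, and it takes ROI corners that are already in feature-map coordinates (that is, after multiplying by spatial_scale_ and rounding). PooledRoi and RoiMaxPool are hypothetical names, not Caffe APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <vector>

// Hypothetical result type (not part of Caffe): pooled values plus, for each
// of them, the flat input index of the maximum (-1 for an empty bin).
struct PooledRoi {
  std::vector<float> values;
  std::vector<int> argmax;
};

// Max-pools one ROI of a single-channel, row-major height x width feature map.
// ROI corners are inclusive and already in feature-map coordinates.
PooledRoi RoiMaxPool(const std::vector<float>& feat, int height, int width,
                     int roi_start_h, int roi_start_w, int roi_end_h,
                     int roi_end_w, int pooled_h, int pooled_w) {
  assert(static_cast<int>(feat.size()) == height * width);
  assert(pooled_h > 0 && pooled_w > 0);
  // Force degenerate ROIs to be at least 1x1, as in Forward_cpu.
  const int roi_h = std::max(roi_end_h - roi_start_h + 1, 1);
  const int roi_w = std::max(roi_end_w - roi_start_w + 1, 1);
  const float bin_h = static_cast<float>(roi_h) / static_cast<float>(pooled_h);
  const float bin_w = static_cast<float>(roi_w) / static_cast<float>(pooled_w);
  PooledRoi out{std::vector<float>(pooled_h * pooled_w, -FLT_MAX),
                std::vector<int>(pooled_h * pooled_w, -1)};
  for (int ph = 0; ph < pooled_h; ++ph) {
    for (int pw = 0; pw < pooled_w; ++pw) {
      // Bin edges relative to the ROI, then shifted and clamped to the map.
      int hstart = static_cast<int>(std::floor(ph * bin_h)) + roi_start_h;
      int hend = static_cast<int>(std::ceil((ph + 1) * bin_h)) + roi_start_h;
      int wstart = static_cast<int>(std::floor(pw * bin_w)) + roi_start_w;
      int wend = static_cast<int>(std::ceil((pw + 1) * bin_w)) + roi_start_w;
      hstart = std::min(std::max(hstart, 0), height);
      hend = std::min(std::max(hend, 0), height);
      wstart = std::min(std::max(wstart, 0), width);
      wend = std::min(std::max(wend, 0), width);
      const int pool_index = ph * pooled_w + pw;
      if (hend <= hstart || wend <= wstart) {
        out.values[pool_index] = 0.0f;  // empty bin: value 0, argmax stays -1
        continue;
      }
      for (int h = hstart; h < hend; ++h) {
        for (int w = wstart; w < wend; ++w) {
          const int index = h * width + w;
          if (feat[index] > out.values[pool_index]) {
            out.values[pool_index] = feat[index];
            out.argmax[pool_index] = index;
          }
        }
      }
    }
  }
  return out;
}
```

On a 4x4 map filled with 0..15 in row-major order, pooling the whole map (corners (0, 0) and (3, 3)) to 2x2 gives the values 5, 7, 13, 15 with argmax indices 5, 7, 13, 15. An ROI that lies completely outside the map yields value 0 and argmax -1, matching the is_empty branch above. The stored argmax is what a backward pass would need to route each gradient to the input element that produced the maximum.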