[Papers] Semantic Segmentation Papers(1)
Tags: Paper
A summary of several semantic segmentation papers I have read: FCN, DeconvNet, SegNet, and U-Net. The DeepLab papers will be summarized in a later post.
FCN
Abstract
Proposes an end-to-end FCN that takes an image of arbitrary size as input and outputs a label map of the same size. The skip architecture in FCN combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations.
Introduction
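Uses a supervised, pretrained classification network for pixel-wise prediction.
Semantic segmentation faces an inherent tension between semantic information (what is in the image) and location information (where it is).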
Related Work
FCN
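FCN is the pioneering work in applying deep learning to segmentation. It is trained end-to-end, and it laid the foundation for later networks such as U-Net, ENet, and SegNet, in particular through the idea of using deconvolution to upsample the coarse map.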
Adapting classifiers for dense prediction
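A fully connected layer can be viewed as a special case of convolution whose kernel covers the entire input feature map. Once the final fully connected layers are replaced by such convolutions, the network outputs a label map instead of a single label, and adding a spatial loss makes end-to-end dense learning possible.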
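To make the "fully connected layer is a convolution" point concrete, here is a minimal pure-Python sketch. The tiny feature maps, the weights, and the helper names `fully_connected` and `conv2d_valid` are made up for illustration and are not from the paper. It shows one fully connected unit reshaped into a kernel that covers the whole training-size input, and how the same kernel slides over a larger input to give a map of outputs.

```python
def fully_connected(feature_map, weights, bias):
    """Flatten an HxW feature map and apply a single fully connected unit."""
    flat = [v for row in feature_map for v in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias


def conv2d_valid(feature_map, kernel, bias):
    """Single-channel 2D cross-correlation, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(feature_map) - kh + 1):
        row = []
        for c in range(len(feature_map[0]) - kw + 1):
            acc = sum(kernel[i][j] * feature_map[r + i][c + j]
                      for i in range(kh) for j in range(kw))
            row.append(acc + bias)
        out.append(row)
    return out


# Hypothetical FC unit trained on 2x2 inputs.
fc_weights = [1, -1, 2, 3]
bias = 1
# The same weights reshaped into a 2x2 kernel that covers the whole input.
kernel = [fc_weights[0:2], fc_weights[2:4]]

small = [[1, 2],
         [3, 4]]
large = [[1, 2, 0],
         [3, 4, 1],
         [0, 2, 5]]

fc_out = fully_connected(small, fc_weights, bias)  # one number
conv_small = conv2d_valid(small, kernel, bias)     # 1x1 map holding the same number
conv_large = conv2d_valid(large, kernel, bias)     # 2x2 map: one output per window
```

On the larger input the convolution simply evaluates the former fully connected unit at every window position, which is why the converted network produces a coarse output map for inputs of arbitrary size.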
Shift-and-stitch is filter rarefaction
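rarefaction: thinning out, i.e. making something sparse
à trous algorithm
FCN also mentions the à trous ("with holes", i.e. dilated) convolution that DeepLab later uses.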
Upsampling is backwards strided convolution
In a sense, upsampling with factor \(f\) is convolution with a fractional input stride of \(1/f\). So long as \(f\) is integral, a natural way to upsample is therefore backwards convolution (sometimes called deconvolution) with an output stride of \(f\).
This passage is a bit hard to follow.
Explanations of deconvolution:
- https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers
- https://github.com/vdumoulin/conv_arithmetic
stride two and padding:
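It is called transposed convolution because the same operation shows up in the backward pass: backpropagation through a convolution can be computed by multiplying by the transpose of its weight matrix. Why is it called stride 2 when the filter in the figure clearly moves with stride 1? The stride of 2 refers to the corresponding original (forward) convolution. Because a zero is inserted between every pair of neighbouring input pixels, a stride of 1 over the zero-interleaved input corresponds to a stride of 2 in the original convolution.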
Shown above is a transposed convolution. 'stride two' means stride in the corresponding original convolution is two. This is precisely why you have 1 (=2-1, 2 being the original stride) layer of zeros in between rows and columns. Transposed convolution is generally used in backward pass. It is called transposed because of the analogy with fully connected layer where you multiply with the transpose of the weight matrix during a backward pass.
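The two views of transposed convolution described here can be checked against each other in one dimension. The sketch below is only an illustration under my own assumptions (1D signal, no padding in the forward convolution, made-up kernel and input, hypothetical function names). The first function multiplies by the transpose of the forward convolution matrix, written as a scatter; the second inserts zeros between the inputs and then runs an ordinary stride-1 filter with the flipped kernel.

```python
def transposed_conv1d_scatter(y, kernel, stride):
    """Multiply by the transpose of the forward conv matrix:
    every input value scatters a scaled copy of the kernel into the output."""
    k = len(kernel)
    out = [0.0] * ((len(y) - 1) * stride + k)
    for i, value in enumerate(y):
        for j, w in enumerate(kernel):
            out[i * stride + j] += value * w
    return out


def transposed_conv1d_zero_insert(y, kernel, stride):
    """Same result via zero insertion followed by a stride-1 filter."""
    k = len(kernel)
    # Insert (stride - 1) zeros between neighbouring input values.
    z = []
    for i, value in enumerate(y):
        if i > 0:
            z.extend([0.0] * (stride - 1))
        z.append(float(value))
    # Pad with k - 1 zeros on each side, then slide the flipped kernel with stride 1.
    padded = [0.0] * (k - 1) + z + [0.0] * (k - 1)
    flipped = kernel[::-1]
    return [sum(padded[t + j] * flipped[j] for j in range(k))
            for t in range(len(padded) - k + 1)]


kernel = [1.0, 2.0, 3.0]
coarse = [10.0, 20.0]  # e.g. the output of a stride-2 convolution over 5 samples

up_scatter = transposed_conv1d_scatter(coarse, kernel, stride=2)
up_zero_insert = transposed_conv1d_zero_insert(coarse, kernel, stride=2)
```

Both functions map 2 values back to 5, the length a stride-2, size-3 forward convolution would have consumed, and they agree element by element.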
Patchwise training is loss sampling
Segmentation Architecture
The authors' fully convolutional network consists mainly of in-network upsampling and a pixelwise loss, plus the skip architecture.
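As a rough picture of the skip architecture, the sketch below upsamples a coarse score map by a factor of 2 and adds it element-wise to the score map of a finer layer. It is a simplification under my own assumptions: a single class channel, toy numbers, and nearest-neighbour upsampling standing in for the learned deconvolution layer that FCN actually uses. The function names are hypothetical.

```python
def upsample2x_nearest(score_map):
    """Nearest-neighbour 2x upsampling (stand-in for a learned deconvolution)."""
    out = []
    for row in score_map:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_skip(coarse_scores, fine_scores):
    """Upsample the coarse scores 2x and sum them with the finer-layer scores."""
    up = upsample2x_nearest(coarse_scores)
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine_scores)]


coarse = [[1, 2],
          [3, 4]]          # deep layer: strong semantics, low resolution
fine = [[0, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [1, 0, 1, 0]]      # shallow layer: finer spatial detail

fused = fuse_skip(coarse, fine)
```

The fused map has the resolution of the shallow layer while still carrying the coarse layer's scores, which is the "combine what with where" idea behind the skips.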
Learning Deconvolution Network for Semantic Segmentation
Abstract
deep deconvolution + proposal-wise prediction
The deconvolution network is composed of deconvolution layers and unpooling (upsampling) layers.
1. Introduction
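Most existing CNN-based semantic segmentation networks apply a bilinear-interpolation-based deconvolution to the coarse label map produced by the preceding classification network (16×16 in FCN). However, the input to this deconvolution is a feature map that has already gone through convolution and pooling and has lost many structural details, so the deconvolution often cannot give good results.
Some methods use FCN + Conditional Random Field (CRF) to address this problem.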
2. Related Work
FCN:
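Because of its fixed-size receptive field, FCN cannot classify objects that are too small, and for objects that are too large it predicts several different classes within a single object (small and large are relative to the receptive field).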
FCN+CRF
3. System Architecture
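The encoder is the VGG classification network. The decoder is a deconvolution network that applies unpooling and deconvolution to the feature map produced by the classification network. The network outputs a probability map giving, for every pixel, the probability of belonging to each class, from which the per-pixel class label is obtained. It is worth noting up front that DeconvNet does not remove the fully connected layers of the VGG classification network. These layers hold a large number of parameters, so the trained model takes up a lot of space. For an application-level product it is better to use SegNet, covered later: SegNet removes the fully connected layers, so it trains faster and uses much less memory.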
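Unpooling and Deconvolution
Unpooling
What is pooling?
Pooling in convolution network is designed to filter noisy activations in a lower layer by abstracting activations in a receptive field with a single representative value.
Although pooling makes the activations more robust, it also discards the spatial information inside the receptive field. This structural information can matter a lot for segmentation, which requires dense prediction.
How is unpooling implemented?
Record the location of the maximum activation in each window during pooling, then place each activation back at its recorded location when unpooling.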
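The record-and-restore idea can be sketched in a few lines of pure Python. This is only an illustration with a made-up 4×4 feature map and hypothetical function names: 2×2 max pooling stores the position of each maximum, and unpooling writes each pooled value back to that position, leaving zeros everywhere else.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2; also returns the (row, col) of each maximum."""
    pooled, switches = [], []
    for r in range(0, len(fmap), 2):
        pooled_row, switch_row = [], []
        for c in range(0, len(fmap[0]), 2):
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            switch_row.append(position)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def unpool_2x2(pooled, switches, height, width):
    """Place every pooled value back at its recorded position; the rest stays zero."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (r, c) in zip(pooled_row, switch_row):
            out[r][c] = value
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [6, 2, 3, 1]]

pooled, switches = max_pool_2x2(fmap)
unpooled = unpool_2x2(pooled, switches, height=4, width=4)
```

The unpooled map has the original size but only one non-zero value per 2×2 window, so it is sparse.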
deconvolution
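The output of unpooling is sparse. Deconvolution turns it into an enlarged, dense activation map. The border of the enlarged map is then cropped so that the feature map has the same size as the map produced by the preceding unpooling layer.
Unpooling and deconvolution play different roles in the network: unpooling is example-specific, while deconvolution is class-specific. Example-specific means that, for whatever object is in the image, unpooling reconstructs the structure of that object from the location information recorded during pooling. But every pixel has to be classified, so recovering the object structure is not enough: there is still noise around it and information from non-target classes. Deconvolution amplifies the target-class information and suppresses the non-target information. With the two combined, the deconvolution network on the decoder side can output a fairly accurate segmentation map.
In these two respects the decoders of DeconvNet and SegNet are very similar: upsampling gives a sparse activation map, and deconv/conv then turns it into a dense activation map.
The visualization of the activation maps also shows that the encoder progressively abstracts the features (detail to coarse), while the decoder goes the other way (coarse to detail).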
instance wise segmentation vs. image level segmentation
I did not really understand this part.
Training
- Batch Normalization
- Two-stage Training
ensemble with FCN
Detailed network architecture:
Inference
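At test time, before an image is fed into the network, the authors use edge-box to generate candidate proposals, which makes it possible to detect objects at different scales. For each test image 2000 candidates are generated first, and the top 50 by objectness score are then fed into the network. The instance-wise segmentation mentioned earlier is probably related to this step; I feel the authors do not describe it in much detail.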
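The selection step amounts to a sort by score followed by a cut-off. The sketch below only illustrates that step: the proposal records (a dict with `box` and `score`), the synthetic scores, and the function name are hypothetical, and it does not call edge-box or any real proposal generator.

```python
def select_top_proposals(proposals, k=50):
    """Keep the k proposals with the highest objectness score."""
    return sorted(proposals, key=lambda p: p["score"], reverse=True)[:k]


# 2000 synthetic proposals with distinct, made-up scores in [0, 1).
proposals = [
    {"box": (i, i, i + 32, i + 32), "score": (i * 37 % 2000) / 2000}
    for i in range(2000)
]

top = select_top_proposals(proposals, k=50)
```

Each selected box would then be cropped, resized, and passed through the network separately.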
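Overall, the idea behind DeconvNet is fairly novel (I do not know whether SegNet borrowed from it), but the network is clearly very deep and hard to train, it keeps the fully connected layers, and it needs edge-box to generate candidate proposals, so it is not an end-to-end network. For practical use I would still recommend SegNet.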
SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Abstract
The novelty of SegNet lies is in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample.
1. Introduction
Reusing the encoder's max-pooling indices in the decoder:
- improves boundary delineation
- reduces the number of parameters enabling end-to-end training
Architecture
No fully connected layers (parameters reduced from 134M to 14.7M)
encoder
conv + batchnorm + ReLU + max pooling (2×2)
Max pooling lowers the spatial resolution of the feature map; to preserve the location information that would otherwise be lost, SegNet stores the max-pooling indices.
decoder
Upsample the feature maps using the stored max-pooling indices, which gives sparse feature maps, then densify them with trainable filter banks followed by batch norm.
Several decoder variants are compared.
Training
- median frequency balancing
- natural frequency balancing
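The notes only name the two balancing schemes, so here is a small sketch of median frequency balancing as I understand it: each class gets the weight median_freq / freq(c), where freq(c) is the number of pixels of class c divided by the total number of pixels in the images that contain c. The toy label maps and the function name are made up for illustration.

```python
from statistics import median


def median_frequency_weights(label_maps):
    """Class weights for the loss: median class frequency / class frequency."""
    pixel_count = {}   # pixels labelled with class c
    image_pixels = {}  # total pixels of the images in which class c appears
    for labels in label_maps:
        flat = [c for row in labels for c in row]
        for c in set(flat):
            pixel_count[c] = pixel_count.get(c, 0) + flat.count(c)
            image_pixels[c] = image_pixels.get(c, 0) + len(flat)
    freq = {c: pixel_count[c] / image_pixels[c] for c in pixel_count}
    med = median(freq.values())
    return {c: med / f for c, f in freq.items()}


# Two toy 2x2 label maps with classes 0, 1 and 2.
label_maps = [
    [[0, 0],
     [0, 1]],
    [[0, 0],
     [2, 2]],
]

weights = median_frequency_weights(label_maps)
```

Frequent classes end up with weights below 1 and rare classes with weights above 1. Natural frequency balancing, by contrast, leaves the class frequencies as they occur in the data.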
analysis
BF: boundary F1 measure
SegNet and DeconvNet are similar in that both save the max-pooling indices on the encoder side and use them on the decoder side to upsample the feature map. The upsampled feature map is still sparse, so convolution layers (SegNet) or deconvolution layers (DeconvNet) follow it to produce a denser feature map. The difference is that SegNet has no fully connected layers, which makes it a more lightweight framework.
U-Net
Abstract
- use data augmentation to train the model
- contracting path to capture context
- symmetric expanding path enables precise localization
Introduction
- High resolution features from the contracting path are combined with the upsampled output
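- overlap-tile strategy (I did not really understand this part)
- elastic deformation for augmentation
- a weighted loss is used to separate touching objects of the same class: the background pixels between touching objects get a large weight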
Network Architecture
The contracting path is on the left, the expansive path on the right.
The left side uses 3×3 convolution + ReLU + 2×2 max pooling; the number of feature channels doubles at each pooling step.
The right side uses upsampling + 2×2 convolution (halving the number of feature channels) + concatenation with the corresponding feature map from the contracting path + 3×3 convolution + ReLU.
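A quick size calculation shows why the feature maps from the contracting path have to be cropped before concatenation. The sketch below relies on details from the paper's architecture figure that these notes do not spell out, so treat them as assumptions: unpadded 3×3 convolutions, two convolutions per level, four pooling steps, and a 572×572 input tile. The function name is hypothetical.

```python
def unet_sizes(input_size, depth=4, convs_per_level=2, kernel=3):
    """Track the spatial size through a U-Net with unpadded convolutions.

    Returns the output size and, per expansive level, the pair
    (skip feature map size, upsampled size it must be cropped to).
    """
    shrink = convs_per_level * (kernel - 1)  # each unpadded conv trims kernel - 1 pixels
    size = input_size
    skip_sizes = []
    for _ in range(depth):                   # contracting path
        size -= shrink
        skip_sizes.append(size)
        if size % 2:
            raise ValueError("2x2 pooling needs an even size")
        size //= 2
    size -= shrink                           # bottom of the U
    crops = []
    for skip in reversed(skip_sizes):        # expansive path
        size *= 2                            # upsampling doubles the size
        crops.append((skip, size))
        size -= shrink
    return size, crops


output_size, crops = unet_sizes(572)
```

Under these assumptions a 572×572 tile produces a 388×388 output, and every skip connection carries a feature map that is larger than its upsampled counterpart, so its border is cropped away before concatenation.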
Training
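- energy function, where \(p_{l(x)}(x)\) is the softmax probability of the true label \(l(x)\) at pixel \(x\):
\[E = \sum_{x\in\Omega} w(x)\log(p_{l(x)}(x))\]
- weight map, where \(w_c\) balances the class frequencies, \(d_1\) is the distance to the border of the nearest cell and \(d_2\) the distance to the border of the second nearest cell:
\[w(x) = w_c(x) + w_0 \cdot \exp\left(-\frac{(d_1(x)+d_2(x))^2}{2\sigma^2}\right)\]
- weight initialization for every layer: Gaussian distribution with std \(\sqrt{2/N}\), where \(N\) is the number of incoming nodes of one neuron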
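To get a feel for the weight map and the initialization rule, the formulas can be evaluated directly. This is a sketch under assumptions: the defaults w0 = 10 and sigma = 5 pixels are the values I recall from the paper, not something stated in these notes, the distances d1 and d2 are passed in as plain numbers instead of being computed from a label mask, and the function names are hypothetical.

```python
import math


def unet_pixel_weight(class_weight, d1, d2, w0=10.0, sigma=5.0):
    """w(x) = w_c(x) + w0 * exp(-(d1(x) + d2(x))^2 / (2 * sigma^2))."""
    return class_weight + w0 * math.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))


def init_std(fan_in):
    """Standard deviation sqrt(2 / N) for the Gaussian weight initialization."""
    return math.sqrt(2.0 / fan_in)


# A background pixel squeezed between two cells (both borders 1 pixel away).
w_touching = unet_pixel_weight(1.0, d1=1.0, d2=1.0)
# A background pixel far away from every cell.
w_far = unet_pixel_weight(1.0, d1=20.0, d2=25.0)
# A 3x3 convolution with 64 input channels has N = 9 * 64 incoming nodes.
std_3x3_64 = init_std(9 * 64)
```

The pixel in the narrow gap gets a weight of roughly 10 on top of its class weight, while the distant pixel keeps essentially just the class weight, which is how the loss is pushed to learn the thin separating borders.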
Experiments
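Good results are obtained on both medical datasets.
Overall, the U-Net architecture is fairly simple and, according to the authors, well suited to small datasets: the first dataset, from the EM segmentation challenge, contains only 30 images (512×512).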