Scene Text Detection: A Summary of Paper Ideas

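Scene text detection at arbitrary orientations
Summary of paper ideas
Common thread: the contributions that stand out most are the ones that add a new branch to the network
Scene text detection
Segmentation-based detection methods:
spcnet (mask_rcnn + tcm + rescore)
psenet (progressive scale expansion)
mask textspotter (adds a segmentation branch)
craft
inceptext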

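Regression-based detection methods:
r2cnn (class branch, horizontal-box branch, inclined-box branch)
rrpn (rotated rpn)
textboxes (ssd)
textboxes++
sstd (the predecessor that tcm improves on)
rtn
ctpn (splits text lines into fine-scale pieces)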

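Hybrid methods combining segmentation and regression:
spcnet
Uses mask_rcnn for instance segmentation and raises accuracy with a new module, tcm (which produces a global semantic segmentation map), plus rescoring: each instance mask is projected onto the global semantic segmentation map to score it
pixel-anchor (deeplabv3 + ssd):
The segmentation part detects medium and large targets; ssd detects small targets
east (deeplabv3)
af-rpn
Each sliding point inside the core text region directly predicts the offsets from itself to the vertices of the text bounding box
(uses ohem)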
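
To make the spcnet rescoring idea concrete, here is a minimal sketch in pure Python. It assumes the instance score is the mean text probability of the global semantic map inside the instance mask, and that the fused score is a simple weighted average. The function names, the `weight` parameter, and the averaging rule are illustrative assumptions, not the paper's exact formula.

```python
from typing import List


def instance_score(mask: List[List[int]], semantic_map: List[List[float]]) -> float:
    """Mean text probability of the global semantic map inside one instance mask."""
    total, count = 0.0, 0
    for mask_row, sem_row in zip(mask, semantic_map):
        for inside, prob in zip(mask_row, sem_row):
            if inside:
                total += prob
                count += 1
    # An empty mask gets a score of 0 rather than dividing by zero.
    return total / count if count else 0.0


def rescore(cls_score: float,
            mask: List[List[int]],
            semantic_map: List[List[float]],
            weight: float = 0.5) -> float:
    """Fuse the classification score with the instance score.

    The weighted average is an assumption made for illustration only.
    """
    return weight * cls_score + (1.0 - weight) * instance_score(mask, semantic_map)


if __name__ == "__main__":
    mask = [[1, 1], [0, 0]]
    semantic_map = [[0.8, 0.6], [0.1, 0.1]]
    print(rescore(0.9, mask, semantic_map))
```

A box with a high classification score but weak support in the global text map is pulled down, which is the effect the rescoring step is after.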

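In the official FPN, the head parameters are shared across pyramid levels during training. Sharing or not sharing makes little difference to the results, which is explained by the feature pyramid giving every level a similar level of semantics
After FPN obtains proposals from every pyramid level, they are pooled together and passed through nms
The anchor scale is the same at every FPN pyramid level, because each level maps to a feature map of different resolution and therefore naturally detects objects of a different size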
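
The pooling step described above can be sketched as follows: proposals from all pyramid levels are concatenated and a single greedy nms pass is run over them. This is a pure-Python illustration; the function names and the 0.7 IoU threshold are assumptions, and real implementations may apply nms per level before merging.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Proposal = Tuple[Box, float]             # (box, score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_and_nms(per_level: List[List[Proposal]], iou_thresh: float = 0.7) -> List[Proposal]:
    """Pool proposals from all pyramid levels, then run one greedy nms pass."""
    pooled = [p for level in per_level for p in level]
    pooled.sort(key=lambda p: p[1], reverse=True)  # highest score first
    kept: List[Proposal] = []
    for box, score in pooled:
        if all(iou(box, kept_box) <= iou_thresh for kept_box, _ in kept):
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    level_a = [((0.0, 0.0, 10.0, 10.0), 0.9)]
    level_b = [((1.0, 1.0, 10.0, 10.0), 0.8), ((20.0, 20.0, 40.0, 40.0), 0.7)]
    print(merge_and_nms([level_a, level_b]))
```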

*********************** The text in parentheses after each paper name is its highlight ***********************

hybrid:---------------------------------------------------------------
1.af-rpn(af)
anchor-free
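Directly predicts the offsets from a center point to the four vertices of the box,
which avoids hand-designed anchors (to achieve high recall, anchors of various scales and shapes have to be designed to cover the scale and shape variability of objects)
scale-friendly
Uses FPN to detect large, medium, and small targets separately (the implementation details differ from fpn)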
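
As a rough illustration of the anchor-free idea, the sketch below encodes a quadrilateral as four (dx, dy) offsets from a sliding point and decodes it back. The function names are hypothetical, and any normalization the paper applies to the regression targets is left out.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def encode_offsets(point: Point, quad: List[Point]) -> List[Point]:
    """Regression targets: offset from the sliding point to each of the four vertices."""
    px, py = point
    return [(vx - px, vy - py) for vx, vy in quad]


def decode_quad(point: Point, offsets: List[Point]) -> List[Point]:
    """Recover the quadrilateral from a sliding point and its predicted offsets."""
    px, py = point
    return [(px + dx, py + dy) for dx, dy in offsets]


if __name__ == "__main__":
    quad = [(40.0, 10.0), (80.0, 12.0), (78.0, 30.0), (38.0, 28.0)]
    center = (50.0, 20.0)
    offsets = encode_offsets(center, quad)
    print(offsets)
    print(decode_quad(center, offsets))
```

Because the targets are plain offsets to the vertices, no anchor scales, ratios, or angles need to be designed up front.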

2.inceptext(inceptext)
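Overall: fpn + inception_module + deformable_conv + deformable PSROI pooling
inception-text
Modeled on inception, it uses three kernel sizes (1*1, 3*3, 5*5) to detect targets at three scales (small, medium, large),
and adds deformable convolution to adapt the receptive field, so that detection concentrates on the text and is less constrained by orientation; two fused feature maps add further multi-scale information.
deformable psroi pooling
(concentrates detection on the text and is less constrained by orientation)
Adds offsets so that pooling focuses on the text region; tends to learn the context surrounding the text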
Each image is randomly cropped and scaled to have short edge of{640,800,960,1120}.
The anchor scales are {2,4,8,16}, and ratios are {0.2,0.5,2,5}.
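
A small sketch of what these settings imply. It assumes a Faster R-CNN style convention that is not stated in the notes: a base size of 16 pixels, ratio defined as height / width, and area held constant per scale. `anchor_shapes` and `resized_size` are hypothetical helper names.

```python
import math
from typing import List, Tuple


def anchor_shapes(scales=(2, 4, 8, 16),
                  ratios=(0.2, 0.5, 2, 5),
                  base_size: int = 16) -> List[Tuple[float, float]]:
    """Return (width, height) for every scale/ratio pair.

    Assumed convention: ratio = height / width, area = (base_size * scale) ** 2.
    """
    shapes = []
    for ratio in ratios:
        for scale in scales:
            area = float((base_size * scale) ** 2)
            width = math.sqrt(area / ratio)
            height = width * ratio
            shapes.append((width, height))
    return shapes


def resized_size(width: int, height: int, short_edge: int) -> Tuple[int, int]:
    """Image size after scaling so that the shorter side equals short_edge."""
    factor = short_edge / min(width, height)
    return round(width * factor), round(height * factor)


if __name__ == "__main__":
    shapes = anchor_shapes()
    print(len(shapes))  # 4 scales x 4 ratios
    for w, h in shapes[:4]:
        print(round(w, 1), round(h, 1))
    print(resized_size(1280, 720, 640))
```

The ratios 0.2 and 5 give very elongated anchors, which fits long horizontal and vertical text lines.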

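3.rtn(no highlight)
A multi-scale feature, plus ctpn-style vertical boxes, plus regression-only prediction
hierarchical convolutional
Obtains stronger semantic features by fusing resnet block 4 and block 5
vertical proposal mechanism
Uses ctpn-style vertical boxes; the aim is to drop proposal classification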

regression:---------------------------------------------------------------
1.ctpn
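detecting text in fine-scale proposals
generate vertical proposals
recurrent connectionist text proposals
connects the vertical proposals
side-refinement
For the anchors at the left and right ends, predicts an adjustment to the text-line boundary
2.textboxes
Uses ssd for scene text detection (multi-scale)
3.textboxes++
The random crop data augmentation is worth borrowing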
4.r2cnn(inclined box)
three ROIPoolings use different pooled sizes
anchor scales(4,8,16,32)
The axis-aligned box and the inclined box are predicted together, with the axis-aligned box enclosing the inclined one
inclined NMS
compute convolutional feature maps on an image pyramid (not a main point)
augment ICDAR 2015
We rotate our image at the following angles (-90, -75, -60, -45, -30, -15, 0, 15, 30, 45, 60, 75, 90).
The r2cnn ablation experiments are worth borrowing
5.rrpn
rrpn
r-anchors (54 = 3*3*6), generate inclined proposals (representation: x,y,h,w,θ)
RROI pooling
skew NMS
image rotation strategy during data augmentation
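
To show where the count of 54 r-anchors comes from, here is a minimal sketch that enumerates 3 scales x 3 ratios x 6 angles in the (x, y, h, w, θ) representation and converts one anchor to its corner points. The specific scale, ratio, and angle values, and the convention that h is the short side, are assumptions chosen for illustration; only the 3*3*6 structure comes from the notes.

```python
import math
from itertools import product
from typing import List, Tuple

RAnchor = Tuple[float, float, float, float, float]  # (x, y, h, w, theta in radians)


def r_anchors(cx: float, cy: float,
              scales=(8, 16, 32),
              ratios=(2, 5, 8),
              angles_deg=(-30, 0, 30, 60, 90, 120)) -> List[RAnchor]:
    """Enumerate rotated anchors at one location (all values are illustrative)."""
    anchors = []
    for scale, ratio, angle in product(scales, ratios, angles_deg):
        h = float(scale)   # assumed: short side equals the scale
        w = h * ratio      # assumed: long side = short side * ratio
        anchors.append((cx, cy, h, w, math.radians(angle)))
    return anchors


def corners(anchor: RAnchor) -> List[Tuple[float, float]]:
    """Four corner points of a rotated box, rotated by theta about its center."""
    x, y, h, w, theta = anchor
    c, s = math.cos(theta), math.sin(theta)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in half]


if __name__ == "__main__":
    anchors = r_anchors(0.0, 0.0)
    print(len(anchors))
    print(corners((0.0, 0.0, 2.0, 4.0, 0.0)))
```

The corner points are what RROI pooling and skew NMS would operate on, since both need the actual rotated geometry rather than an axis-aligned box.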

segmentation ------------------------------------------------------
