An intriguing failing of convolutional neural networks and the CoordConv solution

NeurIPS 2018

2019-10-10 15:01:48

Paperhttps://arxiv.org/pdf/1807.03247.pdf

Official TensorFlow Codehttps://github.com/uber-research/coordconv

Unofficial PyTorch Codehttps://github.com/walsvid/CoordConv

 

机器之心:卷积神经网络「失陷」,CoordConv 来填坑https://zhuanlan.zhihu.com/p/39665894

Uber提出CoordConv:解决普通CNN坐标变换问题: https://zhuanlan.zhihu.com/p/39919038

要拯救CNN的CoordConv受嘲讽,翻译个坐标还用训练? https://zhuanlan.zhihu.com/p/39841356

1. 给定 feature map and 坐标(x, y)如何生成对应的 relative CoordinateMap?

The following code is from: [ICCV19] AdaptIS: Adaptive Instance Selection Network [Github]

    def get_instances_maps(self, F, points, adaptive_input, controller_input):
if isinstance(points, mx.nd.NDArray):
self.num_points = points.shape[1] if getattr(self.controller_net, 'return_map', False):
w = self.eqf(controller_input, points)
else:
w = self.eqf(controller_input, points)
w = self.controller_net(w) points = F.reshape(points, shape=(-1, 2))
x = F.repeat(adaptive_input, self.num_points, axis=0)
x = self.add_coord_features(x, points) x = self.block0(x)
x = self.adain(x, w)
x = self.block1(x) return x
class AppendCoordFeatures(gluon.HybridBlock):
def __init__(self, norm_radius, append_dist=True, spatial_scale=1.0):
super(AppendCoordFeatures, self).__init__()
self.xs = None
self.spatial_scale = spatial_scale
self.norm_radius = norm_radius
self.append_dist = append_dist def _ctx_kwarg(self, x):
if isinstance(x, mx.nd.NDArray):
return {"ctx": x.context}
return {} def get_coord_features(self, F, points, rows, cols, batch_size, **ctx_kwarg):
row_array = F.arange(start=0, stop=rows, step=1, **ctx_kwarg)
col_array = F.arange(start=0, stop=cols, step=1, **ctx_kwarg)
coord_rows = F.repeat(F.reshape(row_array, (1, 1, rows, 1)), repeats=cols, axis=3)
coord_cols = F.repeat(F.reshape(col_array, (1, 1, 1, cols)), repeats=rows, axis=2) coord_rows = F.repeat(coord_rows, repeats=batch_size, axis=0)
coord_cols = F.repeat(coord_cols, repeats=batch_size, axis=0) coords = F.concat(coord_rows, coord_cols, dim=1) add_xy = F.reshape(points * self.spatial_scale, shape=(0, 0, 1))
add_xy = F.reshape(F.repeat(add_xy, rows * cols, axis=2),
shape=(0, 0, rows, cols)) coords = (coords - add_xy) / (self.norm_radius * self.spatial_scale)
if self.append_dist:
dist = F.sqrt(F.sum(F.square(coords), axis=1, keepdims=1))
coord_features = F.concat(coords, dist, dim=1)
else:
coord_features = coords coord_features = F.clip(coord_features, a_min=-1, a_max=1)
return coord_features def hybrid_forward(self, F, x, coords):
if isinstance(x, mx.nd.NDArray):
self.xs = x.shape batch_size, rows, cols = self.xs[0], self.xs[2], self.xs[3]
coord_features = self.get_coord_features(F, coords, rows, cols, batch_size, **self._ctx_kwarg(x)) return F.concat(coord_features, x, dim=1)
    def get_coord_features(self, F, points, rows, cols, batch_size, **ctx_kwarg):

        # (Pdb) points, rows, cols, batch_size
# ([[61. 71.]] <NDArray 1x2 @gpu(0)>, 96, 96, 1) # row_array and col_array:
# [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17.
# 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35.
# 36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53.
# 54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. 70. 71.
# 72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 85. 86. 87. 88. 89.
# 90. 91. 92. 93. 94. 95.]
# <NDArray 96 @gpu(0)> # (Pdb) coord_rows
# [[[[ 0. 0. 0. ... 0. 0. 0.]
# [ 1. 1. 1. ... 1. 1. 1.]
# [ 2. 2. 2. ... 2. 2. 2.]
# ...
# [93. 93. 93. ... 93. 93. 93.]
# [94. 94. 94. ... 94. 94. 94.]
# [95. 95. 95. ... 95. 95. 95.]]]]
# <NDArray 1x1x96x96 @gpu(0)> # (Pdb) coord_cols
# [[[[ 0. 1. 2. ... 93. 94. 95.]
# [ 0. 1. 2. ... 93. 94. 95.]
# [ 0. 1. 2. ... 93. 94. 95.]
# ...
# [ 0. 1. 2. ... 93. 94. 95.]
# [ 0. 1. 2. ... 93. 94. 95.]
# [ 0. 1. 2. ... 93. 94. 95.]]]]
# <NDArray 1x1x96x96 @gpu(0)> # (Pdb) add_xy
# [[[[61. 61. 61. ... 61. 61. 61.]
# [61. 61. 61. ... 61. 61. 61.]
# [61. 61. 61. ... 61. 61. 61.]
# ...
# [61. 61. 61. ... 61. 61. 61.]
# [61. 61. 61. ... 61. 61. 61.]
# [61. 61. 61. ... 61. 61. 61.]] # [[71. 71. 71. ... 71. 71. 71.]
# [71. 71. 71. ... 71. 71. 71.]
# [71. 71. 71. ... 71. 71. 71.]
# ...
# [71. 71. 71. ... 71. 71. 71.]
# [71. 71. 71. ... 71. 71. 71.]
# [71. 71. 71. ... 71. 71. 71.]]]]
# <NDArray 1x2x96x96 @gpu(0)> # (Pdb) if self.append_dist, then coord_features is:
# [[[[-1. -1. -1. ... -1. -1.
# -1. ]
# [-1. -1. -1. ... -1. -1.
# -1. ]
# [-1. -1. -1. ... -1. -1.
# -1. ]
# ...
# [ 0.7619048 0.7619048 0.7619048 ... 0.7619048 0.7619048
# 0.7619048 ]
# [ 0.78571427 0.78571427 0.78571427 ... 0.78571427 0.78571427
# 0.78571427]
# [ 0.8095238 0.8095238 0.8095238 ... 0.8095238 0.8095238
# 0.8095238 ]] # [[-1. -1. -1. ... 0.52380955 0.54761904
# 0.5714286 ]
# [-1. -1. -1. ... 0.52380955 0.54761904
# 0.5714286 ]
# [-1. -1. -1. ... 0.52380955 0.54761904
# 0.5714286 ]
# ...
# [-1. -1. -1. ... 0.52380955 0.54761904
# 0.5714286 ]
# [-1. -1. -1. ... 0.52380955 0.54761904
# 0.5714286 ]
# [-1. -1. -1. ... 0.52380955 0.54761904
# 0.5714286 ]] # [[ 1. 1. 1. ... 1. 1.
# 1. ]
# [ 1. 1. 1. ... 1. 1.
# 1. ]
# [ 1. 1. 1. ... 1. 1.
# 1. ]
# ...
# [ 1. 1. 1. ... 0.9245947 0.9382886
# 0.95238096]
# [ 1. 1. 1. ... 0.944311 0.9577231
# 0.9715336 ]
# [ 1. 1. 1. ... 0.96421224 0.97735125
# 0.99088824]]]]
# <NDArray 1x3x96x96 @gpu(0)> pdb.set_trace()
row_array = F.arange(start=0, stop=rows, step=1, **ctx_kwarg) ## (96,)
col_array = F.arange(start=0, stop=cols, step=1, **ctx_kwarg) ## (96,)
coord_rows = F.repeat(F.reshape(row_array, (1, 1, rows, 1)), repeats=cols, axis=3)
coord_cols = F.repeat(F.reshape(col_array, (1, 1, 1, cols)), repeats=rows, axis=2) coord_rows = F.repeat(coord_rows, repeats=batch_size, axis=0)
coord_cols = F.repeat(coord_cols, repeats=batch_size, axis=0) coords = F.concat(coord_rows, coord_cols, dim=1) ## (1, 2, 96, 96) add_xy = F.reshape(points * self.spatial_scale, shape=(0, 0, 1)) ## [[[61.] [71.]]] <NDArray 1x2x1 @gpu(0)>
add_xy = F.reshape(F.repeat(add_xy, rows * cols, axis=2), shape=(0, 0, rows, cols)) ## self.norm_radius: 42
coords = (coords - add_xy) / (self.norm_radius * self.spatial_scale) ## <NDArray 1x2x96x96 @gpu(0)>
if self.append_dist:
dist = F.sqrt(F.sum(F.square(coords), axis=1, keepdims=1)) ## <NDArray 1x1x96x96 @gpu(0)>
coord_features = F.concat(coords, dist, dim=1)
else:
coord_features = coords coord_features = F.clip(coord_features, a_min=-1, a_max=1) return coord_features

I also write one PyTorch version according to the MXNet version:

class AddCoords(nn.Module):

    def __init__(self, ):
super().__init__() def forward(self, input_tensor, points):
_, x_dim, y_dim = input_tensor.size()
batch_size = 1 xx_channel = torch.arange(x_dim).repeat(1, y_dim, 1) ## torch.Size([1, 9, 9])
yy_channel = torch.arange(y_dim).repeat(1, x_dim, 1).transpose(1, 2) ## torch.Size([1, 9, 9]) xx_channel = xx_channel.repeat(batch_size, 1, 1, 1).transpose(2, 3)
yy_channel = yy_channel.repeat(batch_size, 1, 1, 1).transpose(2, 3) coords = torch.cat((xx_channel, yy_channel), dim=1) ## torch.Size([20, 2, 9, 9])
coords = coords.type(torch.FloatTensor) add_xy = torch.reshape(points, (1, 2, 1)) ## torch.Size([1, 2, 1])
add_xy_ = add_xy.repeat(1, 1, x_dim * y_dim) ## torch.Size([1, 2, 81])
add_xy_ = torch.reshape(add_xy_, (1, 2, x_dim, y_dim)) ## torch.Size([1, 2, 9, 9])
add_xy_ = add_xy_.type(torch.FloatTensor) coords = (coords - add_xy_) ## torch.Size([1, 2, 9, 9])
coord_features = np.clip(np.array(coords), -1, 1) ## (1, 2, 9, 9)
coord_features = torch.from_numpy(coord_features).cuda() return coord_features

 

An intriguing failing of convolutional neural networks and the CoordConv solution的更多相关文章

  1. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks

    Understanding the Effective Receptive Field in Deep Convolutional Neural Networks 理解深度卷积神经网络中的有效感受野 ...

  2. Deep learning_CNN_Review:A Survey of the Recent Architectures of Deep Convolutional Neural Networks——2019

    CNN综述文章 的翻译 [2019 CVPR] A Survey of the Recent Architectures of Deep Convolutional Neural Networks 翻 ...

  3. tensorfolw配置过程中遇到的一些问题及其解决过程的记录(配置SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving)

    今天看到一篇关于检测的论文<SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real- ...

  4. Notes on Convolutional Neural Networks

    这是Jake Bouvrie在2006年写的关于CNN的训练原理,虽然文献老了点,不过对理解经典CNN的训练过程还是很有帮助的.该作者是剑桥的研究认知科学的.翻译如有不对之处,还望告知,我好及时改正, ...

  5. 《ImageNet Classification with Deep Convolutional Neural Networks》 剖析

    <ImageNet Classification with Deep Convolutional Neural Networks> 剖析 CNN 领域的经典之作, 作者训练了一个面向数量为 ...

  6. 卷积神经网络CNN(Convolutional Neural Networks)没有原理只有实现

    零.说明: 本文的所有代码均可在 DML 找到,欢迎点星星. 注.CNN的这份代码非常慢,基本上没有实际使用的可能,所以我只是发出来,代表我还是实践过而已 一.引入: CNN这个模型实在是有些年份了, ...

  7. A Beginner's Guide To Understanding Convolutional Neural Networks(转)

    A Beginner's Guide To Understanding Convolutional Neural Networks Introduction Convolutional neural ...

  8. 阅读笔记 The Impact of Imbalanced Training Data for Convolutional Neural Networks [DegreeProject2015] 数据分析型

    The Impact of Imbalanced Training Data for Convolutional Neural Networks Paulina Hensman and David M ...

  9. 读convolutional Neural Networks Applied to House Numbers Digit Classification 的收获。

    本文以下内容来自读论文以后认为有价值的地方,论文来自:convolutional Neural Networks Applied to House Numbers Digit Classificati ...

随机推荐

  1. 16. Promise对象

    目录 Promise对象 一.含义 1. Promise是什么 2. 实例讨论 二.Promise特性案例解析 1. Promise的立即执行性 2. promise的三种状态 3. Promise的 ...

  2. 根父类:Object 类

    一.Object类 Java中规定: 如果一个类没有显式声明它的父类(即没有写extends xx),那么默认这个类的父类就是java.lang.Object. 类 Object 是类层次结构的根类. ...

  3. MES应用案例 | 天博集团成功完成数字化转型

    受到智能制造观念和技术的巨大且快速的影响,使得工业特别是汽车行业必须以最快速度赶上这场企业数字化转型的浪潮,唯有实现企业转型升级才能在这场速度战中占得先机.当然关于企业的转型升级,最为重要的是需要打造 ...

  4. linux的bash特性

    Shell本身是应用程序,是用户与操作系统之间完成交互式操作的一个接口程序,为用户提供简化的操作. Bourne Again Shell,简称bash,是Linux系统中默认的shell程序. Bas ...

  5. pycharm 里运行 django 工程 You must either define the environment variable DJANGO_SETTINGS_MODULE 错误

    pycharm 里运行 django 工程出现错误(在命令行直接运行ok): django.core.exceptions.ImproperlyConfigured: Requested settin ...

  6. oracle 排序后分页查询

    demo: select * from ( select * from DEV_REG_CFG_CAMERA where 1 = 1 order by unid asc) where rownum & ...

  7. 利用 AWS DMS 在线迁移 MongoDB 到 Amazon Aurora

    将数据从一种数据库迁移到另一种数据库通常都非常具有挑战性,特别是考虑到数据一致性.应用停机时间.以及源和目标数据库在设计上的差异性等因素.这个过程中,运维人员通常都希望借助于专门的数据迁移(复制)工具 ...

  8. free - 显示系统内存信息

    free - Display amount of free and used memory in the system 显示系统中空闲和已使用内存的数量 格式: free [options] opti ...

  9. js动画---一个小bug的处理

    对于前面的课程,大家似乎看不出来存在什么问题,一切都很顺利,但是其实是存在一个很大的bug的,这个bug是什么呢?? 我们来看看下面这个程序就知道了 <!DOCTYPE html> < ...

  10. VoIP基本原理

    VoIP基本原理 VoIP是通过Internet等互联网络传递语音信息的,主要包括终端设备.网关.网守和网络管理等部分.网关负责提供IP网络和传统的PSTN接口. VoIP的基本原理:通过语音压缩算法 ...