An intriguing failing of convolutional neural networks and the CoordConv solution

NeurIPS 2018

2019-10-10 15:01:48

Paper: https://arxiv.org/pdf/1807.03247.pdf

Official TensorFlow Code: https://github.com/uber-research/coordconv

Unofficial PyTorch Code: https://github.com/walsvid/CoordConv

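Synced (机器之心): Convolutional neural networks "fall short", CoordConv fills the gap: https://zhuanlan.zhihu.com/p/39665894

Uber proposes CoordConv: solving the coordinate transform problem of ordinary CNNs: https://zhuanlan.zhihu.com/p/39919038

CoordConv, meant to rescue CNNs, gets mocked: does translating a coordinate really need training? https://zhuanlan.zhihu.com/p/39841356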

1. Given a feature map and a coordinate (x, y), how do we generate the corresponding relative coordinate map?

The following code is from the GitHub repository of [ICCV19] AdaptIS: Adaptive Instance Selection Network:

    def get_instances_maps(self, F, points, adaptive_input, controller_input):
        if isinstance(points, mx.nd.NDArray):
            self.num_points = points.shape[1]

        if getattr(self.controller_net, 'return_map', False):
            w = self.eqf(controller_input, points)
        else:
            w = self.eqf(controller_input, points)
            w = self.controller_net(w)

        # Flatten the points to one 2-vector per row.
        points = F.reshape(points, shape=(-1, 2))
        # Repeat every feature map num_points times so it lines up with the flattened points.
        x = F.repeat(adaptive_input, self.num_points, axis=0)
        # Prepend the relative coordinate channels to the features.
        x = self.add_coord_features(x, points)

        x = self.block0(x)
        x = self.adain(x, w)
        x = self.block1(x)

        return x
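The reshape/repeat pair in get_instances_maps only works if row k of the flattened points ends up next to the feature map of the image it came from. The NumPy sketch below checks that pairing on toy data. It assumes that np.repeat behaves like F.repeat along the batch axis; all sizes and variable names are made up for illustration and do not come from AdaptIS.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch_size, num_points, channels, height, width = 2, 3, 4, 5, 5

# One feature map per image, filled with the image index so the pairing is visible.
features = np.stack([np.full((channels, height, width), b, dtype=np.float32)
                     for b in range(batch_size)])            # (2, 4, 5, 5)

# num_points two-element points per image.
points = np.arange(batch_size * num_points * 2, dtype=np.float32)
points = points.reshape(batch_size, num_points, 2)            # (2, 3, 2)

# The same two steps as in get_instances_maps.
flat_points = points.reshape(-1, 2)                            # (6, 2)
repeated = np.repeat(features, num_points, axis=0)             # (6, 4, 5, 5)

# Row k of flat_points belongs to image k // num_points, and
# repeated[k] holds the features of exactly that image.
image_index = repeated[:, 0, 0, 0].astype(int)
print(image_index.tolist())   # [0, 0, 0, 1, 1, 1]
```

Because the repeat is element-wise along axis 0 (not a tile of the whole batch), the copies of one image stay contiguous, which matches the row order produced by the reshape.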

import mxnet as mx
from mxnet import gluon


class AppendCoordFeatures(gluon.HybridBlock):
    def __init__(self, norm_radius, append_dist=True, spatial_scale=1.0):
        super(AppendCoordFeatures, self).__init__()
        self.xs = None
        self.spatial_scale = spatial_scale
        self.norm_radius = norm_radius
        self.append_dist = append_dist

    def _ctx_kwarg(self, x):
        # In imperative mode, create the index arrays on the same device as x.
        if isinstance(x, mx.nd.NDArray):
            return {"ctx": x.context}
        return {}

    def get_coord_features(self, F, points, rows, cols, batch_size, **ctx_kwarg):
        # Absolute row and column index of every pixel.
        row_array = F.arange(start=0, stop=rows, step=1, **ctx_kwarg)
        col_array = F.arange(start=0, stop=cols, step=1, **ctx_kwarg)
        coord_rows = F.repeat(F.reshape(row_array, (1, 1, rows, 1)), repeats=cols, axis=3)
        coord_cols = F.repeat(F.reshape(col_array, (1, 1, 1, cols)), repeats=rows, axis=2)

        coord_rows = F.repeat(coord_rows, repeats=batch_size, axis=0)
        coord_cols = F.repeat(coord_cols, repeats=batch_size, axis=0)
        coords = F.concat(coord_rows, coord_cols, dim=1)

        # Broadcast the point to the full map, then subtract and normalize.
        add_xy = F.reshape(points * self.spatial_scale, shape=(0, 0, 1))
        add_xy = F.reshape(F.repeat(add_xy, rows * cols, axis=2),
                           shape=(0, 0, rows, cols))
        coords = (coords - add_xy) / (self.norm_radius * self.spatial_scale)

        if self.append_dist:
            # Euclidean distance to the point, in units of norm_radius.
            dist = F.sqrt(F.sum(F.square(coords), axis=1, keepdims=1))
            coord_features = F.concat(coords, dist, dim=1)
        else:
            coord_features = coords

        coord_features = F.clip(coord_features, a_min=-1, a_max=1)
        return coord_features

    def hybrid_forward(self, F, x, coords):
        if isinstance(x, mx.nd.NDArray):
            self.xs = x.shape

        batch_size, rows, cols = self.xs[0], self.xs[2], self.xs[3]
        coord_features = self.get_coord_features(F, coords, rows, cols, batch_size, **self._ctx_kwarg(x))
        return F.concat(coord_features, x, dim=1)
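Written as a formula, get_coord_features computes the following for a pixel at row i and column j. The symbols are introduced here only for illustration: p = (p_r, p_c) is the point, s is spatial_scale, and R is norm_radius.

```latex
\begin{aligned}
\Delta_r(i,j) &= \frac{i - s\,p_r}{R\,s}, \qquad
\Delta_c(i,j) = \frac{j - s\,p_c}{R\,s}, \\
d(i,j) &= \sqrt{\Delta_r(i,j)^2 + \Delta_c(i,j)^2}, \\
f(i,j) &= \operatorname{clip}\bigl(\,(\Delta_r,\ \Delta_c,\ d),\ -1,\ 1\bigr).
\end{aligned}
```

For example, with p = (61, 71), R = 42 and s = 1, the pixel (95, 95) gives Δ_r = 34/42 ≈ 0.8095, Δ_c = 24/42 ≈ 0.5714 and d ≈ 0.9909. Any pixel farther than R from the point along an axis is clipped to -1 or 1 in that channel.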

The same get_coord_features method, annotated with the values inspected in pdb for points = [[61, 71]], a 96x96 feature map, batch size 1 and norm_radius = 42:

    def get_coord_features(self, F, points, rows, cols, batch_size, **ctx_kwarg):
        # (Pdb) points, rows, cols, batch_size
        # ([[61. 71.]] <NDArray 1x2 @gpu(0)>, 96, 96, 1)
        # row_array and col_array:
        # [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
        #  18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35.
        #  36. 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53.
        #  54. 55. 56. 57. 58. 59. 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. 70. 71.
        #  72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82. 83. 84. 85. 86. 87. 88. 89.
        #  90. 91. 92. 93. 94. 95.]
        # <NDArray 96 @gpu(0)>
        # (Pdb) coord_rows
        # [[[[ 0.  0.  0. ...  0.  0.  0.]
        #    [ 1.  1.  1. ...  1.  1.  1.]
        #    [ 2.  2.  2. ...  2.  2.  2.]
        #    ...
        #    [93. 93. 93. ... 93. 93. 93.]
        #    [94. 94. 94. ... 94. 94. 94.]
        #    [95. 95. 95. ... 95. 95. 95.]]]]
        # <NDArray 1x1x96x96 @gpu(0)>
        # (Pdb) coord_cols
        # [[[[ 0.  1.  2. ... 93. 94. 95.]
        #    [ 0.  1.  2. ... 93. 94. 95.]
        #    [ 0.  1.  2. ... 93. 94. 95.]
        #    ...
        #    [ 0.  1.  2. ... 93. 94. 95.]
        #    [ 0.  1.  2. ... 93. 94. 95.]
        #    [ 0.  1.  2. ... 93. 94. 95.]]]]
        # <NDArray 1x1x96x96 @gpu(0)>
        # (Pdb) add_xy
        # [[[[61. 61. 61. ... 61. 61. 61.]
        #    [61. 61. 61. ... 61. 61. 61.]
        #    [61. 61. 61. ... 61. 61. 61.]
        #    ...
        #    [61. 61. 61. ... 61. 61. 61.]
        #    [61. 61. 61. ... 61. 61. 61.]
        #    [61. 61. 61. ... 61. 61. 61.]]
        #   [[71. 71. 71. ... 71. 71. 71.]
        #    [71. 71. 71. ... 71. 71. 71.]
        #    [71. 71. 71. ... 71. 71. 71.]
        #    ...
        #    [71. 71. 71. ... 71. 71. 71.]
        #    [71. 71. 71. ... 71. 71. 71.]
        #    [71. 71. 71. ... 71. 71. 71.]]]]
        # <NDArray 1x2x96x96 @gpu(0)>
        # (Pdb) if self.append_dist, then coord_features is:
        # [[[[-1.         -1.         -1.         ... -1.         -1.         -1.        ]
        #    [-1.         -1.         -1.         ... -1.         -1.         -1.        ]
        #    [-1.         -1.         -1.         ... -1.         -1.         -1.        ]
        #    ...
        #    [ 0.7619048   0.7619048   0.7619048  ...  0.7619048   0.7619048   0.7619048 ]
        #    [ 0.78571427  0.78571427  0.78571427 ...  0.78571427  0.78571427  0.78571427]
        #    [ 0.8095238   0.8095238   0.8095238  ...  0.8095238   0.8095238   0.8095238 ]]
        #   [[-1.         -1.         -1.         ...  0.52380955  0.54761904  0.5714286 ]
        #    [-1.         -1.         -1.         ...  0.52380955  0.54761904  0.5714286 ]
        #    [-1.         -1.         -1.         ...  0.52380955  0.54761904  0.5714286 ]
        #    ...
        #    [-1.         -1.         -1.         ...  0.52380955  0.54761904  0.5714286 ]
        #    [-1.         -1.         -1.         ...  0.52380955  0.54761904  0.5714286 ]
        #    [-1.         -1.         -1.         ...  0.52380955  0.54761904  0.5714286 ]]
        #   [[ 1.          1.          1.         ...  1.          1.          1.        ]
        #    [ 1.          1.          1.         ...  1.          1.          1.        ]
        #    [ 1.          1.          1.         ...  1.          1.          1.        ]
        #    ...
        #    [ 1.          1.          1.         ...  0.9245947   0.9382886   0.95238096]
        #    [ 1.          1.          1.         ...  0.944311    0.9577231   0.9715336 ]
        #    [ 1.          1.          1.         ...  0.96421224  0.97735125  0.99088824]]]]
        # <NDArray 1x3x96x96 @gpu(0)>
        pdb.set_trace()   # debugging breakpoint used to capture the values above (needs `import pdb`)

        row_array = F.arange(start=0, stop=rows, step=1, **ctx_kwarg)   ## (96,)
        col_array = F.arange(start=0, stop=cols, step=1, **ctx_kwarg)   ## (96,)
        coord_rows = F.repeat(F.reshape(row_array, (1, 1, rows, 1)), repeats=cols, axis=3)
        coord_cols = F.repeat(F.reshape(col_array, (1, 1, 1, cols)), repeats=rows, axis=2)

        coord_rows = F.repeat(coord_rows, repeats=batch_size, axis=0)
        coord_cols = F.repeat(coord_cols, repeats=batch_size, axis=0)
        coords = F.concat(coord_rows, coord_cols, dim=1)    ## (1, 2, 96, 96)

        add_xy = F.reshape(points * self.spatial_scale, shape=(0, 0, 1))    ## [[[61.] [71.]]] <NDArray 1x2x1 @gpu(0)>
        add_xy = F.reshape(F.repeat(add_xy, rows * cols, axis=2), shape=(0, 0, rows, cols))

        ## self.norm_radius: 42
        coords = (coords - add_xy) / (self.norm_radius * self.spatial_scale)    ## <NDArray 1x2x96x96 @gpu(0)>

        if self.append_dist:
            dist = F.sqrt(F.sum(F.square(coords), axis=1, keepdims=1))  ## <NDArray 1x1x96x96 @gpu(0)>
            coord_features = F.concat(coords, dist, dim=1)
        else:
            coord_features = coords

        coord_features = F.clip(coord_features, a_min=-1, a_max=1)
        return coord_features
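To check the numbers in the pdb dump without a GPU or MXNet, the same computation can be sketched in NumPy. This is a minimal re-implementation for a single point, not the AdaptIS code itself: the function name relative_coord_features is made up here, np.meshgrid replaces the reshape/repeat pairs, and broadcasting replaces the explicit add_xy expansion.

```python
import numpy as np


def relative_coord_features(point, rows, cols, norm_radius,
                            spatial_scale=1.0, append_dist=True):
    """NumPy sketch of get_coord_features for one (row, col) point."""
    # Absolute row and column index of every pixel, each of shape (rows, cols).
    row_idx, col_idx = np.meshgrid(np.arange(rows, dtype=np.float32),
                                   np.arange(cols, dtype=np.float32),
                                   indexing="ij")
    coords = np.stack([row_idx, col_idx])                       # (2, rows, cols)

    # Offset to the point, normalized by the radius; broadcasting stands in for add_xy.
    center = np.asarray(point, dtype=np.float32).reshape(2, 1, 1) * spatial_scale
    coords = (coords - center) / (norm_radius * spatial_scale)

    if append_dist:
        # Distance channel, computed before clipping as in the MXNet code.
        dist = np.sqrt(np.sum(coords ** 2, axis=0, keepdims=True))
        coords = np.concatenate([coords, dist], axis=0)         # (3, rows, cols)

    return np.clip(coords, -1.0, 1.0)


feats = relative_coord_features((61, 71), rows=96, cols=96, norm_radius=42)
print(feats.shape)        # (3, 96, 96)
print(feats[:, 95, 95])   # bottom-right pixel: row offset, column offset, distance
```

The bottom-right pixel should agree with the last entries of the three channels in the dump (about 0.8095, 0.5714 and 0.9909), and the row and column channels are zero along row 61 and column 71 respectively.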

I also wrote a PyTorch version based on the MXNet version. It handles a single (C, H, W) feature map and one point; norm_radius and append_dist mirror the MXNet arguments:

import torch
import torch.nn as nn


class AddCoords(nn.Module):
    def __init__(self, norm_radius=1.0, append_dist=False):
        super().__init__()
        self.norm_radius = norm_radius    # the MXNet run above used 42
        self.append_dist = append_dist

    def forward(self, input_tensor, points):
        # input_tensor: (C, x_dim, y_dim); points: two values in (row, col) order
        _, x_dim, y_dim = input_tensor.size()
        device = input_tensor.device

        rows = torch.arange(x_dim, dtype=torch.float32, device=device)
        cols = torch.arange(y_dim, dtype=torch.float32, device=device)
        xx_channel = rows.view(1, 1, x_dim, 1).expand(1, 1, x_dim, y_dim)   ## row index, torch.Size([1, 1, 9, 9])
        yy_channel = cols.view(1, 1, 1, y_dim).expand(1, 1, x_dim, y_dim)   ## column index, torch.Size([1, 1, 9, 9])
        coords = torch.cat((xx_channel, yy_channel), dim=1)     ## torch.Size([1, 2, 9, 9])

        # Broadcasting replaces the explicit repeat/reshape of the point.
        add_xy = points.to(device=device, dtype=torch.float32).reshape(1, 2, 1, 1)
        coords = (coords - add_xy) / self.norm_radius           ## torch.Size([1, 2, 9, 9])

        if self.append_dist:
            dist = torch.sqrt(torch.sum(coords ** 2, dim=1, keepdim=True))
            coords = torch.cat((coords, dist), dim=1)           ## torch.Size([1, 3, 9, 9])

        # torch.clamp keeps the result on the input's device; no NumPy round trip is needed.
        coord_features = torch.clamp(coords, -1, 1)
        return coord_features
