http://www.cs.toronto.edu/~hinton/absps/transauto6.pdf

The artificial neural networks that are used to recognize shapes typically use one or more layers of learned feature detectors that produce scalar outputs. By contrast, the computer vision community uses complicated, hand-engineered features, like SIFT [6], that producea whole vector of outputs including an explicit representation of the pose of the feature. We show how neural networks can be used to learn features that output a whole vector of instantiation parameters and we argue that this is a much more promising way of dealing with variations in position,orientation, scale and lighting than the methods currently employed in the neural networks community. It is also more promising than the hand-engineered features currently used in computer vision because it provides an efficient way of adapting the features to the domain.

Current methods for recognizing objects in images perform poorly and use methods that are intellectually unsatisfying. Some of the best computer vision systems use histograms of oriented gradients as “visual words” and model the spatial distribution of these elements using a crude spatial pyramid. Such methods can recognize objects correctly without knowing exactly where they are – an ability that is used to diagnose brain damage in humans. The best artifical neural net-works [4, 5, 10] use hand-coded weight-sharing schemes to reduce the number of free parameters and they achieve local translational invariance by subsampling the activities of local pools of translated replicas of the same kernel. This method of dealing with the changes in images caused by changes in view point is much better than no method at all, but it is clearly incapable of dealing with recognition tasks, such as facial identity recognition, that require knowledge of the precise spatial relationships between high-level parts like a nose and a mouth.After several stages of subsampling in a convolutional net, high-level feature shave a lot of uncertainty in their poses. This is generally regarded as a desireable property because it amounts to invariance to pose over some limited range,but it makes it impossible to compute precise spatial relationships.

【This paper argues that convolutional neural networks are misguided in what they are trying to achieve.】

【capsules】

This paper argues that convolutional neural networks are misguided in what they are trying to achieve. Instead of aiming for viewpoint invariance in the activities of “neurons” that use a single scalar output to summarize the activities of a local pool of replicated feature detectors, artifical neural networks should use local “capsules” that perform some quite complicated internal computations on their inputs and then encapsulate the results of these computations into a small vector of highly informative outputs. Each capsule learns to recognize an implicitly defined visual entity over a limited domain of viewing conditions and deformations and it outputs both the probability that the entity is present within its limited domain and a set of “instantiation parameters” that may include the precise pose, lighting and deformation of the visual entity relative to an implicitly defined canonical version of that entity. When the capsule is working properly, the probability of the visual entity being present is locally invariant – it does not change as the entity moves over the manifold of possible appearances within the limited domain covered by the capsule. The instantiation parameters, however, are “equivariant” – as the viewing conditions change and the entity moves over the appearance manifold, the instantiation parameters change by a corresponding amount because they are representing the intrinsic coordinates of the entity on the appearance manifold.

Transforming Auto-encoders的更多相关文章

  1. [Python] 机器学习库资料汇总

    声明:以下内容转载自平行宇宙. Python在科学计算领域,有两个重要的扩展模块:Numpy和Scipy.其中Numpy是一个用python实现的科学计算包.包括: 一个强大的N维数组对象Array: ...

  2. python数据挖掘领域工具包

    原文:http://qxde01.blog.163.com/blog/static/67335744201368101922991/ Python在科学计算领域,有两个重要的扩展模块:Numpy和Sc ...

  3. Theano3.1-练习之初步介绍

    来自 http://deeplearning.net/tutorial/,虽然比较老了,不过觉得想系统的学习theano,所以需要从python--numpy--theano的顺序学习.这里的资料都很 ...

  4. [resource]Python机器学习库

    reference: http://qxde01.blog.163.com/blog/static/67335744201368101922991/ Python在科学计算领域,有两个重要的扩展模块: ...

  5. 机器学习——深度学习(Deep Learning)

    Deep Learning是机器学习中一个非常接近AI的领域,其动机在于建立.模拟人脑进行分析学习的神经网络,近期研究了机器学习中一些深度学习的相关知识,本文给出一些非常实用的资料和心得. Key W ...

  6. Deep Learning Tutorial - Classifying MNIST digits using Logistic Regression

    Deep Learning Tutorial 由 Montreal大学的LISA实验室所作,基于Theano的深度学习材料.Theano是一个python库,使得写深度模型更容易些,也可以在GPU上训 ...

  7. [转]Python机器学习工具箱

    原文在这里  Python在科学计算领域,有两个重要的扩展模块:Numpy和Scipy.其中Numpy是一个用python实现的科学计算包.包括: 一个强大的N维数组对象Array: 比较成熟的(广播 ...

  8. 深度学习材料:从感知机到深度网络A Deep Learning Tutorial: From Perceptrons to Deep Networks

    In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyo ...

  9. Deep Learning(4)

    四.拓展学习推荐 Deep Learning 经典阅读材料: The monograph or review paper Learning Deep Architectures for AI (Fou ...

  10. 深度学习教程Deep Learning Tutorials

    Deep Learning Tutorials Deep Learning is a new area of Machine Learning research, which has been int ...

随机推荐

  1. /sys/class/gpio 文件接口操作IO端口(s3c2440)

    http://blog.csdn.net/mirkerson/article/details/8464231 在嵌入式设备中对GPIO的操作是最基本的操作.一般的做法是写一个单独驱动程序,网上大多数的 ...

  2. Java 网络通信【01】TCP

    不积跬步,无以至千里:不积小流,无以成江海.——<荀子劝学> JAVA中设计网络编程模式的主要有TCP和UDP两种. TCP是属于即时通信,点对点连接进行通信. UDP是通过数据包来进行通 ...

  3. R语言实战读书笔记(六)基本图形

    #安装vcd包,数据集在vcd包中 library(vcd) counts <- table(Arthritis$Improved)counts # 垂直barplot(counts, main ...

  4. IntelliJ IDEA中Spring Boot项目使用spring-boot-devtools无法实现热部署/热更新的问题解决

    这个设置真的和Eclipse有很大区别,Eclipse中只要运行之后就可实现修改文件自动重启.但IDEA不太一样,需要做如下配置: 前提: 1.添加spring-boot-devtools到POM. ...

  5. Css Position定位(简易版本)

    准备前的知识: 定位只对块级起作用.如div,p等元素是块级元素,如果是内联元素则可以先变成块级元素,display:block即可. 开始讲解: 定位共四种:static,fixed,relativ ...

  6. spring aop提供了两种实现方式jdk和cglib

    Spring AOP使用了两种代理机制:一种是基于JDK的动态代理:另一种是基于CGLib的动态代理.之所以需要两种代理机制,很大程度上是因为JDK本身只提供接口的代理,而不支持类的代理. Sprin ...

  7. 借助树莓派模拟Wimonitor并实现WiFi窃听和嗅探

    Wimonitor是一款非常优秀的黑客工具,它不仅可以帮渗透测试人员省去配置虚拟机和无线网卡等一系列麻烦事,而且它的Web接口配置起来也非常的方便.实际上,它就是一款TP-Link-MR3020路由器 ...

  8. yii框架:CDbConnection failed to open the DB connection: could not find driver的解决的方法

    这个问题是由于php中缺少pdo mysql造成的. 解决方法是为php加入此扩展.前往你最早的php安装文件,进入ext/pdo_mysql/文件夹下,然后./configure --with-ph ...

  9. Data Binding Guide——google官方文档翻译(上)

    android引入MVVM框架时间还不长,眼下还非常少有应用到app中的.但它是比較新的技术,使用它来搭建项目能省非常多代码,并且能使用代码架构比較清晰.本篇文章是我在学习MVVM时翻译的.篇幅比較长 ...

  10. d3js 添加数据

    <!DOCTYPE html> <html lang="zh"> <head> <meta charset="UTF-8&quo ...