Rupesh Kumar Srivastava
Klaus Greff
 ̈
J urgen
Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch

Abstract
Theoretical and empirical evidence indicates that the depth of neural networks
is crucial for their success. However, training becomes more difficult as depth
increases, and training of very deep networks remains an open problem. Here we
introduce a new architecture designed to overcome this. Our so-called highway
networks allow unimpeded information flow across many layers on information
highways. They are inspired by Long Short-Term Memory recurrent networks and
use adaptive gating units to regulate the information flow. Even with hundreds of
layers, highway networks can be trained directly through simple gradient descent.
This enables the study of extremely deep and efficient architectures.

1
Introduction & Previous Work
Many recent empirical breakthroughs in supervised machine learning have been achieved through
large and deep neural networks. Network depth (the number of successive computational layers) has
played perhaps the most important role in these successes. For instance, within just a few years, the
top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
[1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
practical machine learning problems have also underscored the superiority of deeper networks [6]
in terms of accuracy and/or performance.
In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n bit
parity problem can in principle be learned by a large feedforward net with n binary input units, 1
output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
Boolean circuits [8, 9] and modern neural networks [10, 11, 12].

Training Very Deep Networks的更多相关文章

  1. 【论文笔记】Training Very Deep Networks - Highway Networks

    目标: 怎么训练很深的神经网络 然而过深的神经网络会造成各种问题,梯度消失之类的,导致很难训练 作者利用了类似LSTM的方法,通过增加gate来控制transform前和transform后的数据的比 ...

  2. Deep Learning 8_深度学习UFLDL教程:Stacked Autocoders and Implement deep networks for digit classification_Exercise(斯坦福大学深度学习教程)

    前言 1.理论知识:UFLDL教程.Deep learning:十六(deep networks) 2.实验环境:win7, matlab2015b,16G内存,2T硬盘 3.实验内容:Exercis ...

  3. Initialization of deep networks

    Initialization of deep networks 24 Feb 2015Gustav Larsson As we all know, the solution to a non-conv ...

  4. 论文笔记:Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

    Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks ICML 2017 Paper:https://arxiv.org/ ...

  5. 【DeepLearning】Exercise: Implement deep networks for digit classification

    Exercise: Implement deep networks for digit classification 习题链接:Exercise: Implement deep networks fo ...

  6. 深度学习材料:从感知机到深度网络A Deep Learning Tutorial: From Perceptrons to Deep Networks

    In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyo ...

  7. Deep Networks : Overview

    Overview In the previous sections, you constructed a 3-layer neural network comprising an input, hid ...

  8. Quantization aware training 量化背后的技术——Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

    1,概述 模型量化属于模型压缩的范畴,模型压缩的目的旨在降低模型的内存大小,加速模型的推断速度(除了压缩之外,一些模型推断框架也可以通过内存,io,计算等优化来加速推断). 常见的模型压缩算法有:量化 ...

  9. Communication-Efficient Learning of Deep Networks from Decentralized Data

    郑重声明:原文参见标题,如有侵权,请联系作者,将会撤销发布! Proceedings of the 20th International Conference on Artificial Intell ...

随机推荐

  1. [转]Script标签和脚本执行顺序

    Script标签和脚本执行顺序 这里详细聊聊和script标签相关的脚本执行顺序. Script标签的默认行为 几个首要特性: script标签(不带defer或async属性)的会阻止文档渲染.相关 ...

  2. java.lang.NoClassDefFoundError: org/apache/commons/io/output/DeferredFileOutputStream异常解决方法

    使用Tomcat部署Servlet程序时,单步调试跟踪到: List<FileItem> itemList = sfu.parseRequest(request); 总是会报错:Java. ...

  3. nginx学习资源

    在了解nginx的时候 看到的一些资源: https://www.cnblogs.com/EdwinChan/p/8350984.html http://tengine.taobao.org/book ...

  4. C# 结构体集合元素属性不可修改疑惑

    背景:用C#的人都知道结构体在C#中是值类型的,由于这个原因出现了一个有趣的问题,那就是结构体集合通过数字索引修改对应属性的值能不能影响到集合中的结构体呢?答案很多人可能会说不能,因为结构体是值类型的 ...

  5. nios 使用count binary 例程 只是led不闪

    系统id有问题的总结: 1, 复位是否正确.(特别使用拨码开关的) 2, 硬件连接是否有问题.(SDRAM的时序约束可以有,也可以没有) 3, 引脚分配是否正确.(SDRAM的dqm就错过一次) 4, ...

  6. 如何将serialport接收的字符串转换成十六进制数c#

    关注 baihe_591 baihe_591 本版等级:   #1 得分:0回复于: 2008-06-02 09:44:00 Byte[] byte=new Byte[1];byte=0xf1;por ...

  7. PHP框架 Laravel

    PHP框架 CI(CodeIgniter) http://www.codeigniter.com/ http://codeigniter.org.cn/ Laravel PHP Laravel htt ...

  8. 【技术调研】最强Node-RED初探总结

    在某个项目中需要调研下node-red的功能,我大概花了三天时间研究了相关的官方文档,写了几个Demo总结了下node-red相关的功能.如需转载,请注明出处 https://www.cnblogs. ...

  9. Pascal三角形

    Pascal算法呢,很简单,因为有了推导公式nCr,而当我们刚刚接触一个事物时,面对要解决的问题,归纳分析得到规律,再通过编程,控制流程,对象,语言,方法,属性得到我们想要的结果.如果这次不是PAsc ...

  10. bash&nbsp;shell笔记1&nbsp;脚本基础知识

    原创作品,允许转载,转载时请务必以超链接形式标明文章 原始出处 .作者信息和本声明.否则将追究法律责任.http://twentyfour.blog.51cto.com/945260/505644 * ...