Rupesh Kumar Srivastava
Klaus Greff
Jürgen Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch

Abstract
Theoretical and empirical evidence indicates that the depth of neural networks
is crucial for their success. However, training becomes more difficult as depth
increases, and training of very deep networks remains an open problem. Here we
introduce a new architecture designed to overcome this. Our so-called highway
networks allow unimpeded information flow across many layers on information
highways. They are inspired by Long Short-Term Memory recurrent networks and
use adaptive gating units to regulate the information flow. Even with hundreds of
layers, highway networks can be trained directly through simple gradient descent.
This enables the study of extremely deep and efficient architectures.
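To make the gating mechanism concrete, here is a minimal NumPy sketch of a single highway layer as described above: a nonlinear transform H is mixed with the unmodified input via a learned transform gate T, whose complement (1 − T) acts as a carry gate. The choice of tanh for H, the toy dimensions, and the specific initialization below are illustrative assumptions, not the paper's full recipe.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)).

    T is the transform gate; (1 - T) is the carry gate that lets the
    input pass through unchanged, forming the 'information highway'.
    """
    H = np.tanh(x @ W_H + b_H)      # candidate transform (tanh assumed here)
    T = sigmoid(x @ W_T + b_T)      # transform gate, elementwise in (0, 1)
    return H * T + x * (1.0 - T)    # gated blend of transform and identity

# Toy usage: input and output widths must match so x can be carried unchanged.
d = 8
rng = np.random.default_rng(0)
x = rng.standard_normal((4, d))
W_H = 0.1 * rng.standard_normal((d, d))
W_T = 0.1 * rng.standard_normal((d, d))
b_H = np.zeros(d)
b_T = -2.0 * np.ones(d)             # a negative gate bias initially favors carrying
y = highway_layer(x, W_H, b_H, W_T, b_T)

Initializing the gate bias to a negative value biases the layer toward copying its input early in training, which is what lets gradients flow through many stacked layers.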

1 Introduction & Previous Work
Many recent empirical breakthroughs in supervised machine learning have been achieved through
large and deep neural networks. Network depth (the number of successive computational layers) has
played perhaps the most important role in these successes. For instance, within just a few years, the
top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
[1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
practical machine learning problems have also underscored the superiority of deeper networks [6]
in terms of accuracy and/or performance.
In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n-bit
parity problem can in principle be learned by a large feedforward net with n binary input units, 1
output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
Boolean circuits [8, 9] and modern neural networks [10, 11, 12].
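The recurrent solution is small enough to state directly. The following Python sketch captures its logic, with the caveat that it shows only the hidden unit's flip behavior, not the exact 3-unit, 5-weight wiring of [7]:

def parity_rnn(bits):
    """Logical sketch of the tiny recurrent parity solver:
    a single recurrent hidden unit flips its binary state whenever
    it reads a 1, so after the whole string its state equals the
    parity of the input, for any length n."""
    state = 0
    for b in bits:      # read the input one bit at a time
        state ^= b      # flip on 1, hold on 0
    return state        # 1 iff an odd number of 1s were seen

assert parity_rnn([1, 0, 1, 1]) == 1   # three 1s -> odd parity

The point of the example is that the recurrent solution's size is constant in n, whereas the shallow feedforward solution must grow with n.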
