Training Very Deep Networks
Rupesh Kumar Srivastava
Klaus Greff
Jürgen Schmidhuber
The Swiss AI Lab IDSIA / USI / SUPSI
{rupesh, klaus, juergen}@idsia.ch
Abstract
Theoretical and empirical evidence indicates that the depth of neural networks
is crucial for their success. However, training becomes more difficult as depth
increases, and training of very deep networks remains an open problem. Here we
introduce a new architecture designed to overcome this. Our so-called highway
networks allow unimpeded information flow across many layers on information
highways. They are inspired by Long Short-Term Memory recurrent networks and
use adaptive gating units to regulate the information flow. Even with hundreds of
layers, highway networks can be trained directly through simple gradient descent.
This enables the study of extremely deep and efficient architectures.
1 Introduction & Previous Work
Many recent empirical breakthroughs in supervised machine learning have been achieved through
large and deep neural networks. Network depth (the number of successive computational layers) has
played perhaps the most important role in these successes. For instance, within just a few years, the
top-5 image classification accuracy on the 1000-class ImageNet dataset has increased from ∼84%
[1] to ∼95% [2, 3] using deeper networks with rather small receptive fields [4, 5]. Other results on
practical machine learning problems have also underscored the superiority of deeper networks [6]
in terms of accuracy and/or performance.
In fact, deep networks can represent certain function classes far more efficiently than shallow ones.
This is perhaps most obvious for recurrent nets, the deepest of them all. For example, the n-bit
parity problem can in principle be learned by a large feedforward net with n binary input units, 1
output unit, and a single but large hidden layer. But the natural solution for arbitrary n is a recurrent
net with only 3 units and 5 weights, reading the input bit string one bit at a time, making a single
recurrent hidden unit flip its state whenever a new 1 is observed [7]. Related observations hold for
Boolean circuits [8, 9] and modern neural networks [10, 11, 12].
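The recurrent parity solution described above can be illustrated with a minimal sketch. It is a plain-Python stand-in for the behaviour of the single flipping hidden unit, not the 3-unit, 5-weight network of [7]; the function names `recurrent_parity` and `direct_parity` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def recurrent_parity(bits):
    """Read the bit string one bit at a time, keeping a single state bit.

    The state flips whenever a 1 is observed, so the same constant-size
    update rule works for an input of any length n.
    """
    state = 0
    for b in bits:
        state ^= b  # flip the state exactly when the incoming bit is 1
    return state


def direct_parity(bits):
    """Reference definition of parity: number of ones modulo 2."""
    return sum(bits) % 2


# Exhaustively compare both definitions on all bit strings up to length 8.
for n in range(1, 9):
    for bits in product((0, 1), repeat=n):
        assert recurrent_parity(bits) == direct_parity(bits)
```

The update rule has a fixed size regardless of n, which is the point of the example: depth in time replaces the width that a single hidden layer would need.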