Training Deep Neural Networks
Reposted from: http://handong1587.github.io/deep_learning/2015/10/09/training-dnn.html
Tutorials
Popular Training Approaches of DNNs — A Quick Overview
Activation Functions
Rectified Linear Units Improve Restricted Boltzmann Machines (ReLU)
Rectifier Nonlinearities Improve Neural Network Acoustic Models (leaky-ReLU, aka LReLU)
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (PReLU)
- keywords: PReLU, Caffe “msra” weights initialization
- arXiv: http://arxiv.org/abs/1502.01852
Empirical Evaluation of Rectified Activations in Convolutional Network (ReLU/LReLU/PReLU/RReLU)
Deep Learning with S-shaped Rectified Linear Activation Units (SReLU)
Parametric Activation Pools greatly increase performance and consistency in ConvNets
Noisy Activation Functions
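To make the differences among these rectifier variants concrete, here is a minimal NumPy sketch of the forward passes for ReLU, leaky ReLU, PReLU, and RReLU; the slope values are illustrative defaults, not settings taken from any particular paper above.

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x)
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # Leaky ReLU: a small fixed slope for negative inputs
    return np.where(x > 0, x, slope * x)

def prelu(x, a):
    # PReLU: the negative slope `a` is a learned parameter
    # (one per channel in the PReLU paper)
    return np.where(x > 0, x, a * x)

def rrelu(x, lower=1.0 / 8, upper=1.0 / 3, training=True):
    # RReLU: the negative slope is sampled uniformly during training
    # and fixed to its expectation at test time
    if training:
        a = np.random.uniform(lower, upper, size=x.shape)
    else:
        a = (lower + upper) / 2.0
    return np.where(x > 0, x, a * x)

x = np.linspace(-3, 3, 7)
print(relu(x))
print(leaky_relu(x))
print(prelu(x, a=0.25))
print(rrelu(x, training=False))
```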
Weights Initialization
An Explanation of Xavier Initialization
Deep Neural Networks with Random Gaussian Weights: A Universal Classification Strategy?
All you need is a good init
Data-dependent Initializations of Convolutional Neural Networks
What are good initial weights in a neural network?
- stackexchange: http://stats.stackexchange.com/questions/47590/what-are-good-initial-weights-in-a-neural-network
RandomOut: Using a convolutional gradient norm to win The Filter Lottery
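A minimal NumPy sketch of the Xavier and “msra” (He et al.) initialization rules referenced above, assuming a plain fully-connected layer:

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # Xavier/Glorot: variance scaled by the average of fan-in and fan-out,
    # intended to keep activations and gradients at a similar scale across layers
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return np.random.normal(0.0, std, size=(fan_in, fan_out))

def msra_init(fan_in, fan_out):
    # "msra" / He initialization: variance 2 / fan_in, derived for
    # ReLU-family units (the rule behind Caffe's "msra" weight filler)
    std = np.sqrt(2.0 / fan_in)
    return np.random.normal(0.0, std, size=(fan_in, fan_out))

W1 = xavier_init(784, 256)
W2 = msra_init(256, 10)
print(W1.std(), W2.std())
```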
Batch Normalization
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (ImageNet top-5 error: 4.82%)
- arXiv: http://arxiv.org/abs/1502.03167
- blog: https://standardfrancis.wordpress.com/2015/04/16/batch-normalization/
- notes: http://blog.csdn.net/happynear/article/details/44238541
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
- arxiv: http://arxiv.org/abs/1602.07868
- github(Lasagne): https://github.com/TimSalimans/weight_norm
- notes: http://www.erogol.com/my-notes-weight-normalization/
Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks
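As a rough illustration of the batch normalization transform, here is a training-time forward pass in NumPy (running statistics, the backward pass, and the inference path are omitted):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch, features). Normalize each feature over the mini-batch,
    # then apply the learned scale (gamma) and shift (beta).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 4) * 5.0 + 3.0
gamma = np.ones(4)
beta = np.zeros(4)
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0), y.std(axis=0))  # approximately 0 and 1 per feature
```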
Loss Function
The Loss Surfaces of Multilayer Networks
Optimization Methods
On Optimization Methods for Deep Learning
On the importance of initialization and momentum in deep learning
Invariant backpropagation: how to train a transformation-invariant neural network
A practical theory for designing very deep convolutional neural network
- kaggle: https://www.kaggle.com/c/datasciencebowl/forums/t/13166/happy-lantern-festival-report-and-code/69284
- paper: https://kaggle2.blob.core.windows.net/forum-message-attachments/69182/2287/A%20practical%20theory%20for%20designing%20very%20deep%20convolutional%20neural%20networks.pdf?sv=2012-02-12&se=2015-12-05T15%3A40%3A02Z&sr=b&sp=r&sig=kfBQKduA1pDtu837Y9Iqyrp2VYItTV0HCgOeOok9E3E%3D
- slides: http://vdisk.weibo.com/s/3nFsznjLKn
Stochastic Optimization Techniques
- intro: SGD/Momentum/NAG/Adagrad/RMSProp/Adadelta/Adam/ESGD/Adasecant/vSGD/Rprop
- blog: http://colinraffel.com/wiki/stochastic_optimization_techniques
Alec Radford’s animations for optimization algorithms
- blog: http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html
Faster Asynchronous SGD (FASGD)
An overview of gradient descent optimization algorithms (★★★★★)
Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
Writing fast asynchronous SGD/AdaGrad with RcppParallel
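The methods surveyed above differ mainly in how gradients are accumulated and rescaled per parameter; as a hedged illustration, here are plain-NumPy versions of SGD with momentum and Adam using commonly cited default hyperparameters:

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.01, mu=0.9):
    # Classical momentum: accumulate a velocity and move along it
    v = mu * v - lr * grad
    return w + v, v

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-parameter step from bias-corrected first and second moments
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy problem: minimize f(w) = ||w||^2, whose gradient is 2w
w1 = np.array([3.0, -2.0]); v1 = np.zeros(2)
w2 = np.array([3.0, -2.0]); m2 = np.zeros(2); v2 = np.zeros(2)
for t in range(1, 5001):
    w1, v1 = sgd_momentum_step(w1, 2 * w1, v1)
    w2, m2, v2 = adam_step(w2, 2 * w2, m2, v2, t)
print(w1, w2)  # both end up close to the minimum at the origin
```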
Regularization
DisturbLabel: Regularizing CNN on the Loss Layer [University of California & MSR] (2016)
- intro: “an extremely simple algorithm which randomly replaces a part of labels as incorrect values in each iteration”
- paper: http://research.microsoft.com/en-us/um/people/jingdw/pubs/cvpr16-disturblabel.pdf
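Going only by the one-line description above, a hypothetical sketch of the label-disturbing step might look as follows (the disturbance rate alpha and the uniform resampling are illustrative assumptions; see the paper for the exact scheme):

```python
import numpy as np

def disturb_labels(labels, num_classes, alpha=0.1, rng=np.random):
    # For each example, with probability alpha replace its label with a
    # class drawn uniformly at random (a hypothetical reading of the
    # one-line description above, not the paper's exact procedure).
    labels = labels.copy()
    mask = rng.rand(labels.shape[0]) < alpha
    labels[mask] = rng.randint(0, num_classes, size=mask.sum())
    return labels

y = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(disturb_labels(y, num_classes=10, alpha=0.3))
```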
Dropout
Improving neural networks by preventing co-adaptation of feature detectors (Dropout)
Regularization of Neural Networks using DropConnect
- homepage: http://cs.nyu.edu/~wanli/dropc/
- gitxiv: http://gitxiv.com/posts/rJucpiQiDhQ7HkZoX/regularization-of-neural-networks-using-dropconnect
- github: https://github.com/iassael/torch-dropconnect
Regularizing neural networks with dropout and with DropConnect
Fast dropout training
- paper: http://jmlr.org/proceedings/papers/v28/wang13a.pdf
- github: https://github.com/sidaw/fastdropout
Dropout as data augmentation
- paper: http://arxiv.org/abs/1506.08700
- notes: https://www.evernote.com/shard/s189/sh/ef0c3302-21a4-40d7-b8b4-1c65b8ebb1c9/24ff553fcfb70a27d61ff003df75b5a9
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Improved Dropout for Shallow and Deep Learning
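For reference, a minimal NumPy sketch of inverted dropout, the common variant of the technique introduced in the first paper above (units are dropped and the survivors rescaled at training time, so the layer is the identity at test time):

```python
import numpy as np

def dropout(x, p_drop=0.5, training=True, rng=np.random):
    # Inverted dropout: zero out units with probability p_drop and rescale
    # the survivors so the expected activation is unchanged; at test time
    # the layer is the identity.
    if not training or p_drop == 0.0:
        return x
    keep = 1.0 - p_drop
    mask = (rng.rand(*x.shape) < keep) / keep
    return x * mask

x = np.ones((2, 8))
print(dropout(x, p_drop=0.5))
print(dropout(x, training=False))
```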
Gradient Descent
Fitting a model via closed-form equations vs. Gradient Descent vs. Stochastic Gradient Descent vs. Mini-Batch Learning. What is the difference? (Normal Equations vs. GD vs. SGD vs. MB-GD)
- blog: http://sebastianraschka.com/faq/docs/closed-form-vs-gd.html
An Introduction to Gradient Descent in Python
Train faster, generalize better: Stability of stochastic gradient descent
A Variational Analysis of Stochastic Gradient Algorithms
The vanishing gradient problem: Oh no — an obstacle to deep learning!
Gradient Descent For Machine Learning
- blog: http://machinelearningmastery.com/gradient-descent-for-machine-learning/
Revisiting Distributed Synchronous SGD
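The Raschka FAQ entry above contrasts the normal equations with batch, stochastic, and mini-batch gradient descent; the following NumPy sketch shows the three gradient-descent variants on a small linear least-squares problem (learning rate and epoch count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.randn(200)

def grad(w, Xb, yb):
    # Gradient of the mean squared error on the batch (Xb, yb)
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

def train(batch_size, lr=0.1, epochs=50):
    w = np.zeros(3)
    n = len(y)
    for _ in range(epochs):
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]
            w -= lr * grad(w, X[b], y[b])
    return w

print(train(batch_size=200))  # batch gradient descent: full dataset per step
print(train(batch_size=1))    # stochastic gradient descent: one example per step
print(train(batch_size=32))   # mini-batch gradient descent
```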
Accelerate Training
Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices
Image Data Augmentation
DataAugmentation ver1.0: Image data augmentation tool for training image recognition algorithms
Caffe-Data-Augmentation: a branch of Caffe with data augmentation support, using a configurable stochastic combination of 7 data augmentation techniques
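As an illustration of the kind of stochastic transformations such tools combine, here is a minimal NumPy sketch of two common augmentations (random horizontal flip and random crop); it is not the API of either project listed above:

```python
import numpy as np

def random_flip(img, p=0.5, rng=np.random):
    # Flip the image horizontally with probability p (img: H x W x C)
    return img[:, ::-1, :] if rng.rand() < p else img

def random_crop(img, crop_h, crop_w, rng=np.random):
    # Take a random crop of size (crop_h, crop_w) from the image
    h, w, _ = img.shape
    top = rng.randint(0, h - crop_h + 1)
    left = rng.randint(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w, :]

img = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
aug = random_crop(random_flip(img), crop_h=28, crop_w=28)
print(aug.shape)
```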
Papers
Scalable and Sustainable Deep Learning via Randomized Hashing
Tools
pastalog: Simple, realtime visualization of neural network training performance
torch-pastalog: A Torch interface for pastalog - simple, realtime visualization of neural network training performance