https://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network

It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model, as measured by its ability to generalize.

https://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks/31842945

In neural network terminology:

  • one epoch = one forward pass and one backward pass of all the training examples
  • batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
  • number of iterations = number of passes, each pass using [batch size] examples. To be clear, one pass = one forward pass + one backward pass (the forward and backward passes are not counted as two separate passes).

Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.
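
A minimal sketch of this arithmetic (the variable names here are illustrative, not from the original answer):

```python
import math

num_examples = 1000  # total training examples, matching the example above
batch_size = 500     # examples per forward/backward pass

# Iterations needed to see every example once (one epoch); math.ceil covers
# the case where batch_size does not divide num_examples evenly
# (the final batch is then smaller).
iterations_per_epoch = math.ceil(num_examples / batch_size)
print(iterations_per_epoch)  # -> 2
```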

http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/

Stochastic Gradient Descent (SGD) simply does away with the expectation in the update and computes the gradient of the parameters using only a single or a few training examples.

Overview

Batch methods, such as limited-memory BFGS, which use the full training set to compute the next update to the parameters at each iteration, tend to converge very well to local optima. They are also straightforward to get working given a good off-the-shelf implementation (e.g. minFunc) because they have very few hyperparameters to tune. However, in practice, computing the cost and gradient for the entire training set can often be very slow, and sometimes intractable on a single machine if the dataset is too big to fit in main memory. Another issue with batch optimization methods is that they don’t give an easy way to incorporate new data in an ‘online’ setting. Stochastic Gradient Descent (SGD) addresses both of these issues by following the negative gradient of the objective after seeing only a single or a few training examples. The use of SGD in the neural network setting is motivated by the high cost of running backpropagation over the full training set. SGD can overcome this cost and still lead to fast convergence.
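
To make the tradeoff concrete, here is a hedged sketch in NumPy contrasting the two update styles on a least-squares objective; the objective, function names, and learning rate are illustrative assumptions, not part of the tutorial:

```python
import numpy as np

def gradient(theta, X, y):
    # Gradient of the least-squares objective J(theta) = ||X @ theta - y||^2 / (2m)
    return X.T @ (X @ theta - y) / len(y)

def batch_step(theta, X, y, alpha=0.1):
    # Batch method: every update touches the entire training set,
    # which is expensive (or infeasible) when X does not fit in memory.
    return theta - alpha * gradient(theta, X, y)

def sgd_step(theta, X, y, i, alpha=0.1):
    # SGD: one update uses only the single example (x(i), y(i)),
    # so each step is cheap and new data can be folded in online.
    return theta - alpha * gradient(theta, X[i:i+1], y[i:i+1])
```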

Stochastic Gradient Descent

The standard gradient descent algorithm updates the parameters $\theta$ of the objective $J(\theta)$ as,

$$\theta = \theta - \alpha \nabla_\theta E[J(\theta)]$$

where the expectation in the above equation is approximated by evaluating the cost and gradient over the full training set. Stochastic Gradient Descent (SGD) simply does away with the expectation in the update and computes the gradient of the parameters using only a single or a few training examples. The new update is given by,

$$\theta = \theta - \alpha \nabla_\theta J(\theta; x^{(i)}, y^{(i)})$$

with a pair $(x^{(i)}, y^{(i)})$ from the training set.
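
A minimal sketch of this update applied for a few epochs on toy linear-regression data; the shuffling, learning rate, and data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))              # toy inputs
true_theta = np.array([1., -2., 0.5, 3., -1.])
y = X @ true_theta                          # toy targets from a known linear model

theta = np.zeros(5)
alpha = 0.01                                # learning rate

for epoch in range(10):
    for i in rng.permutation(len(y)):       # visit the examples in random order
        residual = X[i] @ theta - y[i]
        grad = residual * X[i]              # grad of J(theta; x(i), y(i)) = residual^2 / 2
        theta -= alpha * grad               # the SGD update above

print(np.round(theta, 3))                   # close to true_theta after a few epochs
```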
