https://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network

It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model, as measured by its ability to generalize.

https://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks/31842945

In neural network terminology:

  • one epoch = one forward pass and one backward pass of all the training examples
  • batch size = the number of training examples in one forward/backward pass. The higher the batch size, the more memory space you'll need.
  • number of iterations = number of passes, each pass using [batch size] number of examples. To be clear, one pass = one forward pass + one backward pass (we do not count the forward pass and backward pass as two different passes).

Example: if you have 1000 training examples, and your batch size is 500, then it will take 2 iterations to complete 1 epoch.
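
For concreteness, here is a minimal Python sketch of how these three terms relate in a training loop. The array shapes and the empty loop body are illustrative assumptions, not part of the quoted answer:

```python
import numpy as np

# Illustrative numbers matching the example above: 1000 examples, batch size 500.
num_examples = 1000
batch_size = 500
num_epochs = 3
iterations_per_epoch = num_examples // batch_size  # = 2 iterations per epoch

X = np.random.randn(num_examples, 10)  # dummy inputs (shape is an assumption)
y = np.random.randn(num_examples, 1)   # dummy targets

for epoch in range(num_epochs):
    # One epoch = one full forward/backward pass over all training examples.
    order = np.random.permutation(num_examples)
    for it in range(iterations_per_epoch):
        # One iteration = one forward + backward pass on a single batch.
        idx = order[it * batch_size:(it + 1) * batch_size]
        X_batch, y_batch = X[idx], y[idx]
        # ... forward pass, backward pass, and parameter update go here ...
```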

http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/

Stochastic Gradient Descent (SGD) simply does away with the expectation in the update and computes the gradient of the parameters using only a single or a few training examples.

Overview

Batch methods, such as limited-memory BFGS, which use the full training set to compute the next update to the parameters at each iteration, tend to converge very well to local optima. They are also straightforward to get working given a good off-the-shelf implementation (e.g. minFunc) because they have very few hyper-parameters to tune. However, in practice computing the cost and gradient for the entire training set is often very slow, and sometimes intractable on a single machine if the dataset is too big to fit in main memory. Another issue with batch optimization methods is that they don’t give an easy way to incorporate new data in an ‘online’ setting. Stochastic Gradient Descent (SGD) addresses both of these issues by following the negative gradient of the objective after seeing only a single or a few training examples. The use of SGD in the neural network setting is motivated by the high cost of running backpropagation over the full training set. SGD can overcome this cost and still lead to fast convergence.
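
A rough sketch of that contrast, assuming a hypothetical mean-squared-error cost so the gradient can be written in closed form; `grad`, `batch_step`, and `sgd_step` are illustrative names, not from the tutorial:

```python
import numpy as np

def grad(theta, X, y):
    # Gradient of the (assumed) mean squared-error cost on the given examples.
    return X.T @ (X @ theta - y) / len(y)

def batch_step(theta, X, y, alpha):
    # Batch method: every single update touches the full training set,
    # which is slow or infeasible when X does not fit in main memory.
    return theta - alpha * grad(theta, X, y)

def sgd_step(theta, X, y, alpha, batch_size=1):
    # SGD: update after seeing only one (or a few) randomly chosen examples,
    # so new data can also be folded in one example at a time ('online').
    idx = np.random.choice(len(y), size=batch_size, replace=False)
    return theta - alpha * grad(theta, X[idx], y[idx])
```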

Stochastic Gradient Descent

The standard gradient descent algorithm updates the parameters θ of the objective J(θ) as,

θ = θ − α ∇θ E[J(θ)]

where the expectation in the above equation is approximated by evaluating the cost and gradient over the full training set. Stochastic Gradient Descent (SGD) simply does away with the expectation in the update and computes the gradient of the parameters using only a single or a few training examples. The new update is given by,

θ = θ − α ∇θ J(θ; x^(i), y^(i))

with a pair (x^(i), y^(i)) from the training set.
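
Translating this update into a minimal NumPy sketch, with J(θ; x^(i), y^(i)) assumed to be a squared-error cost for a linear model (the tutorial leaves the exact form of J unspecified):

```python
import numpy as np

def sgd_update(theta, x_i, y_i, alpha):
    # One SGD step: theta = theta - alpha * grad_theta J(theta; x_i, y_i),
    # with J assumed here to be 0.5 * (theta @ x_i - y_i)**2.
    grad = (theta @ x_i - y_i) * x_i  # gradient of J for this single example
    return theta - alpha * grad

# Usage: sweep the training pairs (x_i, y_i) in random order, one per update.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.01 * rng.normal(size=1000)

theta = np.zeros(5)
for i in rng.permutation(len(y)):
    theta = sgd_update(theta, X[i], y[i], alpha=0.01)
```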

