1. Stochastic Gradient Descent
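
In classical stochastic gradient descent, the true gradient of the objective $Q(w) = \sum_i Q_i(w)$ is approximated by the gradient at a single example (or a small mini-batch), and the parameters are updated as

$$w := w - \eta\,\nabla Q_i(w)$$

where $\eta$ is the learning rate.

A minimal NumPy sketch of one such step (the function name and the default learning rate are illustrative, not from the source):

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: step against the gradient computed on one example or mini-batch."""
    return w - lr * grad
```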

2. SGD With Momentum

Stochastic gradient descent with momentum remembers the update $\Delta w$ at each iteration and determines the next update as a linear combination of the gradient and the previous update:

$$\Delta w := \alpha\,\Delta w - \eta\,\nabla Q_i(w)$$
$$w := w + \Delta w$$

where $\eta$ is the learning rate and $\alpha$ is the momentum factor.

Unlike in classical stochastic gradient descent, the momentum update tends to keep traveling in the same direction, which prevents oscillations.
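
A minimal NumPy sketch of this momentum update (the function and variable names and the default values of $\eta$ and $\alpha$ are illustrative, not from the source):

```python
import numpy as np

def sgd_momentum_step(w, velocity, grad, lr=0.01, alpha=0.9):
    """One SGD-with-momentum step; `velocity` is the remembered previous update (Delta w)."""
    velocity = alpha * velocity - lr * grad   # linear combination of previous update and current gradient
    w = w + velocity                          # apply the combined update
    return w, velocity
```

The velocity is carried from one call to the next, so successive gradients pointing the same way reinforce each other while oscillating components tend to cancel.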

3. RMSProp

RMSProp (Root Mean Square Propagation) is a method in which the learning rate is adapted for each of the parameters. The idea is to divide the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight. First, a running average is calculated in terms of the mean square:

$$v(w, t) := \gamma\,v(w, t-1) + (1 - \gamma)\,(\nabla Q_i(w))^2$$

where $\gamma$ is the forgetting factor.

The parameters are then updated as:

$$w := w - \frac{\eta}{\sqrt{v(w, t)}}\,\nabla Q_i(w)$$

RMSProp has shown excellent adaptation of the learning rate in different applications. It can be seen as a generalization of Rprop and, unlike Rprop, it also works with mini-batches rather than only full batches.
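
A minimal NumPy sketch of the two RMSProp steps above (names and default values are illustrative; the small `eps` term is a common numerical-stability addition, not part of the formula as written):

```python
import numpy as np

def rmsprop_step(w, v, grad, lr=0.001, gamma=0.9, eps=1e-8):
    """One RMSProp step; `v` is the running mean of squared gradients."""
    v = gamma * v + (1.0 - gamma) * grad**2    # running average of the squared gradient
    w = w - lr * grad / (np.sqrt(v) + eps)     # learning rate divided by the RMS of recent gradients
    return w, v
```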

4. The Adam Algorithm

Adam (short for Adaptive Moment Estimation) is an update to the RMSProp optimizer. In this optimization algorithm, running averages of both the gradients and the second moments of the gradients are used. Given parameters $w^{(t)}$ and a loss function $L^{(t)}$, where $t$ indexes the current training iteration (indexed at $0$), Adam's parameter update is given by:

$$m_w^{(t+1)} := \beta_1\,m_w^{(t)} + (1 - \beta_1)\,\nabla_w L^{(t)}$$
$$v_w^{(t+1)} := \beta_2\,v_w^{(t)} + (1 - \beta_2)\,\big(\nabla_w L^{(t)}\big)^2$$
$$\hat{m}_w = \frac{m_w^{(t+1)}}{1 - \beta_1^{\,t+1}}$$
$$\hat{v}_w = \frac{v_w^{(t+1)}}{1 - \beta_2^{\,t+1}}$$
$$w^{(t+1)} := w^{(t)} - \eta\,\frac{\hat{m}_w}{\sqrt{\hat{v}_w} + \epsilon}$$

where $\epsilon$ is a small number used to prevent division by zero, and $\beta_1$ and $\beta_2$ are the forgetting factors for the gradients and the second moments of the gradients, respectively.
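
A minimal NumPy sketch of the Adam update above (names and default values are illustrative, chosen only to mirror the formulas):

```python
import numpy as np

def adam_step(w, m, v, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; `t` is the current iteration index, starting at 0."""
    m = beta1 * m + (1.0 - beta1) * grad        # running average of the gradients (first moment)
    v = beta2 * v + (1.0 - beta2) * grad**2     # running average of the squared gradients (second moment)
    m_hat = m / (1.0 - beta1 ** (t + 1))        # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** (t + 1))        # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

The bias-correction terms compensate for the moment estimates being initialized at zero, which would otherwise bias them toward zero during the first iterations.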

Reference: Wikipedia
