一、TensorFlow中的优化器

tf.train.GradientDescentOptimizer：梯度下降算法
tf.train.AdadeltaOptimizer
tf.train.AdagradOptimizer
tf.train.MomentumOptimizer：动量梯度下降算法
tf.train.AdamOptimizer：自适应矩估计优化算法
tf.train.RMSPropOptimizer
tf.train.AdagradDAOptimizer
tf.train.FtrlOptimizer
tf.train.ProximalGradientDescentOptimizer
tf.train.ProximalAdagradOptimizertf.train.RMSProOptimizer

（1）如果数据是稀疏的，使用自适应学习方法。
（2）RMSprop，Adadelta，Adam是非常相似的优化算法，Adam的bias-correction帮助其在最后优化期间梯度变稀疏的情况下略微战胜了RMSprop。整体来讲，Adam是最好的选择。
（3）很多论文中使用vanilla SGD without momentum。SGD通常能找到最小值，但是依赖健壮的初始化，并且容易陷入鞍点。因此，如果要获得更快的收敛速度和训练更深更复杂的神经网络，需要选择自适应学习方法。

https://blog.csdn.net/winycg/article/details/79363169

二、常用的种类：

1、tf.train.Optimizer：

class tf.train.Optimizer：优化器（optimizers）类的基类。

Optimizer基类提供了计算损失梯度的方法，并将梯度应用于变量。这个类定义了在训练模型的时候添加一个操作的API。你基本上不会直接使用这个类，但是你会用到他的子类比如GradientDescentOptimizer, AdagradOptimizer, MomentumOptimizer.等等这些。

2、tf.train.GradientDescentOptimizer：梯度下降

原理：

batch GD【全部样本，速度慢】
随机GD【随机一个样本，速度快，但局部最优】
mini-batch GD 【batch个样本，常在数据量较大时使用】

训练集样本数少【≤2000】:采用batchGD

训练集样本数多：采用mini-batch GD，batch大小一般为64-512. 训练时多尝试一下2的次方来找到最合适的batch大小。

应用：

这个类是实现梯度下降算法的优化器。这个构造函数需要的一个学习率就行了。

构造函数：tf.train.GradientDescentOptimizer(0.001).minimize(loss,global_step=None,var_list=None,gate_gradients=GATE_OP,aggregation_method=None,colocate_gradients_with_ops=False,name=None,grad_loss=None)

 __init__(

     learning_rate,

     use_locking=False,

     name='GradientDescent'

 )

learning_rate: （学习率）张量或者浮点数

use_locking: 为True时锁定更新

name: 梯度下降名称，默认为"GradientDescent".

3、tf.train.AdadeltaOptimizer：

实现了 Adadelta算法的优化器，可以算是下面的Adagrad算法改进版本。

构造函数： tf.train.AdadeltaOptimizer.init(learning_rate=0.001, rho=0.95, epsilon=1e-08, use_locking=False, name=’Adadelta’)

4、tf.train.AdagradOptimizer：

构造函数：tf.train.AdagradOptimizer.__init__(learning_rate, initial_accumulator_value=0.1, use_locking=False, name=’Adagrad’)

5、tf.train.MomentumOptimizer：

原理：

momentum表示要在多大程度上保留原来的更新方向，这个值在0-1之间，在训练开始时，由于梯度可能会很大，所以初始值一般选为0.5；当梯度不那么大时，改为0.9。 α是学习率，即当前batch的梯度多大程度上影响最终更新方向，跟普通的SGD含义相同。

应用：

构造函数：tf.train.MomentumOptimizer.__init__(learning_rate, momentum, use_locking=False, name=’Momentum’, use_nesterov=False)

 __init__(

     learning_rate,

     momentum,

     use_locking=False,

     name='Momentum',

     use_nesterov=False

 )

learning_rate: （学习率）张量或者浮点数

momentum: （动量）张量或者浮点数

use_locking: 为True时锁定更新

name: 梯度下降名称，默认为 "Momentum".

use_nesterov: 为True时，使用 Nesterov Momentum.

6、tf.train.RMSPropOptimizer

目的和动量梯度一样，减小垂直方向，增大水平方向。W为水平方向，b为垂直方向。

7、tf.train.AdamOptimizer：动量和RMSProp结合

应用：

 __init__(

     learning_rate=0.001,

     beta1=0.9,

     beta2=0.999,

     epsilon=1e-08,

     use_locking=False,

     name='Adam'

 )

构造函数：tf.train.AdamOptimizer.__init__(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08, use_locking=False, name=’Adam’)

learning_rate: （学习率）张量或者浮点数，需要调试

beta1: 浮点数或者常量张量，表示 The exponential decay rate for the 1st moment estimates.【推荐使用0.9】

beta2: 浮点数或者常量张量，表示 The exponential decay rate for the 2nd moment estimates.【推荐使用0.999】

epsilon: A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.

use_locking: 为True时锁定更新

name: 梯度下降名称，默认为 "Adam".

莫烦大大TensorFlow学习笔记（8）----优化器的更多相关文章

莫烦大大TensorFlow学习笔记（9）----可视化
一.Matplotlib[结果可视化] #import os #os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' import tensorflow as tf i ...
莫烦python教程学习笔记——总结篇
一.机器学习算法分类: 监督学习:提供数据和数据分类标签.--分类.回归非监督学习:只提供数据,不提供标签. 半监督学习强化学习:尝试各种手段,自己去适应环境和规则.总结经验利用反馈,不断提高算法 ...
莫烦python教程学习笔记——保存模型、加载模型的两种方法
# View more python tutorials on my Youtube and Youku channel!!! # Youtube video tutorial: https://ww ...
莫烦python教程学习笔记——validation_curve用于调参
# View more python learning tutorial on my Youtube and Youku channel!!! # Youtube video tutorial: ht ...
莫烦python教程学习笔记——learn_curve曲线用于过拟合问题
# View more python learning tutorial on my Youtube and Youku channel!!! # Youtube video tutorial: ht ...
莫烦python教程学习笔记——利用交叉验证计算模型得分、选择模型参数
# View more python learning tutorial on my Youtube and Youku channel!!! # Youtube video tutorial: ht ...
莫烦python教程学习笔记——数据预处理之normalization
# View more python learning tutorial on my Youtube and Youku channel!!! # Youtube video tutorial: ht ...
莫烦python教程学习笔记——线性回归模型的属性
#调用查看线性回归的几个属性 # Youtube video tutorial: https://www.youtube.com/channel/UCdyjiB5H8Pu7aDTNVXTTpcg # ...
莫烦python教程学习笔记——使用波士顿数据集、生成用于回归的数据集
# View more python learning tutorial on my Youtube and Youku channel!!! # Youtube video tutorial: ht ...

随机推荐

Merging into a Table: Example
Merging into a Table: Example The following example uses the bonuses table in the sample schema oe w ...
洛谷 P3178 BZOJ 4034 [HAOI2015]树上操作
题目描述有一棵点数为 N 的树,以点 1 为根,且树点有边权.然后有 M 个操作,分为三种:操作 1 :把某个节点 x 的点权增加 a .操作 2 :把某个节点 x 为根的子树中所有点的点权都增加 ...
MySQL性能分析、及调优工具使用详解
本文汇总了MySQL DBA日常工作中用到的些工具,方便初学者,也便于自己查阅. 先介绍下基础设施(CPU.IO.网络等)检查的工具: vmstat.sar(sysstat工具包).mpstat.op ...
POJ 2607
一次FLOYD,再枚举. 注意题目要求的输出是什么哦. #include <iostream> #include <cstdio> #include <cstring&g ...
【微信小程序】：小程序，新场景
前言: 我们频繁进入的地方,是场景. 手机.是场景:浏览器.是场景.事实上,微信,也是场景-- 微信要做的是占领很多其它用户时间.占领很多其它应用场景.占领很多其它服务入口.这是商业本质想去垄断要做的 ...
/sys/power/state
kernel/power/main.c中: /** * state - control system power state. * * show() returns what states are s ...
android 点击返回键退出程序的方法
android 点击返回键退出程序的方法第一种: 再按一次返回键退出程序 private long exitTime = 0; @Override public boolean onKeyDown( ...
hdu1542 Atlantis（扫描线+线段树+离散）矩形相交面积
题目链接:点击打开链接题目描写叙述:给定一些矩形,求这些矩形的总面积.假设有重叠.仅仅算一次解题思路:扫描线+线段树+离散(代码从上往下扫描) 代码: #include<cstdio> ...
python清除数据库错误日志
# coding=gbk from encodings import gbk import re import sys import os import pyodbc import trac ...
【待解决】创建maven web工程报错
报错信息如下: Could not calculate build plan: Plugin org.apache.maven.plugins:maven-resources-plugin:2.6 o ...

莫烦大大TensorFlow学习笔记（8）----优化器