Deep Learning 31: Why the same code gives different results under different Keras versions
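I. The question
For the past few days I have been stuck on one question:
Why does the same code fit well under Keras 0.3.3, with no overfitting and validation accuracy consistently above training accuracy, but overfit after switching to Keras 1.2.0, with validation error consistently above training error?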
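II. The answer
Today I finally found the cause: the two Keras versions implement their optimizers differently, even though the default parameters are the same. My code uses Adam, so Adam in optimizers.py serves as the example below:
1. The Adam implementation in optimizers.py for keras==0.3.3: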
class Adam(Optimizer):
    '''Adam optimizer.

    Default parameters follow those provided in the original paper.

    # Arguments
        lr: float >= 0. Learning rate.
        beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8,
                 *args, **kwargs):
        super(Adam, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0)
        self.lr = K.variable(lr)
        self.beta_1 = K.variable(beta_1)
        self.beta_2 = K.variable(beta_2)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = [(self.iterations, self.iterations+1.)]

        t = self.iterations + 1
        lr_t = self.lr * K.sqrt(1 - K.pow(self.beta_2, t)) / (1 - K.pow(self.beta_1, t))

        for p, g, c in zip(params, grads, constraints):
            # zero init of moment
            m = K.variable(np.zeros(K.get_value(p).shape))
            # zero init of velocity
            v = K.variable(np.zeros(K.get_value(p).shape))

            m_t = (self.beta_1 * m) + (1 - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1 - self.beta_2) * K.square(g)
            p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon)

            self.updates.append((m, m_t))
            self.updates.append((v, v_t))
            self.updates.append((p, c(p_t)))  # apply constraints
        return self.updates

    def get_config(self):
        return {"name": self.__class__.__name__,
                "lr": float(K.get_value(self.lr)),
                "beta_1": float(K.get_value(self.beta_1)),
                "beta_2": float(K.get_value(self.beta_2)),
                "epsilon": self.epsilon}
2. The Adam implementation in optimizers.py for keras==1.2.0:
class Adam(Optimizer):
    '''Adam optimizer.

    Default parameters follow those provided in the original paper.

    # Arguments
        lr: float >= 0. Learning rate.
        beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999,
                 epsilon=1e-8, decay=0., **kwargs):
        super(Adam, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0)
        self.lr = K.variable(lr)
        self.beta_1 = K.variable(beta_1)
        self.beta_2 = K.variable(beta_2)
        self.decay = K.variable(decay)
        self.inital_decay = decay

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr
        if self.inital_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))

        t = self.iterations + 1
        lr_t = lr * K.sqrt(1. - K.pow(self.beta_2, t)) / (1. - K.pow(self.beta_1, t))

        shapes = [K.get_variable_shape(p) for p in params]
        ms = [K.zeros(shape) for shape in shapes]
        vs = [K.zeros(shape) for shape in shapes]
        self.weights = [self.iterations] + ms + vs

        for p, g, m, v in zip(params, grads, ms, vs):
            m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1. - self.beta_2) * K.square(g)
            p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon)

            self.updates.append(K.update(m, m_t))
            self.updates.append(K.update(v, v_t))

            new_p = p_t
            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'beta_1': float(K.get_value(self.beta_1)),
                  'beta_2': float(K.get_value(self.beta_2)),
                  'decay': float(K.get_value(self.decay)),
                  'epsilon': self.epsilon}
        base_config = super(Adam, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
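Comparing the two listings shows that the implementations differ, and my code had always relied on Adam's default parameters, which is why the results were different.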
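To make the comparison concrete, here is a minimal NumPy sketch of the update rule that appears in both listings, plus the `decay` rescaling that only the 1.2.0 listing has. The function name `adam_step` and the sample arrays are illustrative assumptions, not Keras code, and the sketch covers only the arithmetic; it does not reproduce how each version registers updates or applies constraints.

```python
import numpy as np


def adam_step(p, g, m, v, iterations, lr=0.001, beta_1=0.9, beta_2=0.999,
              epsilon=1e-8, decay=0.0):
    """One Adam update on NumPy arrays, following the arithmetic in the listings.

    `decay` exists only in the 1.2.0 listing; with decay=0 the learning rate
    is left unscaled.
    """
    if decay > 0:
        # 1.2.0 only: shrink the base learning rate as iterations accumulate
        lr = lr * (1.0 / (1.0 + decay * iterations))
    t = iterations + 1
    # bias-corrected step size
    lr_t = lr * np.sqrt(1.0 - beta_2 ** t) / (1.0 - beta_1 ** t)
    m_t = beta_1 * m + (1.0 - beta_1) * g             # first moment
    v_t = beta_2 * v + (1.0 - beta_2) * np.square(g)  # second moment
    p_t = p - lr_t * m_t / (np.sqrt(v_t) + epsilon)
    return p_t, m_t, v_t


# Illustrative values, not taken from the post
p = np.array([1.0, -2.0])
g = np.array([0.5, -0.5])
m = np.zeros_like(p)
v = np.zeros_like(p)

no_decay, _, _ = adam_step(p, g, m, v, iterations=100)
with_decay, _, _ = adam_step(p, g, m, v, iterations=100, decay=0.01)
step_ratio = (p - with_decay) / (p - no_decay)
print(step_ratio)
```

With `decay=0.01` at iteration 100 the scaling factor is 1 / (1 + 0.01 * 100) = 0.5, so the step taken is about half of the step without decay.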
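III. The fix
The problem can be avoided in either of the following ways:
1. Set the optimizer's parameters explicitly in your own code instead of relying on the defaults.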
adam = optimizers.Adam(lr=1e-)
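However, the official Keras documentation explicitly recommends leaving the parameters of these optimizers at their default values, so the second approach is also an option.
2. Set the optimization schedule explicitly: when training, pass a learning rate schedule through the callbacks argument of fit.
For example: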
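When setting parameters explicitly, it helps to know which arguments a given version would otherwise fill in silently. The following is a minimal Python 3 sketch: `AdamOld` and `AdamNew` are hypothetical stand-in classes that copy only the constructor signatures from the two listings above, and `default_args` is an illustrative helper. With Keras installed, the real `keras.optimizers.Adam` class could presumably be passed to the helper instead.

```python
import inspect


class AdamOld:
    """Stand-in with the constructor signature from the 0.3.3 listing."""
    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8,
                 *args, **kwargs):
        pass


class AdamNew:
    """Stand-in with the constructor signature from the 1.2.0 listing."""
    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999,
                 epsilon=1e-8, decay=0., **kwargs):
        pass


def default_args(cls):
    """Return {name: default} for every argument of cls.__init__ that has a default."""
    params = inspect.signature(cls.__init__).parameters
    return {name: param.default for name, param in params.items()
            if param.default is not inspect.Parameter.empty}


old = default_args(AdamOld)
new = default_args(AdamNew)

# Arguments that exist only in the newer signature
added = {name: new[name] for name in new if name not in old}
# Arguments present in both signatures whose default value changed
changed = {name: (old[name], new[name])
           for name in old if name in new and old[name] != new[name]}

print("added:", added)
print("changed:", changed)
```

For these two signatures the helper reports `decay` as the only added argument and no changed defaults, which matches the observation that the default parameters are the same in both versions.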
# Callback that implements learning rate schedule
schedule = Step([], [1e-, 1e-])

history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    validation_data=(X_test, Y_test),
                    callbacks=[
                        schedule,
                        # ModelCheckpoint saves the model to filepath after every epoch
                        keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=, save_best_only=True, mode='auto')
                        # EarlyStopping stops training when the monitored value stops improving.
                        # Once triggered (e.g. the loss did not drop relative to the previous
                        # epoch), training stops after `patience` more epochs.
                        # , keras.callbacks.EarlyStopping(monitor='val_loss', patience=, verbose=, mode='auto')
                    ],
                    verbose=, shuffle=True)
The Step callback is defined as follows:
from keras import backend as K
from keras.callbacks import Callback


class Step(Callback):
    # steps: epoch boundaries; learning_rates: one rate per interval,
    # so it needs len(steps) + 1 entries
    def __init__(self, steps, learning_rates, verbose=0):
        self.steps = steps
        self.lr = learning_rates
        self.verbose = verbose

    def change_lr(self, new_lr):
        old_lr = K.get_value(self.model.optimizer.lr)
        K.set_value(self.model.optimizer.lr, new_lr)
        if self.verbose == 1:
            print('Learning rate is %g' % new_lr)

    def on_epoch_begin(self, epoch, logs={}):
        # use the rate of the first interval whose boundary is still ahead
        for i, step in enumerate(self.steps):
            if epoch < step:
                self.change_lr(self.lr[i])
                return
        # past the last boundary: use the final rate
        self.change_lr(self.lr[i+1])

    def get_config(self):
        config = {'class': type(self).__name__,
                  'steps': self.steps,
                  'learning_rates': self.lr,
                  'verbose': self.verbose}
        return config

    @classmethod
    def from_config(cls, config):
        offset = config.get('epoch_offset', 0)
        steps = [step - offset for step in config['steps']]
        return cls(steps, config['learning_rates'],
                   verbose=config.get('verbose', 0))
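The rate selection done in on_epoch_begin can be checked without Keras. Below is a minimal sketch, assuming a hypothetical helper `lr_for_epoch` and illustrative boundaries and rates that are not taken from the post. It follows the same rule as the callback: pick the rate of the first interval whose boundary the epoch has not yet reached, otherwise the last rate.

```python
def lr_for_epoch(epoch, steps, learning_rates):
    """Return the learning rate a step schedule selects for `epoch`.

    `steps` holds ascending epoch boundaries; `learning_rates` holds one rate
    per interval, so it must contain len(steps) + 1 entries.
    """
    if len(learning_rates) != len(steps) + 1:
        raise ValueError("need exactly len(steps) + 1 learning rates")
    for i, step in enumerate(steps):
        if epoch < step:
            return learning_rates[i]
    # past the last boundary: the final rate applies
    return learning_rates[-1]


# Illustrative schedule: 1e-3 until epoch 25, 1e-4 until epoch 50, then 1e-5
steps = [25, 50]
rates = [1e-3, 1e-4, 1e-5]
for epoch in (0, 24, 25, 49, 50, 99):
    print(epoch, lr_for_epoch(epoch, steps, rates))
```

Indexing the final rate with `learning_rates[-1]` rather than with the loop variable is a design choice of this sketch: it also works when `steps` is empty, in which case the single rate is used for every epoch.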