一.疑问

这几天一直纠结于一个问题:

同样的代码,为什么在keras的0.3.3版本中,拟合得比较好,也没有过拟合,验证集准确率一直高于训练准确率. 但是在换到keras的1.2.0版本中的时候,就过拟合了,验证误差一直高于训练误差

二.答案

今天终于发现原因了,原来是这两个版本的keras的optimezer实现不一样,但是它们的默认参数是一样的,因为我代码中用的是adam方法优化,下面就以optimezer中的adam来举例说明:

1.下面是keras==0.3.3时,其中optimezer.py中的adam方法实现:

 class Adam(Optimizer):
'''Adam optimizer. Default parameters follow those provided in the original paper. # Arguments
lr: float >= . Learning rate.
beta_1/beta_2: floats, < beta < . Generally close to .
epsilon: float >= . Fuzz factor. # References
- [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
'''
def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-,
*args, **kwargs):
super(Adam, self).__init__(**kwargs)
self.__dict__.update(locals())
self.iterations = K.variable()
self.lr = K.variable(lr)
self.beta_1 = K.variable(beta_1)
self.beta_2 = K.variable(beta_2) def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
self.updates = [(self.iterations, self.iterations+.)] t = self.iterations +
lr_t = self.lr * K.sqrt( - K.pow(self.beta_2, t)) / ( - K.pow(self.beta_1, t)) for p, g, c in zip(params, grads, constraints):
# zero init of moment
m = K.variable(np.zeros(K.get_value(p).shape))
# zero init of velocity
v = K.variable(np.zeros(K.get_value(p).shape)) m_t = (self.beta_1 * m) + ( - self.beta_1) * g
v_t = (self.beta_2 * v) + ( - self.beta_2) * K.square(g)
p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon) self.updates.append((m, m_t))
self.updates.append((v, v_t))
self.updates.append((p, c(p_t))) # apply constraints
return self.updates def get_config(self):
return {"name": self.__class__.__name__,
"lr": float(K.get_value(self.lr)),
"beta_1": float(K.get_value(self.beta_1)),
"beta_2": float(K.get_value(self.beta_2)),
"epsilon": self.epsilon}

2.下面是keras==1.2.0时,其中optimezer.py中的adam方法实现:

 class Adam(Optimizer):
'''Adam optimizer. Default parameters follow those provided in the original paper. # Arguments
lr: float >= . Learning rate.
beta_1/beta_2: floats, < beta < . Generally close to .
epsilon: float >= . Fuzz factor. # References
- [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
'''
def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999,
epsilon=1e-, decay=., **kwargs):
super(Adam, self).__init__(**kwargs)
self.__dict__.update(locals())
self.iterations = K.variable()
self.lr = K.variable(lr)
self.beta_1 = K.variable(beta_1)
self.beta_2 = K.variable(beta_2)
self.decay = K.variable(decay)
self.inital_decay = decay def get_updates(self, params, constraints, loss):
grads = self.get_gradients(loss, params)
self.updates = [K.update_add(self.iterations, )] lr = self.lr
if self.inital_decay > :
lr *= (. / (. + self.decay * self.iterations)) t = self.iterations +
lr_t = lr * K.sqrt(. - K.pow(self.beta_2, t)) / (. - K.pow(self.beta_1, t)) shapes = [K.get_variable_shape(p) for p in params]
ms = [K.zeros(shape) for shape in shapes]
vs = [K.zeros(shape) for shape in shapes]
self.weights = [self.iterations] + ms + vs for p, g, m, v in zip(params, grads, ms, vs):
m_t = (self.beta_1 * m) + (. - self.beta_1) * g
v_t = (self.beta_2 * v) + (. - self.beta_2) * K.square(g)
p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon) self.updates.append(K.update(m, m_t))
self.updates.append(K.update(v, v_t)) new_p = p_t
# apply constraints
if p in constraints:
c = constraints[p]
new_p = c(new_p)
self.updates.append(K.update(p, new_p))
return self.updates def get_config(self):
config = {'lr': float(K.get_value(self.lr)),
'beta_1': float(K.get_value(self.beta_1)),
'beta_2': float(K.get_value(self.beta_2)),
'decay': float(K.get_value(self.decay)),
'epsilon': self.epsilon}
base_config = super(Adam, self).get_config()
return dict(list(base_config.items()) + list(config.items()))

读代码对比,可发现这两者实现方式有不同,而我的代码中一直使用的是adam的默认参数,所以才会结果不一样.

三.解决

要避免这一问题可用以下方法:

1.在自己的代码中,要对优化器的参数给定,不要用默认参数.

adam = optimizers.Adam(lr=1e-)

但是,在keras官方文档中,明确有说明,在用这些优化器的时候,最好使用默认参数,所以也可采用第2种方法.

2.优化函数中的优化方法要给定,也就是在训练的时候,在fit函数中的callbacks参数中的schedule要给定.

比如:

 # Callback that implements learning rate schedule
schedule = Step([], [1e-, 1e-]) history = model.fit(X_train, Y_train,
batch_size=batch_size, nb_epoch=nb_epoch, validation_data=(X_test,Y_test),
callbacks=[
schedule,
keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=,save_best_only=True, mode='auto')# 该回调函数将在每个epoch后保存模型到filepath
# ,keras.callbacks.EarlyStopping(monitor='val_loss', patience=, verbose=, mode='auto')# 当监测值不再改善时,该回调函数将中止训练.当early stop被激活(如发现loss相比上一个epoch训练没有下降),则经过patience个epoch后停止训练
],
verbose=, shuffle=True)

其中Step函数如下:

 class Step(Callback):

     def __init__(self, steps, learning_rates, verbose=):
self.steps = steps
self.lr = learning_rates
self.verbose = verbose def change_lr(self, new_lr):
old_lr = K.get_value(self.model.optimizer.lr)
K.set_value(self.model.optimizer.lr, new_lr)
if self.verbose == :
print('Learning rate is %g' %new_lr) def on_epoch_begin(self, epoch, logs={}):
for i, step in enumerate(self.steps):
if epoch < step:
self.change_lr(self.lr[i])
return
self.change_lr(self.lr[i+]) def get_config(self):
config = {'class': type(self).__name__,
'steps': self.steps,
'learning_rates': self.lr,
'verbose': self.verbose}
return config @classmethod
def from_config(cls, config):
offset = config.get('epoch_offset', )
steps = [step - offset for step in config['steps']]
return cls(steps, config['learning_rates'],
verbose=config.get('verbose', ))

Deep Learning 31: 不同版本的keras,对同样的代码,得到不同结果的原因总结的更多相关文章

  1. Deep Learning 32: 自己写的keras的一个callbacks函数,解决keras中不能在每个epoch实时显示学习速率learning rate的问题

    一.问题: keras中不能在每个epoch实时显示学习速率learning rate,从而方便调试,实际上也是为了调试解决这个问题:Deep Learning 31: 不同版本的keras,对同样的 ...

  2. How to Grid Search Hyperparameters for Deep Learning Models in Python With Keras

    Hyperparameter optimization is a big part of deep learning. The reason is that neural networks are n ...

  3. Top Deep Learning Projects in github

    Top Deep Learning Projects A list of popular github projects related to deep learning (ranked by sta ...

  4. Deep Learning论文笔记之(四)CNN卷积神经网络推导和实现(转)

    Deep Learning论文笔记之(四)CNN卷积神经网络推导和实现 zouxy09@qq.com http://blog.csdn.net/zouxy09          自己平时看了一些论文, ...

  5. Coursera Deep Learning 2 Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization - week1, Assignment(Regularization)

    声明:所有内容来自coursera,作为个人学习笔记记录在这里. Regularization Welcome to the second assignment of this week. Deep ...

  6. Deep Learning论文笔记之(四)CNN卷积神经网络推导和实现

    https://blog.csdn.net/zouxy09/article/details/9993371 自己平时看了一些论文,但老感觉看完过后就会慢慢的淡忘,某一天重新拾起来的时候又好像没有看过一 ...

  7. How To Improve Deep Learning Performance

    如何提高深度学习性能 20 Tips, Tricks and Techniques That You Can Use ToFight Overfitting and Get Better Genera ...

  8. Unsupervised Feature Learning and Deep Learning(UFLDL) Exercise 总结

    7.27 暑假开始后,稍有时间,“搞完”金融项目,便开始跑跑 Deep Learning的程序 Hinton 在Nature上文章的代码 跑了3天 也没跑完 后来Debug 把batch 从200改到 ...

  9. (转) 基于Theano的深度学习(Deep Learning)框架Keras学习随笔-01-FAQ

    特别棒的一篇文章,仍不住转一下,留着以后需要时阅读 基于Theano的深度学习(Deep Learning)框架Keras学习随笔-01-FAQ

随机推荐

  1. FreeMarker数据模板引擎全面教程mark

    http://blog.csdn.net/fhx007/article/details/7902040/#comments 以下内容全部是网上收集: FreeMarker的模板文件并不比HTML页面复 ...

  2. Nk 1430 Fibonacci(二分矩阵乘幂)

    AC代码: #include<iostream> using namespace std; ][]; ][]; ][]; ][]; void binary(int n) { int i,j ...

  3. vs2005做的留言本——天轰川下载

    原文发布时间为:2008-08-01 -- 来源于本人的百度文章 [由搬家工具导入] 这个虽然单纯是个留言本,但是在功能上我都使用了尽量不重复的解决方法,所以我自认为非常适合入门级的朋友看,而且用了我 ...

  4. java私有构造函数

    1. 强调类的单例模式 public class Elvs { //公有的静态域,来说明该类只能有一个实例(实例化一次后,后面都是同一个实例) public static final Elvs INS ...

  5. SPOJ 26108 TRENDGCD - Trending GCD

    Discription Problem statement is simple. Given A and B you need to calculate S(A,B) . Here, f(n)=n, ...

  6. 2716 [Violet 3] 天使玩偶

    @(BZOJ)[CDQ分治] Sample Input 100 100 81 23 27 16 52 58 44 24 25 95 34 2 96 25 8 14 97 50 97 18 64 3 4 ...

  7. cut printf awk sed grep笔记

    名称 作用 参数 实例 cut 截取某列,可指定分隔 -f 列号 -d 分隔符 cut -d ":" -f 1, 3 /etc/passwd 截取第一列和第三列 printf pr ...

  8. PYTHON 源码

    http://www.wklken.me/index2.html http://blog.csdn.net/dbzhang800/article/details/6683440

  9. VS2017不能生成Database Unit Test项目

    问题描述: VS2017生成Database Unit Test项目时,报出如下错误,但该项目在VS2015中能正常生成: 主要是因为下面两个程序集找不到引用: Microsoft.Data.Tool ...

  10. [vxlan] 二 什么是VXLAN

    VXLAN是一种mac in UDP的技术.简单讲就是传统的二层帧被封装到了UDP的package中.通过UDP的IP网络发送到目的地然后再解封装. VXLAN 跟VLAN对比,最重要的一个概念就是V ...