Deep Learning 32: A self-written Keras callback to display the learning rate at every epoch in real time
I. Problem
Keras does not display the learning rate at each epoch, which makes tuning inconvenient. This came up while debugging the problem described in Deep Learning 31: 不同版本的keras,对同样的代码,得到不同结果的原因总结 (why different Keras versions give different results for the same code).
II. Solution
1. Add the following code to Keras's callbacks.py:
class DisplayLearningRate(Callback):
    '''Display the learning rate at the start of every epoch.'''

    def __init__(self):
        super(DisplayLearningRate, self).__init__()

    def on_epoch_begin(self, epoch, logs={}):
        assert hasattr(self.model.optimizer, 'lr'), \
            'Optimizer must have a "lr" attribute.'
        lr_now = K.get_value(self.model.optimizer.lr)
        print('Epoch %05d: Learning rate is %s' % (epoch, lr_now))
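Note that patching the installed callbacks.py is optional: the same callback can live directly in your training script, as long as `Callback` and the backend module `K` are imported there. A minimal standalone sketch (assuming the Keras 0.x/1.x callback API):

from keras import backend as K
from keras.callbacks import Callback

class DisplayLearningRate(Callback):
    '''Print the optimizer's lr variable at the start of every epoch.'''
    def on_epoch_begin(self, epoch, logs={}):
        assert hasattr(self.model.optimizer, 'lr'), \
            'Optimizer must have a "lr" attribute.'
        lr_now = K.get_value(self.model.optimizer.lr)
        print('Epoch %05d: Learning rate is %s' % (epoch, lr_now))

With this variant you pass `DisplayLearningRate()` to `callbacks` directly instead of `keras.callbacks.DisplayLearningRate()`.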
2. Use it as follows:
history = model.fit(X_train,
                    Y_train,
                    batch_size=batch_size,
                    nb_epoch=nb_epoch,
                    show_accuracy=False,
                    verbose=1,
                    validation_data=(X_test, Y_test),
                    callbacks=[
                        keras.callbacks.DisplayLearningRate(),
                        keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='auto'),  # save the model to filepath after each epoch (best val_loss only)
                        # keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, verbose=1, mode='auto')  # stop training when the monitored value stops improving: once early stopping is triggered (e.g. the loss did not drop vs. the previous epoch), training stops after `patience` more epochs
                    ])
III. Summary
After trying the method above, I found that the learning rate printed at every epoch is identical. The reason is that what gets printed is the learning rate variable set at initialization: the decayed rate computed at each update lives only inside the optimizer's symbolic graph and is never assigned back to the initial `lr` variable. So how can the effective learning rate be shown at every epoch? It has to be recomputed from the state the optimizer maintains through its `updates`.
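A quick sketch that demonstrates the diagnosis (assuming a compiled model named `model` whose optimizer was built with `decay > 0`):

from keras import backend as K

# model.optimizer.lr is a backend variable holding the *initial* rate;
# the decay factor 1/(1 + decay*iterations) exists only in the symbolic
# update graph, so the stored value never changes across epochs.
print(K.get_value(model.optimizer.lr))   # e.g. 0.1 before training
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch)
print(K.get_value(model.optimizer.lr))   # still 0.1 afterwards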
IV. Final solution
from keras import backend as K
from keras.callbacks import Callback
from keras.optimizers import SGD

# Set decay to 1e-1 to make the lr change between epochs easy to see.
sgd = SGD(lr=0.1, decay=1e-1, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

class LossHistory(Callback):
    def on_epoch_begin(self, epoch, logs={}):
        # Recompute the decayed rate from the optimizer's current state;
        # reading optimizer.lr alone would only return the initial value.
        optimizer = self.model.optimizer
        lr = K.get_value(optimizer.lr) * (1. / (1. + K.get_value(optimizer.decay) *
                                                K.get_value(optimizer.iterations)))
        print('lr:', lr)
history = LossHistory()
model.fit(X_train, Y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          callbacks=[history])
Reference: http://stackoverflow.com/questions/40144805/print-learning-rate-evary-epoch-in-sgd
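An alternative that avoids reading the optimizer's internals is to drive the schedule yourself with keras.callbacks.LearningRateScheduler, so the per-epoch rate is known by construction. A sketch (the step-decay schedule and its constants are illustrative, not from the original post; the optimizer's own `decay` should be 0 here, otherwise the two schedules compound):

from keras.callbacks import LearningRateScheduler

def step_decay(epoch):
    # illustrative schedule: halve the initial rate every 10 epochs
    initial_lr = 0.1
    lr = initial_lr * (0.5 ** (epoch // 10))
    print('Epoch %d: setting lr to %f' % (epoch, lr))
    return lr

model.fit(X_train, Y_train,
          batch_size=batch_size,
          nb_epoch=nb_epoch,
          callbacks=[LearningRateScheduler(step_decay)])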
Below I paste optimizers.py from keras==0.3.3 and keras==1.2.0 for comparison.
optimizers.py in keras==0.3.3:
from __future__ import absolute_import
from . import backend as K
import numpy as np
from .utils.generic_utils import get_from_module
from six.moves import zip


def clip_norm(g, c, n):
    if c > 0:
        g = K.switch(n >= c, g * c / n, g)
    return g


def kl_divergence(p, p_hat):
    return p_hat - p + p * K.log(p / p_hat)


class Optimizer(object):
    '''Abstract optimizer base class.

    Note: this is the parent class of all optimizers, not an actual optimizer
    that can be used for training models.

    All Keras optimizers support the following keyword arguments:

        clipnorm: float >= 0. Gradients will be clipped
            when their L2 norm exceeds this value.
        clipvalue: float >= 0. Gradients will be clipped
            when their absolute value exceeds this value.
    '''
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
        self.updates = []

    def get_state(self):
        return [K.get_value(u[0]) for u in self.updates]

    def set_state(self, value_list):
        assert len(self.updates) == len(value_list)
        for u, v in zip(self.updates, value_list):
            K.set_value(u[0], v)

    def get_updates(self, params, constraints, loss):
        raise NotImplementedError

    def get_gradients(self, loss, params):
        grads = K.gradients(loss, params)
        if hasattr(self, 'clipnorm') and self.clipnorm > 0:
            norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
            grads = [clip_norm(g, self.clipnorm, norm) for g in grads]
        if hasattr(self, 'clipvalue') and self.clipvalue > 0:
            grads = [K.clip(g, -self.clipvalue, self.clipvalue) for g in grads]
        return grads

    def get_config(self):
        return {"name": self.__class__.__name__}


class SGD(Optimizer):
    '''Stochastic gradient descent, with support for momentum,
    decay, and Nesterov momentum.

    # Arguments
        lr: float >= 0. Learning rate.
        momentum: float >= 0. Parameter updates momentum.
        decay: float >= 0. Learning rate decay over each update.
        nesterov: boolean. Whether to apply Nesterov momentum.
    '''
    def __init__(self, lr=0.01, momentum=0., decay=0., nesterov=False,
                 *args, **kwargs):
        super(SGD, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0.)
        self.lr = K.variable(lr)
        self.momentum = K.variable(momentum)
        self.decay = K.variable(decay)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        lr = self.lr * (1.0 / (1.0 + self.decay * self.iterations))
        self.updates = [(self.iterations, self.iterations + 1.)]

        for p, g, c in zip(params, grads, constraints):
            m = K.variable(np.zeros(K.get_value(p).shape))  # momentum
            v = self.momentum * m - lr * g  # velocity
            self.updates.append((m, v))

            if self.nesterov:
                new_p = p + self.momentum * v - lr * g
            else:
                new_p = p + v

            self.updates.append((p, c(new_p)))  # apply constraints
        return self.updates

    def get_config(self):
        return {"name": self.__class__.__name__,
                "lr": float(K.get_value(self.lr)),
                "momentum": float(K.get_value(self.momentum)),
                "decay": float(K.get_value(self.decay)),
                "nesterov": self.nesterov}


class RMSprop(Optimizer):
    '''RMSProp optimizer.

    It is recommended to leave the parameters of this optimizer
    at their default values.

    This optimizer is usually a good choice for recurrent
    neural networks.

    # Arguments
        lr: float >= 0. Learning rate.
        rho: float >= 0.
        epsilon: float >= 0. Fuzz factor.
    '''
    def __init__(self, lr=0.001, rho=0.9, epsilon=1e-6, *args, **kwargs):
        super(RMSprop, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.lr = K.variable(lr)
        self.rho = K.variable(rho)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        accumulators = [K.variable(np.zeros(K.get_value(p).shape)) for p in params]
        self.updates = []

        for p, g, a, c in zip(params, grads, accumulators, constraints):
            # update accumulator
            new_a = self.rho * a + (1 - self.rho) * K.square(g)
            self.updates.append((a, new_a))

            new_p = p - self.lr * g / K.sqrt(new_a + self.epsilon)
            self.updates.append((p, c(new_p)))  # apply constraints
        return self.updates

    def get_config(self):
        return {"name": self.__class__.__name__,
                "lr": float(K.get_value(self.lr)),
                "rho": float(K.get_value(self.rho)),
                "epsilon": self.epsilon}


class Adagrad(Optimizer):
    '''Adagrad optimizer.

    It is recommended to leave the parameters of this optimizer
    at their default values.

    # Arguments
        lr: float >= 0. Learning rate.
        epsilon: float >= 0.
    '''
    def __init__(self, lr=0.01, epsilon=1e-6, *args, **kwargs):
        super(Adagrad, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.lr = K.variable(lr)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        accumulators = [K.variable(np.zeros(K.get_value(p).shape)) for p in params]
        self.updates = []

        for p, g, a, c in zip(params, grads, accumulators, constraints):
            new_a = a + K.square(g)  # update accumulator
            self.updates.append((a, new_a))
            new_p = p - self.lr * g / K.sqrt(new_a + self.epsilon)
            self.updates.append((p, c(new_p)))  # apply constraints
        return self.updates

    def get_config(self):
        return {"name": self.__class__.__name__,
                "lr": float(K.get_value(self.lr)),
                "epsilon": self.epsilon}


class Adadelta(Optimizer):
    '''Adadelta optimizer.

    It is recommended to leave the parameters of this optimizer
    at their default values.

    # Arguments
        lr: float >= 0. Learning rate. It is recommended to leave it at the default value.
        rho: float >= 0.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adadelta - an adaptive learning rate method](http://arxiv.org/abs/1212.5701)
    '''
    def __init__(self, lr=1.0, rho=0.95, epsilon=1e-6, *args, **kwargs):
        super(Adadelta, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.lr = K.variable(lr)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        accumulators = [K.variable(np.zeros(K.get_value(p).shape)) for p in params]
        delta_accumulators = [K.variable(np.zeros(K.get_value(p).shape)) for p in params]
        self.updates = []

        for p, g, a, d_a, c in zip(params, grads, accumulators,
                                   delta_accumulators, constraints):
            # update accumulator
            new_a = self.rho * a + (1 - self.rho) * K.square(g)
            self.updates.append((a, new_a))

            # use the new accumulator and the *old* delta_accumulator
            update = g * K.sqrt(d_a + self.epsilon) / K.sqrt(new_a + self.epsilon)

            new_p = p - self.lr * update
            self.updates.append((p, c(new_p)))  # apply constraints

            # update delta_accumulator
            new_d_a = self.rho * d_a + (1 - self.rho) * K.square(update)
            self.updates.append((d_a, new_d_a))
        return self.updates

    def get_config(self):
        return {"name": self.__class__.__name__,
                "lr": float(K.get_value(self.lr)),
                "rho": self.rho,
                "epsilon": self.epsilon}


class Adam(Optimizer):
    '''Adam optimizer.

    Default parameters follow those provided in the original paper.

    # Arguments
        lr: float >= 0. Learning rate.
        beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8,
                 *args, **kwargs):
        super(Adam, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0)
        self.lr = K.variable(lr)
        self.beta_1 = K.variable(beta_1)
        self.beta_2 = K.variable(beta_2)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = [(self.iterations, self.iterations + 1.)]

        t = self.iterations + 1
        lr_t = self.lr * K.sqrt(1 - K.pow(self.beta_2, t)) / (1 - K.pow(self.beta_1, t))

        for p, g, c in zip(params, grads, constraints):
            # zero init of moment
            m = K.variable(np.zeros(K.get_value(p).shape))
            # zero init of velocity
            v = K.variable(np.zeros(K.get_value(p).shape))

            m_t = (self.beta_1 * m) + (1 - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1 - self.beta_2) * K.square(g)
            p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon)

            self.updates.append((m, m_t))
            self.updates.append((v, v_t))
            self.updates.append((p, c(p_t)))  # apply constraints
        return self.updates

    def get_config(self):
        return {"name": self.__class__.__name__,
                "lr": float(K.get_value(self.lr)),
                "beta_1": float(K.get_value(self.beta_1)),
                "beta_2": float(K.get_value(self.beta_2)),
                "epsilon": self.epsilon}


class Adamax(Optimizer):
    '''Adamax optimizer from Adam paper's Section 7. It is a variant
    of Adam based on the infinity norm.

    Default parameters follow those provided in the paper.

    # Arguments
        lr: float >= 0. Learning rate.
        beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=1e-8,
                 *args, **kwargs):
        super(Adamax, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0)
        self.lr = K.variable(lr)
        self.beta_1 = K.variable(beta_1)
        self.beta_2 = K.variable(beta_2)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = [(self.iterations, self.iterations + 1.)]

        t = self.iterations + 1
        lr_t = self.lr / (1 - K.pow(self.beta_1, t))

        for p, g, c in zip(params, grads, constraints):
            # zero init of 1st moment
            m = K.variable(np.zeros(K.get_value(p).shape))
            # zero init of exponentially weighted infinity norm
            u = K.variable(np.zeros(K.get_value(p).shape))

            m_t = (self.beta_1 * m) + (1 - self.beta_1) * g
            u_t = K.maximum(self.beta_2 * u, K.abs(g))
            p_t = p - lr_t * m_t / (u_t + self.epsilon)

            self.updates.append((m, m_t))
            self.updates.append((u, u_t))
            self.updates.append((p, c(p_t)))  # apply constraints
        return self.updates

    def get_config(self):
        return {"name": self.__class__.__name__,
                "lr": float(K.get_value(self.lr)),
                "beta_1": float(K.get_value(self.beta_1)),
                "beta_2": float(K.get_value(self.beta_2)),
                "epsilon": self.epsilon}


# aliases
sgd = SGD
rmsprop = RMSprop
adagrad = Adagrad
adadelta = Adadelta
adam = Adam
adamax = Adamax


def get(identifier, kwargs=None):
    return get_from_module(identifier, globals(), 'optimizer',
                           instantiate=True, kwargs=kwargs)
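The key line in this version is `lr = self.lr * (1.0 / (1.0 + self.decay * self.iterations))` inside SGD's `get_updates`: the decayed rate is a symbolic expression built at compile time, and `self.lr` itself is never reassigned. The effective rate can still be recovered on the Python side by mirroring that formula (a sketch, assuming an SGD instance attached to a model that has been training):

from keras import backend as K

def effective_sgd_lr(opt):
    # mirrors 0.3.3 SGD: lr * 1/(1 + decay * iterations);
    # note that iterations counts updates (batches), not epochs
    return K.get_value(opt.lr) * (1. / (1. + K.get_value(opt.decay) *
                                        K.get_value(opt.iterations)))

print(effective_sgd_lr(model.optimizer))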
optimizers.py in keras==1.2.0:
from __future__ import absolute_import
from . import backend as K
from .utils.generic_utils import get_from_module
from six.moves import zip


def clip_norm(g, c, n):
    if c > 0:
        g = K.switch(n >= c, g * c / n, g)
    return g


def optimizer_from_config(config, custom_objects={}):
    all_classes = {
        'sgd': SGD,
        'rmsprop': RMSprop,
        'adagrad': Adagrad,
        'adadelta': Adadelta,
        'adam': Adam,
        'adamax': Adamax,
        'nadam': Nadam,
        'tfoptimizer': TFOptimizer,
    }
    class_name = config['class_name']
    if class_name in custom_objects:
        cls = custom_objects[class_name]
    else:
        if class_name.lower() not in all_classes:
            raise ValueError('Optimizer class not found:', class_name)
        cls = all_classes[class_name.lower()]
    return cls.from_config(config['config'])


class Optimizer(object):
    '''Abstract optimizer base class.

    Note: this is the parent class of all optimizers, not an actual optimizer
    that can be used for training models.

    All Keras optimizers support the following keyword arguments:

        clipnorm: float >= 0. Gradients will be clipped
            when their L2 norm exceeds this value.
        clipvalue: float >= 0. Gradients will be clipped
            when their absolute value exceeds this value.
    '''
    def __init__(self, **kwargs):
        allowed_kwargs = {'clipnorm', 'clipvalue'}
        for k in kwargs:
            if k not in allowed_kwargs:
                raise TypeError('Unexpected keyword argument '
                                'passed to optimizer: ' + str(k))
        self.__dict__.update(kwargs)
        self.updates = []
        self.weights = []

    def get_updates(self, params, constraints, loss):
        raise NotImplementedError

    def get_gradients(self, loss, params):
        grads = K.gradients(loss, params)
        if hasattr(self, 'clipnorm') and self.clipnorm > 0:
            norm = K.sqrt(sum([K.sum(K.square(g)) for g in grads]))
            grads = [clip_norm(g, self.clipnorm, norm) for g in grads]
        if hasattr(self, 'clipvalue') and self.clipvalue > 0:
            grads = [K.clip(g, -self.clipvalue, self.clipvalue) for g in grads]
        return grads

    def set_weights(self, weights):
        '''Sets the weights of the optimizer, from Numpy arrays.

        Should only be called after computing the gradients
        (otherwise the optimizer has no weights).

        # Arguments
            weights: a list of Numpy arrays. The number
                of arrays and their shape must match
                number of the dimensions of the weights
                of the optimizer (i.e. it should match the
                output of `get_weights`).
        '''
        params = self.weights
        weight_value_tuples = []
        param_values = K.batch_get_value(params)
        for pv, p, w in zip(param_values, params, weights):
            if pv.shape != w.shape:
                raise ValueError('Optimizer weight shape ' +
                                 str(pv.shape) +
                                 ' not compatible with '
                                 'provided weight shape ' + str(w.shape))
            weight_value_tuples.append((p, w))
        K.batch_set_value(weight_value_tuples)

    def get_weights(self):
        '''Returns the current weights of the optimizer,
        as a list of numpy arrays.
        '''
        return K.batch_get_value(self.weights)

    def get_config(self):
        config = {}
        if hasattr(self, 'clipnorm'):
            config['clipnorm'] = self.clipnorm
        if hasattr(self, 'clipvalue'):
            config['clipvalue'] = self.clipvalue
        return config

    @classmethod
    def from_config(cls, config):
        return cls(**config)


class SGD(Optimizer):
    '''Stochastic gradient descent, with support for momentum,
    learning rate decay, and Nesterov momentum.

    # Arguments
        lr: float >= 0. Learning rate.
        momentum: float >= 0. Parameter updates momentum.
        decay: float >= 0. Learning rate decay over each update.
        nesterov: boolean. Whether to apply Nesterov momentum.
    '''
    def __init__(self, lr=0.01, momentum=0., decay=0.,
                 nesterov=False, **kwargs):
        super(SGD, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0.)
        self.lr = K.variable(lr)
        self.momentum = K.variable(momentum)
        self.decay = K.variable(decay)
        self.inital_decay = decay

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = []

        lr = self.lr
        if self.inital_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))
            self.updates.append(K.update_add(self.iterations, 1))

        # momentum
        shapes = [K.get_variable_shape(p) for p in params]
        moments = [K.zeros(shape) for shape in shapes]
        self.weights = [self.iterations] + moments
        for p, g, m in zip(params, grads, moments):
            v = self.momentum * m - lr * g  # velocity
            self.updates.append(K.update(m, v))

            if self.nesterov:
                new_p = p + self.momentum * v - lr * g
            else:
                new_p = p + v

            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)

            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'momentum': float(K.get_value(self.momentum)),
                  'decay': float(K.get_value(self.decay)),
                  'nesterov': self.nesterov}
        base_config = super(SGD, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))


class RMSprop(Optimizer):
    '''RMSProp optimizer.

    It is recommended to leave the parameters of this optimizer
    at their default values
    (except the learning rate, which can be freely tuned).

    This optimizer is usually a good choice for recurrent
    neural networks.

    # Arguments
        lr: float >= 0. Learning rate.
        rho: float >= 0.
        epsilon: float >= 0. Fuzz factor.
        decay: float >= 0. Learning rate decay over each update.
    '''
    def __init__(self, lr=0.001, rho=0.9, epsilon=1e-8, decay=0.,
                 **kwargs):
        super(RMSprop, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.lr = K.variable(lr)
        self.rho = K.variable(rho)
        self.decay = K.variable(decay)
        self.inital_decay = decay
        self.iterations = K.variable(0.)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        shapes = [K.get_variable_shape(p) for p in params]
        accumulators = [K.zeros(shape) for shape in shapes]
        self.weights = accumulators
        self.updates = []

        lr = self.lr
        if self.inital_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))
            self.updates.append(K.update_add(self.iterations, 1))

        for p, g, a in zip(params, grads, accumulators):
            # update accumulator
            new_a = self.rho * a + (1. - self.rho) * K.square(g)
            self.updates.append(K.update(a, new_a))
            new_p = p - lr * g / (K.sqrt(new_a) + self.epsilon)

            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'rho': float(K.get_value(self.rho)),
                  'decay': float(K.get_value(self.decay)),
                  'epsilon': self.epsilon}
        base_config = super(RMSprop, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))


class Adagrad(Optimizer):
    '''Adagrad optimizer.

    It is recommended to leave the parameters of this optimizer
    at their default values.

    # Arguments
        lr: float >= 0. Learning rate.
        epsilon: float >= 0.

    # References
        - [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
    '''
    def __init__(self, lr=0.01, epsilon=1e-8, decay=0., **kwargs):
        super(Adagrad, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.lr = K.variable(lr)
        self.decay = K.variable(decay)
        self.inital_decay = decay
        self.iterations = K.variable(0.)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        shapes = [K.get_variable_shape(p) for p in params]
        accumulators = [K.zeros(shape) for shape in shapes]
        self.weights = accumulators
        self.updates = []

        lr = self.lr
        if self.inital_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))
            self.updates.append(K.update_add(self.iterations, 1))

        for p, g, a in zip(params, grads, accumulators):
            new_a = a + K.square(g)  # update accumulator
            self.updates.append(K.update(a, new_a))
            new_p = p - lr * g / (K.sqrt(new_a) + self.epsilon)
            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'decay': float(K.get_value(self.decay)),
                  'epsilon': self.epsilon}
        base_config = super(Adagrad, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))


class Adadelta(Optimizer):
    '''Adadelta optimizer.

    It is recommended to leave the parameters of this optimizer
    at their default values.

    # Arguments
        lr: float >= 0. Learning rate.
            It is recommended to leave it at the default value.
        rho: float >= 0.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adadelta - an adaptive learning rate method](http://arxiv.org/abs/1212.5701)
    '''
    def __init__(self, lr=1.0, rho=0.95, epsilon=1e-8, decay=0.,
                 **kwargs):
        super(Adadelta, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.lr = K.variable(lr)
        self.decay = K.variable(decay)
        self.inital_decay = decay
        self.iterations = K.variable(0.)

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        shapes = [K.get_variable_shape(p) for p in params]
        accumulators = [K.zeros(shape) for shape in shapes]
        delta_accumulators = [K.zeros(shape) for shape in shapes]
        self.weights = accumulators + delta_accumulators
        self.updates = []

        lr = self.lr
        if self.inital_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))
            self.updates.append(K.update_add(self.iterations, 1))

        for p, g, a, d_a in zip(params, grads, accumulators, delta_accumulators):
            # update accumulator
            new_a = self.rho * a + (1. - self.rho) * K.square(g)
            self.updates.append(K.update(a, new_a))

            # use the new accumulator and the *old* delta_accumulator
            update = g * K.sqrt(d_a + self.epsilon) / K.sqrt(new_a + self.epsilon)

            new_p = p - lr * update
            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)
            self.updates.append(K.update(p, new_p))

            # update delta_accumulator
            new_d_a = self.rho * d_a + (1 - self.rho) * K.square(update)
            self.updates.append(K.update(d_a, new_d_a))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'rho': self.rho,
                  'decay': float(K.get_value(self.decay)),
                  'epsilon': self.epsilon}
        base_config = super(Adadelta, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))


class Adam(Optimizer):
    '''Adam optimizer.

    Default parameters follow those provided in the original paper.

    # Arguments
        lr: float >= 0. Learning rate.
        beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.001, beta_1=0.9, beta_2=0.999,
                 epsilon=1e-8, decay=0., **kwargs):
        super(Adam, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0)
        self.lr = K.variable(lr)
        self.beta_1 = K.variable(beta_1)
        self.beta_2 = K.variable(beta_2)
        self.decay = K.variable(decay)
        self.inital_decay = decay

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr
        if self.inital_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))

        t = self.iterations + 1
        lr_t = lr * K.sqrt(1. - K.pow(self.beta_2, t)) / (1. - K.pow(self.beta_1, t))

        shapes = [K.get_variable_shape(p) for p in params]
        ms = [K.zeros(shape) for shape in shapes]
        vs = [K.zeros(shape) for shape in shapes]
        self.weights = [self.iterations] + ms + vs

        for p, g, m, v in zip(params, grads, ms, vs):
            m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
            v_t = (self.beta_2 * v) + (1. - self.beta_2) * K.square(g)
            p_t = p - lr_t * m_t / (K.sqrt(v_t) + self.epsilon)

            self.updates.append(K.update(m, m_t))
            self.updates.append(K.update(v, v_t))

            new_p = p_t
            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'beta_1': float(K.get_value(self.beta_1)),
                  'beta_2': float(K.get_value(self.beta_2)),
                  'decay': float(K.get_value(self.decay)),
                  'epsilon': self.epsilon}
        base_config = super(Adam, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))


class Adamax(Optimizer):
    '''Adamax optimizer from Adam paper's Section 7. It is a variant
    of Adam based on the infinity norm.

    Default parameters follow those provided in the paper.

    # Arguments
        lr: float >= 0. Learning rate.
        beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Adam - A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980v8)
    '''
    def __init__(self, lr=0.002, beta_1=0.9, beta_2=0.999,
                 epsilon=1e-8, decay=0., **kwargs):
        super(Adamax, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0.)
        self.lr = K.variable(lr)
        self.beta_1 = K.variable(beta_1)
        self.beta_2 = K.variable(beta_2)
        self.decay = K.variable(decay)
        self.inital_decay = decay

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.lr
        if self.inital_decay > 0:
            lr *= (1. / (1. + self.decay * self.iterations))

        t = self.iterations + 1
        lr_t = lr / (1. - K.pow(self.beta_1, t))

        shapes = [K.get_variable_shape(p) for p in params]
        # zero init of 1st moment
        ms = [K.zeros(shape) for shape in shapes]
        # zero init of exponentially weighted infinity norm
        us = [K.zeros(shape) for shape in shapes]
        self.weights = [self.iterations] + ms + us

        for p, g, m, u in zip(params, grads, ms, us):
            m_t = (self.beta_1 * m) + (1. - self.beta_1) * g
            u_t = K.maximum(self.beta_2 * u, K.abs(g))
            p_t = p - lr_t * m_t / (u_t + self.epsilon)

            self.updates.append(K.update(m, m_t))
            self.updates.append(K.update(u, u_t))

            new_p = p_t
            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'beta_1': float(K.get_value(self.beta_1)),
                  'beta_2': float(K.get_value(self.beta_2)),
                  'decay': float(K.get_value(self.decay)),
                  'epsilon': self.epsilon}
        base_config = super(Adamax, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))


class Nadam(Optimizer):
    '''Nesterov Adam optimizer: Much like Adam is essentially RMSprop with momentum,
    Nadam is Adam RMSprop with Nesterov momentum.

    Default parameters follow those provided in the paper.
    It is recommended to leave the parameters of this optimizer
    at their default values.

    # Arguments
        lr: float >= 0. Learning rate.
        beta_1/beta_2: floats, 0 < beta < 1. Generally close to 1.
        epsilon: float >= 0. Fuzz factor.

    # References
        - [Nadam report](http://cs229.stanford.edu/proj2015/054_report.pdf)
        - [On the importance of initialization and momentum in deep learning](http://www.cs.toronto.edu/~fritz/absps/momentum.pdf)
    '''
    def __init__(self, lr=0.002, beta_1=0.9, beta_2=0.999,
                 epsilon=1e-8, schedule_decay=0.004, **kwargs):
        super(Nadam, self).__init__(**kwargs)
        self.__dict__.update(locals())
        self.iterations = K.variable(0.)
        self.m_schedule = K.variable(1.)
        self.lr = K.variable(lr)
        self.beta_1 = K.variable(beta_1)
        self.beta_2 = K.variable(beta_2)
        self.schedule_decay = schedule_decay

    def get_updates(self, params, constraints, loss):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        t = self.iterations + 1

        # Due to the recommendations in [2], i.e. warming momentum schedule
        momentum_cache_t = self.beta_1 * (1. - 0.5 * (K.pow(0.96, t * self.schedule_decay)))
        momentum_cache_t_1 = self.beta_1 * (1. - 0.5 * (K.pow(0.96, (t + 1) * self.schedule_decay)))
        m_schedule_new = self.m_schedule * momentum_cache_t
        m_schedule_next = self.m_schedule * momentum_cache_t * momentum_cache_t_1
        self.updates.append((self.m_schedule, m_schedule_new))

        shapes = [K.get_variable_shape(p) for p in params]
        ms = [K.zeros(shape) for shape in shapes]
        vs = [K.zeros(shape) for shape in shapes]

        self.weights = [self.iterations] + ms + vs

        for p, g, m, v in zip(params, grads, ms, vs):
            # the following equations given in [1]
            g_prime = g / (1. - m_schedule_new)
            m_t = self.beta_1 * m + (1. - self.beta_1) * g
            m_t_prime = m_t / (1. - m_schedule_next)
            v_t = self.beta_2 * v + (1. - self.beta_2) * K.square(g)
            v_t_prime = v_t / (1. - K.pow(self.beta_2, t))
            m_t_bar = (1. - momentum_cache_t) * g_prime + momentum_cache_t_1 * m_t_prime

            self.updates.append(K.update(m, m_t))
            self.updates.append(K.update(v, v_t))

            p_t = p - self.lr * m_t_bar / (K.sqrt(v_t_prime) + self.epsilon)
            new_p = p_t

            # apply constraints
            if p in constraints:
                c = constraints[p]
                new_p = c(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'lr': float(K.get_value(self.lr)),
                  'beta_1': float(K.get_value(self.beta_1)),
                  'beta_2': float(K.get_value(self.beta_2)),
                  'epsilon': self.epsilon,
                  'schedule_decay': self.schedule_decay}
        base_config = super(Nadam, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))


class TFOptimizer(Optimizer):

    def __init__(self, optimizer):
        self.optimizer = optimizer
        self.iterations = K.variable(0.)
        self.updates = []

    def get_updates(self, params, constraints, loss):
        if constraints:
            raise ValueError('TF optimizers do not support '
                             'weights constraints. Either remove '
                             'all weights constraints in your model, '
                             'or use a Keras optimizer.')
        grads = self.optimizer.compute_gradients(loss, params)
        opt_update = self.optimizer.apply_gradients(
            grads, global_step=self.iterations)
        self.updates.append(opt_update)
        return self.updates

    @property
    def weights(self):
        raise NotImplementedError

    def get_config(self):
        raise NotImplementedError

    def from_config(self, config):
        raise NotImplementedError


# aliases
sgd = SGD
rmsprop = RMSprop
adagrad = Adagrad
adadelta = Adadelta
adam = Adam
adamax = Adamax
nadam = Nadam


def get(identifier, kwargs=None):
    if K.backend() == 'tensorflow':
        # Wrap TF optimizer instances
        import tensorflow as tf
        if isinstance(identifier, tf.train.Optimizer):
            return TFOptimizer(identifier)
    # Instantiate a Keras optimizer
    return get_from_module(identifier, globals(), 'optimizer',
                           instantiate=True, kwargs=kwargs)
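The pattern in 1.2.0 is the same: the rate is only decayed inside `get_updates`, now guarded by `if self.inital_decay > 0` (the misspelling is in the released source) and with `iterations` advanced through `K.update_add`. The same external computation therefore works for both versions; a version-agnostic sketch (for Adam/Adamax/Nadam this gives only the base decayed rate, since those optimizers apply further per-step bias correction on top):

from keras import backend as K

def current_lr(optimizer):
    lr = K.get_value(optimizer.lr)
    # 0.3.3 only gives SGD a decay; 1.2.0 gives one to most optimizers
    if hasattr(optimizer, 'decay') and hasattr(optimizer, 'iterations'):
        lr *= 1. / (1. + K.get_value(optimizer.decay) *
                    K.get_value(optimizer.iterations))
    return lr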