optim.py cs231n

n如果有错误，欢迎指出，不胜感激

import numpy as np

"""

This file implements various first-order update rules that are commonly used for

training neural networks. Each update rule accepts current weights and the

gradient of the loss with respect to those weights and produces the next set of

weights. Each update rule has the same interface:

def update(w, dw, config=None):

Inputs:

  - w: A numpy array giving the current weights.

  - dw: A numpy array of the same shape as w giving the gradient of the

    loss with respect to w.

  - config: A dictionary containing hyperparameter values such as learning rate,

    momentum, etc. If the update rule requires caching values over many

    iterations, then config will also hold these cached values.

Returns:

  - next_w: The next point after the update.

  - config: The config dictionary to be passed to the next iteration of the

    update rule.

NOTE: For most update rules, the default learning rate will probably not perform

well; however the default values of the other hyperparameters should work well

for a variety of different problems.

For efficiency, update rules may perform in-place updates, mutating w and

setting next_w equal to w.

"""

def sgd(w, dw, config=None):

  """

  Performs vanilla stochastic gradient descent.

  config format:

  - learning_rate: Scalar learning rate.

  """

  if config is None: config = {}

  config.setdefault('learning_rate', 1e-2)

  w -= config['learning_rate'] * dw

  return w, config

def sgd_momentum(w, dw, config=None):

  """

  Performs stochastic gradient descent with momentum.

  config format:

  - learning_rate: Scalar learning rate.

  - momentum: Scalar between 0 and 1 giving the momentum value.

    Setting momentum = 0 reduces to sgd.

  - velocity: A numpy array of the same shape as w and dw used to store a moving

    average of the gradients.

  """

  if config is None: config = {}

  config.setdefault('learning_rate', 1e-2)

  config.setdefault('momentum', 0.9)

  v = config.get('velocity', np.zeros_like(w))

  next_w = None

  v=v*config['momentum']-config['learning_rate']*dw

  next_w=w+v

  config['velocity'] = v

  return next_w, config

def rmsprop(x, dx, config=None):

  """

  Uses the RMSProp update rule, which uses a moving average of squared gradient

  values to set adaptive per-parameter learning rates.

  config format:

  - learning_rate: Scalar learning rate.

  - decay_rate: Scalar between 0 and 1 giving the decay rate for the squared

    gradient cache.

  - epsilon: Small scalar used for smoothing to avoid dividing by zero.

  - cache: Moving average of second moments of gradients.

  """

  if config is None: config = {}

  config.setdefault('learning_rate', 1e-2)

  config.setdefault('decay_rate', 0.99)

  config.setdefault('epsilon', 1e-8)

  config.setdefault('cache', np.zeros_like(x))

  next_x = None

  cache=config['cache']*config['decay_rate']+(1-config['decay_rate'])*dx**2

  next_x=x-config['learning_rate']*dx/np.sqrt(cache+config['epsilon'])

  config['cache']=cache

  return next_x, config

def adam(x, dx, config=None):

  """

  Uses the Adam update rule, which incorporates moving averages of both the

  gradient and its square and a bias correction term.

  config format:

  - learning_rate: Scalar learning rate.

  - beta1: Decay rate for moving average of first moment of gradient.

  - beta2: Decay rate for moving average of second moment of gradient.

  - epsilon: Small scalar used for smoothing to avoid dividing by zero.

  - m: Moving average of gradient.

  - v: Moving average of squared gradient.

  - t: Iteration number.

  """

  if config is None: config = {}

  config.setdefault('learning_rate', 1e-3)

  config.setdefault('beta1', 0.9)

  config.setdefault('beta2', 0.999)

  config.setdefault('epsilon', 1e-8)

  config.setdefault('m', np.zeros_like(x))

  config.setdefault('v', np.zeros_like(x))

  config.setdefault('t', 0)

  config['t']+=1

  这个方法比较综合，各种方法的好处吧

  m=config['beta1']*config['m']+(1-config['beta1'])*dx  # now to change by  acc

  v=config['beta2']*config['v']+(1-config['beta2'])*dx**2

  config['m']=m

  config['v']=v

  m=m/(1-config['beta1']**config['t'])

  v=v/(1-config['beta2']**config['t'])

  next_x=x-config['learning_rate']*m/np.sqrt(v+config['epsilon'])

  return next_x, config

optim.py cs231n的更多相关文章

cnn.py cs231n
n import numpy as np from cs231n.layers import * from cs231n.fast_layers import * from cs231n.layer_ ...
fc_net.py cs231n
n如果有错误,欢迎指出,不胜感激 import numpy as np from cs231n.layers import * from cs231n.layer_utils import * cla ...
layers.py cs231n
如果有错误,欢迎指出,不胜感激. import numpy as np def affine_forward(x, w, b): 第一个最简单的 affine_forward简单的前向传递,返回 ou ...
笔记：CS231n+assignment2（作业二）（一）
第二个作业难度很高,但做(抄)完之后收获还是很大的.... 一.Fully-Connected Neural Nets 首先是对之前的神经网络的程序进行重构,目的是可以构建任意大小的全连接的neura ...
深度学习原理与框架-神经网络-cifar10分类(代码) 1.np.concatenate(进行数据串接) 2.np.hstack(将数据横着排列) 3.hasattr(判断.py文件的函数是否存在) 4.reshape(维度重构) 5.tanspose(维度位置变化) 6.pickle.load(f文件读入) 7.np.argmax(获得最大值索引) 8.np.maximum(阈值比较)
横1. np.concatenate(list, axis=0) 将数据进行串接,这里主要是可以将列表进行x轴获得y轴的串接参数说明:list表示需要串接的列表,axis=0,表示从上到下进行串接 ...
optim.py-使用tensorflow实现一般优化算法
optim.py Project URL:https://github.com/Codsir/optim.git Based on: tensorflow, numpy, copy, inspect ...
深度学习之卷积神经网络(CNN)详解与代码实现（一）
卷积神经网络(CNN)详解与代码实现本文系作者原创,转载请注明出处:https://www.cnblogs.com/further-further-further/p/10430073.html 目 ...
深度学习原理与框架-卷积神经网络-cifar10分类(图片分类代码) 1.数据读入 2.模型构建 3.模型参数训练
卷积神经网络:下面要说的这个网络,由下面三层所组成卷积网络:卷积层 + 激活层relu+ 池化层max_pool组成神经网络:线性变化 + 激活层relu 神经网络: 线性变化(获得得分值) 代码 ...
Pytorch1.3源码解析-第一篇
pytorch$ tree -L 1 . ├── android ├── aten ├── benchmarks ├── binaries ├── c10 ├── caffe2 ├── CITATIO ...

随机推荐

apache+flask部署
wsgi方式 1.安装apache 1.解压httpd并进行安装 # tar zxvf httpd-2.2.15.tar.gz # cd httpd-2.2.15 # ./configure - ...
2018-10-31-C#-7.0-使用下划线忽略使用的变量
title author date CreateTime categories C# 7.0 使用下划线忽略使用的变量 lindexi 2018-10-31 14:4:9 +0800 2018-10- ...
Java8的HashMap笔记摘要
问题例子: HashMap 是不是有序的? 不是有序的. 有没有有序的Map实现类呢? 有 TreeMap 和 LinkedHashMap. TreeMap 和 LinkedHashMap 是如何保 ...
centos7 盘符变动绑定槽位
服务器下的硬盘主有机械硬盘.固态硬盘以及raid阵列,通常内核分配盘符的顺序是/dev/sda./dev/sdb… ….在系统启动过程中,内核会按照扫描到硬盘的顺序分配盘符(先分配直通的,再分配阵列) ...
2018-12-15-VisualStudio-通过-EditorBrowsable-隐藏不开放的属性或方法
title author date CreateTime categories VisualStudio 通过 EditorBrowsable 隐藏不开放的属性或方法 lindexi 2018-12- ...
入门servlet：request获取请求体数据
@WebServlet("/RequestDemo5") public class RequestDemo5 extends HttpServlet { protected voi ...
Luogu P1462 通往奥格瑞玛的道路(最短路+二分)
P1462 通往奥格瑞玛的道路题面题目背景在艾泽拉斯大陆上有一位名叫歪嘴哦的神奇术士,他是部落的中坚力量有一天他醒来后发现自己居然到了联盟的主城暴风城在被众多联盟的士兵攻击后,他决定逃回自己 ...
Visual Studio 2019 正式发布
原文链接: https://www.oschina.net/news/105629/vs2019-general-availability 如约而至,微软已于今天推出 Visual Studio 20 ...
[转]json对象详解
json(javascript object notation)全称是javascript对象表示法,它是一种数据交换的文本格式,而不是一种编程语言,用于读取结构化数据.2001年由Douglas C ...
node学习记录——搭建web服务器
web服务器的基本知识功能:1.接收HTTP请求(get,post,delete,put)2.处理HTTP请求常见的web服务器架构: 1. Nginx/Apache:负责接收http请求,确定谁 ...

optim.py cs231n

optim.py cs231n的更多相关文章

随机推荐

热门专题