two_layer_net.ipynb

I had long misunderstood what x.reshape(x.shape[0], -1) outputs. For example:

    import numpy as np

    x = [[1, 4, 7, 2], [2, 5, 7, 4]]
    x = np.array(x)
    x0 = x.reshape(x.shape[0], -1)
    x1 = x.reshape(x.shape[1], -1)
    print(x0)
    print(x1)

The actual output is:

    [[1 4 7 2]
     [2 5 7 4]]
    [[1 4]
     [7 2]
     [2 5]
     [7 4]]
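This idiom is exactly what the affine layer needs: keep the batch dimension and flatten everything else into one row per sample. A minimal sketch with made-up shapes (the same shapes as the test below):

    # Hypothetical shapes: a batch of 2 samples, each of shape (4, 5, 6),
    # flattened to one 120-dimensional row vector per sample.
    x = np.arange(2 * 4 * 5 * 6).reshape(2, 4, 5, 6)
    x_flat = x.reshape(x.shape[0], -1)
    print(x_flat.shape)  # (2, 120)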

Affine layer: forward

    # Test the affine_forward function

    num_inputs = 2
    input_shape = (4, 5, 6)
    output_dim = 3

    input_size = num_inputs * np.prod(input_shape)
    #print(np.prod(input_shape))  # 120
    weight_size = output_dim * np.prod(input_shape)

    x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
    #print(*input_shape)  # 4 5 6
    print(np.shape(x))  # (2, 4, 5, 6)
    w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
    b = np.linspace(-0.3, 0.1, num=output_dim)

    out, _ = affine_forward(x, w, b)
    correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],
                            [ 3.25553199, 3.5141327,  3.77273342]])

    # Compare your output with ours. The error should be around e-9 or less.
    print('Testing affine_forward function:')
    print('difference: ', rel_error(out, correct_out))

The function to implement is:

    def affine_forward(x, w, b):
        out = None

        x_vector = x.reshape(x.shape[0], -1)  # flatten each sample into a row vector
        out = x_vector.dot(w) + b
        cache = (x, w, b)                     # keep the inputs for the backward pass

        return out, cache
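In formula form (the standard affine layer, with the input flattened to shape (N, D), w of shape (D, M), and b of shape (M,)):

$$
\text{out} = x_{\text{flat}}\, w + b,\qquad
x_{\text{flat}} \in \mathbb{R}^{N \times D},\;
w \in \mathbb{R}^{D \times M},\;
b \in \mathbb{R}^{M},\;
\text{out} \in \mathbb{R}^{N \times M}
$$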

Affine layer: backward

The corresponding backward function is:

    def affine_backward(dout, cache):
        x, w, b = cache
        dx, dw, db = None, None, None

        # shapes in the gradient check: x (10, 2, 3), w (6, 5), b (5,), dout (10, 5)
        dx = np.dot(dout, w.T)    # (N, M) dot (D, M).T -> (N, D): (10, 6)
        dx = dx.reshape(x.shape)  # reshape dx back to the shape of x: (10, 2, 3)

        dw = np.dot(x.reshape(x.shape[0], -1).T, dout)  # (D, N) dot (N, M) -> (D, M): (6, 10) dot (10, 5) = (6, 5)

        db = np.sum(dout, axis=0)  # sum over the sample dimension -> gradient of shape (M,)

        return dx, dw, db
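The shapes in those comments come from the notebook's numerical gradient check. A sketch of that kind of check (assuming rel_error and eval_numerical_gradient_array from the cs231n helpers are in scope; the shapes here are the ones referenced above, used for illustration):

    # Sketch of an affine_backward gradient check against numerical gradients.
    x = np.random.randn(10, 2, 3)
    w = np.random.randn(6, 5)
    b = np.random.randn(5)
    dout = np.random.randn(10, 5)

    dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
    dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
    db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)

    _, cache = affine_forward(x, w, b)
    dx, dw, db = affine_backward(dout, cache)

    # The errors should be around e-10 or less.
    print('dx error: ', rel_error(dx_num, dx))
    print('dw error: ', rel_error(dw_num, dw))
    print('db error: ', rel_error(db_num, db))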

ReLU activation

Forward:

    out = np.maximum(x, 0)

For the backward pass I got it wrong at first and wrote

    dx = np.maximum(dx, 0)

which is clearly wrong: the condition should test whether x is below zero, not operate on dx. The correct version is

    dx = np.copy(dout)
    dx[x <= 0] = 0

and that is all that is needed.
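A toy example (made-up numbers) of why the mask has to come from x: negative upstream gradients must still flow through positions where x > 0, and np.maximum(dx, 0) would wrongly zero them.

    # Hypothetical check: the upstream gradient -6 passes through because x > 0 there,
    # while positions with x <= 0 are zeroed regardless of dout's sign.
    x = np.array([[-1.0, 2.0], [3.0, -4.0]])
    dout = np.array([[-5.0, -6.0], [7.0, 8.0]])

    dx = np.copy(dout)
    dx[x <= 0] = 0
    print(dx)  # [[ 0. -6.]
               #  [ 7.  0.]]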

Sandwich layers

Looking at layer_utils.py: it just wraps affine and ReLU together, and the errors are also around e-11 to e-12. A rough sketch of the composition is below.
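A sketch of what that wrapper roughly looks like (not copied verbatim from the file; it assumes relu_forward / relu_backward from layers.py):

    def affine_relu_forward(x, w, b):
        a, fc_cache = affine_forward(x, w, b)  # affine layer
        out, relu_cache = relu_forward(a)      # ReLU on top
        cache = (fc_cache, relu_cache)         # keep both caches for backward
        return out, cache

    def affine_relu_backward(dout, cache):
        fc_cache, relu_cache = cache
        da = relu_backward(dout, relu_cache)   # back through ReLU first
        dx, dw, db = affine_backward(da, fc_cache)
        return dx, dw, db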

Loss layers: SVM & Softmax

svm

    def svm_loss(x, y):
        loss, dx = None, None

        num_train = x.shape[0]
        scores = x - np.max(x, axis=1, keepdims=True)
        correct_class_scores = scores[np.arange(num_train), y]
        margins = np.maximum(0, scores - correct_class_scores[:, np.newaxis] + 1)
        margins[np.arange(num_train), y] = 0
        loss = np.sum(margins) / num_train

        num_pos = np.sum(margins > 0, axis=1)
        dx = np.zeros_like(x)
        dx[margins > 0] = 1
        dx[np.arange(num_train), y] -= num_pos
        dx /= num_train

        return loss, dx

I made a small example (4 samples, 3 classes) to trace the intermediate values:

    x:
    [[ 4.17943411e-04  1.39710028e-03 -1.78590431e-03]
     [-7.08827734e-04 -7.47253161e-05 -7.75016769e-04]
     [-1.49797903e-04  1.86172902e-03 -1.42552930e-03]
     [-3.76356699e-04 -3.42275390e-04  2.94907637e-04]]

    y:
    [2 1 1 0]

    scores:
    [[-0.00097916  0.         -0.003183  ]
     [-0.0006341   0.         -0.00070029]
     [-0.00201153  0.         -0.00328726]
     [-0.00067126 -0.00063718  0.        ]]

    correct class scores:
    [-0.003183    0.          0.         -0.00067126]

    margins (before zeroing the correct class):
    [[1.00220385 1.003183   1.        ]
     [0.9993659  1.         0.99929971]
     [0.99798847 1.         0.99671274]
     [1.         1.00003408 1.00067126]]

    margins (after zeroing the correct class):
    [[1.00220385 1.003183   0.        ]
     [0.9993659  0.         0.99929971]
     [0.99798847 0.         0.99671274]
     [0.         1.00003408 1.00067126]]

    num_pos:
    [2 2 2 2]

    dx (indicator of positive margins):
    [[1. 1. 0.]
     [1. 0. 1.]
     [1. 0. 1.]
     [0. 1. 1.]]

    dx (after subtracting num_pos at the correct class):
    [[ 1.  1. -2.]
     [ 1. -2.  1.]
     [ 1. -2.  1.]
     [-2.  1.  1.]]
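The last dx block matches the standard vectorized hinge-loss gradient (before the final division by num_train): each class with a positive margin contributes +1, and the correct class gets minus the number of positive margins in its row, which is where the -2 entries come from. As a formula:

$$
\frac{\partial L}{\partial x_{ij}} = \frac{1}{N}
\begin{cases}
\mathbb{1}\left[m_{ij} > 0\right], & j \neq y_i \\
-\sum_{k \neq y_i} \mathbb{1}\left[m_{ik} > 0\right], & j = y_i
\end{cases}
\qquad
m_{ij} = \max\left(0,\; x_{ij} - x_{i\,y_i} + 1\right)
$$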

softmax

For dx, the x.T.dot(dscore) step is dropped; here the gradient is taken directly with respect to the scores x rather than the weights.

    def softmax_loss(x, y):
        loss, dx = None, None

        num_train = x.shape[0]
        scores = x - np.max(x, axis=1, keepdims=True)
        exp_scores = np.exp(scores)
        #correct_class_scores = scores[np.arange(num_train), y]

        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        loss = np.sum(-np.log(probs[np.arange(num_train), y])) / num_train

        # Compute the gradient
        dscores = probs
        dscores[np.arange(num_train), y] -= 1
        dscores /= num_train

        dx = dscores

        return loss, dx
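This is the standard softmax / cross-entropy gradient with respect to the scores:

$$
p_{ij} = \frac{e^{x_{ij}}}{\sum_{k} e^{x_{ik}}},\qquad
\frac{\partial L}{\partial x_{ij}} = \frac{1}{N}\left(p_{ij} - \mathbb{1}\left[j = y_i\right]\right)
$$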

Two-layer network

This part just requires reading the TwoLayerNet class in fc_net.py.

Assume the input dimension is D, the hidden dimension is H, and the number of classes is C.

The architecture is affine - relu - affine - softmax; the class itself does not perform gradient descent. The model parameters are stored in the dictionary self.params.
    from builtins import range
    from builtins import object
    import numpy as np

    from ..layers import *
    from ..layer_utils import *


    class TwoLayerNet(object):

        def __init__(
            self,
            input_dim=3 * 32 * 32,
            hidden_dim=100,
            num_classes=10,
            weight_scale=1e-3,
            reg=0.0,
        ):
            self.params = {}
            self.reg = reg

            self.params['W1'] = weight_scale * np.random.randn(input_dim, hidden_dim)
            self.params['b1'] = np.zeros(hidden_dim)

            self.params['W2'] = weight_scale * np.random.randn(hidden_dim, num_classes)
            self.params['b2'] = np.zeros(num_classes)

        def loss(self, X, y=None):
            scores = None

            hidden_layer = np.maximum(0, np.dot(X, self.params['W1']) + self.params['b1'])  # ReLU activation
            scores = np.dot(hidden_layer, self.params['W2']) + self.params['b2']

            # If y is None then we are in test mode so just return scores
            if y is None:
                return scores

            loss, grads = 0, {}

            num_train = X.shape[0]
            scores -= np.max(scores, axis=1, keepdims=True)  # for numerical stability
            softmax_scores = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)
            correct_class_scores = softmax_scores[range(num_train), y]
            data_loss = -np.log(correct_class_scores).mean()
            reg_loss = 0.5 * self.reg * (np.sum(self.params['W1'] ** 2) + np.sum(self.params['W2'] ** 2))
            loss = data_loss + reg_loss

            # Backward pass
            dscores = softmax_scores.copy()
            dscores[range(num_train), y] -= 1
            dscores /= num_train

            grads['W2'] = np.dot(hidden_layer.T, dscores) + self.reg * self.params['W2']
            grads['b2'] = np.sum(dscores, axis=0)

            dhidden = np.dot(dscores, self.params['W2'].T)
            dhidden[hidden_layer <= 0] = 0  # backpropagate through ReLU

            grads['W1'] = np.dot(X.T, dhidden) + self.reg * self.params['W1']
            grads['b1'] = np.sum(dhidden, axis=0)

            return loss, grads
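A minimal smoke test of the class (a sketch with made-up shapes; it assumes the class above and numpy are importable, since fc_net.py itself uses package-relative imports):

    # Hypothetical sanity check: tiny random data, just to confirm loss() runs
    # and the gradients have the same shapes as the parameters.
    np.random.seed(0)
    model = TwoLayerNet(input_dim=5, hidden_dim=4, num_classes=3, weight_scale=1e-2, reg=0.1)

    X = np.random.randn(6, 5)          # 6 samples, 5 features
    y = np.random.randint(3, size=6)   # labels in {0, 1, 2}

    scores = model.loss(X)             # test mode: (6, 3) score matrix
    loss, grads = model.loss(X, y)     # train mode: scalar loss + gradient dict

    print(scores.shape)                                # (6, 3)
    print(loss)
    print({k: v.shape for k, v in grads.items()})      # same shapes as self.params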
 
