Having already completed the SVM assignment, the Softmax assignment is comparatively light work.

Completing this assignment requires familiarity with the following:

cell 1 Set the default plotting parameters

import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

cell 2 Load the data and print the size of each split:

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
  """
  Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
  it for the linear classifier. These are the same steps as we used for the
  SVM, but condensed to a single function.
  """
  # Load the raw CIFAR-10 data
  cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
  X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

  # subsample the data
  mask = range(num_training, num_training + num_validation)
  X_val = X_train[mask]
  y_val = y_train[mask]
  mask = range(num_training)
  X_train = X_train[mask]
  y_train = y_train[mask]
  mask = range(num_test)
  X_test = X_test[mask]
  y_test = y_test[mask]
  mask = np.random.choice(num_training, num_dev, replace=False)
  X_dev = X_train[mask]
  y_dev = y_train[mask]

  # Preprocessing: reshape the image data into rows
  X_train = np.reshape(X_train, (X_train.shape[0], -1))
  X_val = np.reshape(X_val, (X_val.shape[0], -1))
  X_test = np.reshape(X_test, (X_test.shape[0], -1))
  X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

  # Normalize the data: subtract the mean image
  mean_image = np.mean(X_train, axis=0)
  X_train -= mean_image
  X_val -= mean_image
  X_test -= mean_image
  X_dev -= mean_image

  # add bias dimension and transform into columns
  X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
  X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
  X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
  X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

  return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev

# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
print 'dev data shape: ', X_dev.shape
print 'dev labels shape: ', y_dev.shape

The resulting data dimensions:
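
The original post shows the output as a screenshot; with the default sizes above (49,000 training, 1,000 validation, 1,000 test and 500 dev examples, each image flattened to 3072 pixels plus one bias dimension), the printed shapes work out to:

Train data shape:  (49000, 3073)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3073)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3073)
Test labels shape:  (1000,)
dev data shape:  (500, 3073)
dev labels shape:  (500,)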

cell 3 Implement the Softmax loss function and gradient with for loops:

# First implement the naive softmax loss function with nested loops.
# Open the file cs231n/classifiers/softmax.py and implement the
# softmax_loss_naive function.
from cs231n.classifiers.softmax import softmax_loss_naive
import time

# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As a rough sanity check, our loss should be something close to -log(0.1).
print 'loss: %f' % loss
print 'sanity check: %f' % (-np.log(0.1))

The corresponding code in the .py file:

def softmax_loss_naive(W, X, y, reg):
  """
  Softmax loss function, naive implementation (with loops)

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)

  #############################################################################
  # TODO: Compute the softmax loss and its gradient using explicit loops.     #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  num_classes = W.shape[1]
  num_train = X.shape[0]
  buf_e = np.zeros(num_classes)
  for i in xrange(num_train):
    # scores for example i: (1, 3073) dot (3073, 1) gives one score per class
    for j in xrange(num_classes):
      buf_e[j] = np.dot(X[i, :], W[:, j])
    # shift by the max score for numeric stability, then take the softmax
    buf_e -= np.max(buf_e)
    buf_e = np.exp(buf_e)
    buf_sum = np.sum(buf_e)
    buf = buf_e / buf_sum
    loss -= np.log(buf[y[i]])
    for j in xrange(num_classes):
      dW[:, j] += (buf[j] - (j == y[i])) * X[i, :].T
  # average over the batch, then add regularization (elementwise product)
  loss /= num_train
  dW /= num_train
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg * W
  #############################################################################
  #                          END OF YOUR CODE                                 #
  #############################################################################

  return loss, dW

The computed result:

The sanity check described in lecture is used here.

Question:
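
The inline question at this point asks (roughly) why we expect the initial loss to be close to -log(0.1). With near-zero random weights every class score is about the same, so softmax assigns each of the 10 classes a probability of roughly 1/10, and the cross-entropy of the correct class is about -log(0.1) = 2.302. A quick sketch of that intuition (illustration only, not part of the assignment code):

# Illustration only: with tiny random weights the scores are all close to
# zero, so softmax gives every class a probability of about 1/10 and the
# cross-entropy loss of the correct class is about -log(0.1).
import numpy as np

scores = np.zeros(10)                          # 10 nearly identical class scores
probs = np.exp(scores) / np.sum(np.exp(scores))
print probs                                    # every entry is 0.1
print -np.log(probs[3])                        # 2.302585..., whatever the label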

cell 4 Check the analytic gradient against a numerically computed gradient:

# Complete the implementation of softmax_loss_naive and implement a (naive)
# version of the gradient that uses nested loops.
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)

# As we did for the SVM, use numeric gradient checking as a debugging tool.
# The numeric gradient should be close to the analytic gradient.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

# similar to SVM case, do another gradient check with regularization
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)

The computed result:
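
For reference, grad_check_sparse compares the analytic gradient with a centered-difference estimate at a handful of randomly chosen entries of W and prints the relative error. A minimal sketch of that idea (my own reimplementation for illustration, not the course's gradient_check.py):

# Minimal sketch of a sparse numerical gradient check (same idea as
# cs231n.gradient_check.grad_check_sparse, but not the course's exact code).
import numpy as np

def check_grad_sparse(f, W, analytic_grad, num_checks=10, h=1e-5):
    for _ in xrange(num_checks):
        ix = tuple([np.random.randint(m) for m in W.shape])  # random entry of W
        oldval = W[ix]
        W[ix] = oldval + h
        fxph = f(W)                      # loss with W[ix] nudged up
        W[ix] = oldval - h
        fxmh = f(W)                      # loss with W[ix] nudged down
        W[ix] = oldval                   # restore the entry
        grad_numerical = (fxph - fxmh) / (2 * h)
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / \
                    (abs(grad_numerical) + abs(grad_analytic))
        print 'numerical: %f analytic: %f, relative error: %e' % (
            grad_numerical, grad_analytic, rel_error)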

cell 5 Implement the loss function and gradient in vectorized form, and compare it with the loop version:

# Now that we have a naive implementation of the softmax loss function and its gradient,
# implement a vectorized version in softmax_loss_vectorized.
# The two versions should compute the same results, but the vectorized version should be
# much faster.
tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.softmax import softmax_loss_vectorized
tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# As we did for the SVM, we use the Frobenius norm to compare the two versions
# of the gradient.
grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'Loss difference: %f' % np.abs(loss_naive - loss_vectorized)
print 'Gradient difference: %f' % grad_difference

The comparison results:

The vectorized implementation:

def softmax_loss_vectorized(W, X, y, reg):
  """
  Softmax loss function, vectorized version.

  Inputs and outputs are the same as softmax_loss_naive.
  """
  # Initialize the loss and gradient to zero.
  loss = 0.0
  dW = np.zeros_like(W)
  num_classes = W.shape[1]
  num_train = X.shape[0]
  #############################################################################
  # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
  # Store the loss in loss and the gradient in dW. If you are not careful     #
  # here, it is easy to run into numeric instability. Don't forget the        #
  # regularization!                                                           #
  #############################################################################
  # scores: (500, 3073) dot (3073, 10) -> (500, 10)
  buf_e = np.dot(X, W)
  # subtract each row's max score for numeric stability
  buf_e = np.subtract(buf_e.T, np.max(buf_e, axis=1)).T
  buf_e = np.exp(buf_e)
  # normalize each row so it sums to one (softmax probabilities)
  buf_e = np.divide(buf_e.T, np.sum(buf_e, axis=1)).T
  # loss: sum of negative log-probabilities of the correct classes
  loss = -np.sum(np.log(buf_e[np.arange(num_train), y]))
  # gradient: subtract 1 from the probability of the correct class
  buf_e[np.arange(num_train), y] -= 1
  # average over the batch and add regularization
  loss = loss / num_train + 0.5 * reg * np.sum(W * W)
  # (3073, 500) dot (500, 10) -> (3073, 10)
  dW = np.dot(X.T, buf_e) / num_train + reg * W
  #############################################################################
  #                          END OF YOUR CODE                                 #
  #############################################################################

  return loss, dW
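
The step both versions share is the softmax gradient identity dL_i/ds_j = p_j - 1(j == y_i): that is exactly what buf[j] - (j == y[i]) computes in the loop version and what buf_e[np.arange(num_train), y] -= 1 does in the vectorized one. A small self-contained check of that identity on a single example (illustration only, independent of the assignment files):

# Illustration only: numerically verify dL/ds_j = p_j - 1{j == y} for the
# cross-entropy loss of a single example with 10 class scores.
import numpy as np

def xent(s, y):
    # cross-entropy loss of one score vector s with correct class y
    p = np.exp(s - np.max(s))
    p /= np.sum(p)
    return -np.log(p[y])

s = np.random.randn(10)        # random class scores
y = 3                          # arbitrary correct class
p = np.exp(s - np.max(s))
p /= np.sum(p)
analytic = p.copy()
analytic[y] -= 1               # p_j - 1{j == y}

h = 1e-6
numeric = np.zeros(10)
for j in xrange(10):
    sp, sm = s.copy(), s.copy()
    sp[j] += h
    sm[j] -= h
    numeric[j] = (xent(sp, y) - xent(sm, y)) / (2 * h)

print np.max(np.abs(numeric - analytic))   # should be roughly 1e-9 or smaller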

cell 6 Use the training and validation sets to select hyperparameters:

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = np.logspace(-10, 10, 10)          # [1e-7, 2e-7, 3e-7, 4e-7, 5e-7]
regularization_strengths = np.logspace(-3, 6, 10)  # [1e4, 5e4, 1e5, 5e5, 1e6, 5e6, 1e7, 5e7, 1e8]

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifer in best_softmax.                          #
################################################################################
iters = 1500
for lr in learning_rates:
    for rs in regularization_strengths:
        softmax = Softmax()
        softmax.train(X_train, y_train, learning_rate=lr, reg=rs, num_iters=iters)
        y_train_pred = softmax.predict(X_train)
        accu_train = np.mean(y_train == y_train_pred)
        y_val_pred = softmax.predict(X_val)
        accu_val = np.mean(y_val == y_val_pred)
        results[(lr, rs)] = (accu_train, accu_val)
        if best_val < accu_val:
            best_val = accu_val
            best_softmax = softmax
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy)

print 'best validation accuracy achieved during cross-validation: %f' % best_val

This gives reasonably good results:

cell 7 Take the model with the best hyperparameters, evaluate it on the test set, and compute the accuracy:

# evaluate on test set
# Evaluate the best softmax on test set
y_test_pred = best_softmax.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'softmax on raw pixels final test set accuracy: %f' % (test_accuracy, )

Result: 0.378

cell 8 Visualize the learned W values:

# Visualize the learned weights for each class
w = best_softmax.W[:-1, :]  # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
    plt.subplot(2, 5, i + 1)
    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

Result:

Note: the hyperparameter search for softmax and for the SVM uses a shared class, with each classifier supplying its own corresponding loss method; a sketch of how that file is organized follows.
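
The course file itself is not reproduced in this post, so the following is only a rough structural sketch (reconstructed from memory of the assignment skeleton, with the method bodies elided; treat the details as assumptions rather than a verbatim copy): a LinearClassifier base class owns train() (minibatch SGD) and predict(), while LinearSVM and Softmax override only loss().

# Rough sketch of cs231n/classifiers/linear_classifier.py (an assumption, not
# a verbatim copy): the SVM and Softmax classifiers share train()/predict()
# and differ only in which loss function loss() calls.
import numpy as np
from cs231n.classifiers.linear_svm import svm_loss_vectorized
from cs231n.classifiers.softmax import softmax_loss_vectorized

class LinearClassifier(object):
  def __init__(self):
    self.W = None

  def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
            batch_size=200, verbose=False):
    # minibatch SGD: sample a batch, call self.loss, step along the negative
    # gradient (body omitted in this sketch)
    pass

  def predict(self, X):
    # label of the highest-scoring class for each row of X
    return np.argmax(X.dot(self.W), axis=1)

  def loss(self, X_batch, y_batch, reg):
    # provided by the subclasses below
    raise NotImplementedError

class LinearSVM(LinearClassifier):
  def loss(self, X_batch, y_batch, reg):
    return svm_loss_vectorized(self.W, X_batch, y_batch, reg)

class Softmax(LinearClassifier):
  def loss(self, X_batch, y_batch, reg):
    return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)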

Appendix: QQ group for working through CS231n: 578975100, join verification: DL-CS231n
