CS231n 2016 Walkthrough, Chapter 3: SVM Assignment Analysis
Assignment overview: working through the notebook cell by cell will make you familiar with the following.
cell 1: set default plotting parameters
```python
# Run some setup code for this notebook.
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
```
cell 2: load the raw data into numpy arrays (matrices)
```python
# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print 'Training data shape: ', X_train.shape
print 'Training labels shape: ', y_train.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
```
The resulting arrays (X_train: (50000, 32, 32, 3), y_train: (50000,), X_test: (10000, 32, 32, 3), y_test: (10000,)) will be used later for the matrix computations.
cell 3: randomly pick a few examples from each class and visualize them:
```python
# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()
```
Visualization result:
cell 4: split the dataset into training, validation, test, and development sets.
```python
# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
```
The resulting set sizes: train (49000, 32, 32, 3), validation (1000, 32, 32, 3), test (1000, 32, 32, 3), dev (500, 32, 32, 3).
cell 5: reshape the image data into rows to simplify computation
```python
# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print 'Training data shape: ', X_train.shape
print 'Validation data shape: ', X_val.shape
print 'Test data shape: ', X_test.shape
print 'dev data shape: ', X_dev.shape
```
Shapes after reshaping (32 * 32 * 3 = 3072 features per image): (49000, 3072), (1000, 3072), (1000, 3072), (500, 3072).
cell 6: compute the mean image over the training set
```python
# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print mean_image[:10] # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()
```
Visualization of the mean image:
cell 7: subtract the mean image from every split (train, val, test, dev):
```python
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image
```
cell 8: append the bias term b to the data (the bias trick), so only a single weight matrix W needs to be optimized
```python
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
print X_train.shape, X_val.shape, X_test.shape, X_dev.shape
```
Resulting shapes: (49000, 3073), (1000, 3073), (1000, 3073), (500, 3073).
The steps so far are standard setup and rarely need modification; what follows is the core of the assignment.
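To see why appending a constant 1 column lets us drop the separate bias vector, here is a tiny self-contained check (illustrative only; the variable names x, W0, b0, x_ext, W_ext are made up for this demo and are not part of the assignment code):

```python
# Tiny check of the bias trick: x.dot(W0) + b0 equals [x, 1].dot(vstack([W0, b0])).
x = np.random.randn(1, 3072)
W0 = np.random.randn(3072, 10) * 0.0001
b0 = np.random.randn(1, 10) * 0.0001
x_ext = np.hstack([x, np.ones((1, 1))])   # shape (1, 3073), like the cell above
W_ext = np.vstack([W0, b0])               # shape (3073, 10): bias folded into W
print(np.allclose(x.dot(W0) + b0, x_ext.dot(W_ext)))  # True
```

This is exactly why W has shape (3073, 10) in the cells below: its last row plays the role of the bias.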
cell 9: compute the loss with explicit for loops (naive implementation)
```python
# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.00001)
print 'loss: %f' % (loss, )
```
The implementation of svm_loss_naive():
```python
def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape) # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]  # C = 10 classes
    num_train = X.shape[0]    # N = 500 on the dev set, 49000 on the full training set
    loss = 0.0
    # Loop over every training sample and, for each one, over every class,
    # accumulating the margin of sample i against class j.
    for i in xrange(num_train):
        scores = X[i].dot(W)   # (1, 3073) x (3073, 10) -> (1, 10) row of class scores
        correct_class_score = scores[y[i]]
        for j in xrange(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1 # note delta = 1
            if margin > 0:
                loss += margin
                # dW has shape (3073, 10): each positive margin adds X[i] to the
                # column of class j and subtracts it from the correct class column.
                dW[:, j] += X[i].T
                dW[:, y[i]] -= X[i].T

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss /= num_train
    dW /= num_train

    # Add regularization to the loss; W * W is the elementwise product.
    loss += reg * np.sum(W * W)
    # (Strictly, d(reg * sum(W*W))/dW = 2 * reg * W; the factor of 2 is commonly
    # absorbed into reg, as done here.)
    dW += reg * W

    # The gradient is computed together with the loss in the loops above; the
    # correct class column is handled differently from the other columns.
    return loss, dW
```
The loss comes out to about 8.97. This is a useful sanity check: with a small random W all scores are close to zero, so each of the 9 incorrect classes contributes a margin of roughly 1, giving a loss near 9.
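For reference, the loops above implement the multiclass hinge (SVM) loss; writing it out makes the gradient rule used in the code explicit. For one example $x_i$ with scores $s = x_i W$ and correct class $y_i$ (margin $\Delta = 1$):

$$L_i = \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + 1\right), \qquad L = \frac{1}{N}\sum_i L_i + \mathrm{reg}\sum_{k,l} W_{k,l}^2$$

$$\nabla_{w_j} L_i = \mathbb{1}\!\left(s_j - s_{y_i} + 1 > 0\right) x_i^T \;\;(j \neq y_i), \qquad \nabla_{w_{y_i}} L_i = -\Big(\sum_{j \neq y_i} \mathbb{1}\!\left(s_j - s_{y_i} + 1 > 0\right)\Big)\, x_i^T$$

This is exactly what the inner loop does: every class whose margin is positive gets $+x_i$ added to its column of dW, and the correct class column gets $-x_i$ once per violated margin.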
cell 10: with the gradient implemented in svm_loss_naive(), check it numerically:
```python
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad)
```
The gradient computation itself was already shown in the svm_loss_naive() listing above (cell 9).
Gradient check: the analytic dW is compared against a numerically computed gradient along several randomly chosen dimensions.
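grad_check_sparse lives in cs231n/gradient_check.py and is not listed in this post; the idea is a central-difference estimate of the gradient at a handful of random coordinates, compared to the analytic gradient via a relative error. A minimal sketch of that idea (illustrative function name and defaults, not the course's exact code):

```python
# Illustrative sketch of a sparse numerical gradient check; the course's
# grad_check_sparse may differ in details (number of checks, step size, output).
def sparse_grad_check(f, x, analytic_grad, num_checks=10, h=1e-5):
    for _ in range(num_checks):
        ix = tuple([np.random.randint(m) for m in x.shape])  # random coordinate

        oldval = x[ix]
        x[ix] = oldval + h
        fxph = f(x)               # f(x + h)
        x[ix] = oldval - h
        fxmh = f(x)               # f(x - h)
        x[ix] = oldval            # restore the original value

        grad_numerical = (fxph - fxmh) / (2 * h)   # central difference
        grad_analytic = analytic_grad[ix]
        rel_error = abs(grad_numerical - grad_analytic) / \
                    (abs(grad_numerical) + abs(grad_analytic) + 1e-12)
        print('numerical: %f analytic: %f, relative error: %e'
              % (grad_numerical, grad_analytic, rel_error))
```

Relative errors around 1e-7 or smaller are generally considered a pass; the hinge's kinks can occasionally produce a larger error in a single dimension.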
Inline question: an occasional dimension may not match exactly in the gradcheck. This is caused by the kink in max(0, x): at points where a margin sits exactly at the boundary the loss is not differentiable, so the numerical estimate and the analytic (sub)gradient can disagree slightly. It is not a cause for concern.
cell 11: compute the loss in a fully vectorized (matrix) way and compare it against the naive result:
```python
# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss: %e computed in %fs' % (loss_naive, toc - tic)

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic)

# The losses should match but your vectorized implementation should be much faster.
print 'difference: %f' % (loss_naive - loss_vectorized)
```
The difference between the two losses should be essentially zero, while the vectorized version runs much faster.
cell 12: compute the gradient in a vectorized way and compare it with the naive gradient:
```python
# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Naive loss and gradient: computed in %fs' % (toc - tic)

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print 'Vectorized loss and gradient: computed in %fs' % (toc - tic)

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print 'difference: %f' % difference
```
Comparison: the Frobenius norm of the difference between the two gradients should be close to zero.
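The post does not list svm_loss_vectorized itself. Below is a minimal sketch of one common way to vectorize both the loss and the gradient; it follows the math given after cell 9 and matches the naive code's regularization convention, but it is not necessarily the author's exact implementation:

```python
def svm_loss_vectorized(W, X, y, reg):
    """
    Structured SVM loss, vectorized sketch. W: (D, C), X: (N, D), y: (N,).
    Returns the loss and the gradient dW of shape (D, C).
    """
    num_train = X.shape[0]

    scores = X.dot(W)                                                        # (N, C)
    correct_class_scores = scores[np.arange(num_train), y].reshape(-1, 1)    # (N, 1)
    margins = np.maximum(0, scores - correct_class_scores + 1)               # delta = 1
    margins[np.arange(num_train), y] = 0              # ignore the correct class
    loss = np.sum(margins) / num_train + reg * np.sum(W * W)

    # Gradient: each positive margin adds X[i] to column j and subtracts it
    # from column y[i]; encode this with a per-example coefficient matrix.
    coeff = (margins > 0).astype(float)                # (N, C) indicator
    coeff[np.arange(num_train), y] = -np.sum(coeff, axis=1)
    dW = X.T.dot(coeff) / num_train + reg * W          # (D, C)

    return loss, dW
```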
cell 13: stochastic gradient descent (SGD)
```python
# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print 'That took %fs' % (toc - tic)
```
Loss output: with verbose=True the printed loss should decrease steadily over the iterations.
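LinearClassifier.train() lives in linear_classifier.py and is not listed in this post. A minimal SGD sketch along the lines the assignment asks for, sampling a random minibatch and stepping down the gradient (illustrative standalone function; the real method stores W on the classifier object):

```python
# Minimal SGD training loop sketch (illustrative; parameter names mirror the
# course code, but this is not the exact LinearClassifier.train() implementation).
def train_sgd(W, X, y, loss_fn, learning_rate=1e-7, reg=5e4,
              num_iters=1500, batch_size=200, verbose=False):
    num_train = X.shape[0]
    loss_history = []
    for it in xrange(num_iters):
        # Sample a minibatch; sampling with replacement is faster and fine for SGD.
        batch_idx = np.random.choice(num_train, batch_size, replace=True)
        X_batch, y_batch = X[batch_idx], y[batch_idx]

        loss, grad = loss_fn(W, X_batch, y_batch, reg)   # e.g. svm_loss_vectorized
        loss_history.append(loss)

        W -= learning_rate * grad                        # parameter update
        if verbose and it % 100 == 0:
            print('iteration %d / %d: loss %f' % (it, num_iters, loss))
    return W, loss_history
```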
cell 14: plot the loss curve over the iterations:
```python
# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()
```
Result (the loss curve should trend downward and flatten out):
cell 15: use the trained weights to evaluate accuracy on the training and validation sets:
```python
# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print 'training accuracy: %f' % (np.mean(y_train == y_train_pred), )
y_val_pred = svm.predict(X_val)
print 'validation accuracy: %f' % (np.mean(y_val == y_val_pred), )
```
Result:
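Prediction is simply an argmax over the class scores. A minimal sketch (illustrative standalone function; the real LinearSVM.predict(self, X) uses self.W, which here has shape (3073, 10)):

```python
def predict(W, X):
    # Scores are (N, C); the predicted label is the highest-scoring class per row.
    scores = X.dot(W)
    return np.argmax(scores, axis=1)
```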
cell 16: tune the hyperparameters (learning rate and regularization strength) by training on the training set and evaluating on the validation set, keeping the best combination.
```python
# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = [1e-7, 2e-7, 3e-7, 5e-5, 8e-7]
regularization_strengths = [1e4, 2e4, 3e4, 4e4, 5e4, 6e4, 7e4, 8e4, 1e5]

# results is dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation#
# set. For each combination of hyperparameters, train a linear SVM on the     #
# training set, compute its accuracy on the training and validation sets, and #
# store these numbers in the results dictionary. In addition, store the best  #
# validation accuracy in best_val and the LinearSVM object that achieves this #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your        #
# validation code so that the SVMs don't take much time to train; once you are#
# confident that your validation code works, you should rerun the validation  #
# code with a larger value for num_iters.                                     #
################################################################################
iters = 1500
for lr in learning_rates:
    for rs in regularization_strengths:
        svm = LinearSVM()
        svm.train(X_train, y_train, learning_rate=lr, reg=rs, num_iters=iters)
        y_train_pred = svm.predict(X_train)
        accu_train = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val)
        accu_val = np.mean(y_val == y_val_pred)
        results[(lr, rs)] = (accu_train, accu_val)
        if best_val < accu_val:
            best_val = accu_val
            best_svm = svm
################################################################################
#                              END OF YOUR CODE                               #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy)

print 'best validation accuracy achieved during cross-validation: %f' % best_val
```
Final results:
cell 17: visualize the results for each hyperparameter combination:
```python
# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()
```
Displayed result:
cell 18: use the model with the best hyperparameters to predict on the test set, compare predictions with the true labels, and compute the accuracy.
```python
# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print 'linear SVM on raw pixels final test set accuracy: %f' % test_accuracy
```
Result:
cell 19: visualize the learned weights W for each class:
```python
# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1, :] # strip out the bias row
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in xrange(10):
    plt.subplot(2, 5, i + 1)

    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
```
Result: each class's weight vector, reshaped to 32x32x3, looks like a blurred template of that class.
Inline question: describe what the visualized SVM weights look like and why. Because each class score is an inner product between the image and that class's weight column, training pulls each column toward something like an average of that class's training images, so the weights look like blurred class templates.
Note: the Softmax and SVM hyperparameter searches share a common base class, with each subclass supplying its own loss method; see linear_classifier.py for the full, commented file.
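A minimal sketch of that shared structure (illustrative; the real linear_classifier.py contains the full train/predict logic, and the default argument values shown here are assumptions):

```python
# Illustrative sketch of the shared class structure in linear_classifier.py;
# class and method names follow the course code, method bodies are abbreviated.
class LinearClassifier(object):
    def __init__(self):
        self.W = None

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        # SGD loop: sample a minibatch, call self.loss(...), update self.W.
        pass

    def predict(self, X):
        # Return the argmax class score for each row of X.
        pass

    def loss(self, X_batch, y_batch, reg):
        # Overridden by each subclass with its own loss function.
        raise NotImplementedError

class LinearSVM(LinearClassifier):
    def loss(self, X_batch, y_batch, reg):
        return svm_loss_vectorized(self.W, X_batch, y_batch, reg)

class Softmax(LinearClassifier):
    def loss(self, X_batch, y_batch, reg):
        return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)
```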
Appendix: CS231n walkthrough QQ group: 578975100 (verification answer: DL-CS231n)