The second assignment is quite hard, but after working (and, let's be honest, partly copying) through it, the payoff is still big....

I. Fully-Connected Neural Nets

The first task is to refactor the earlier neural-network code so that fully-connected networks of arbitrary size can be built. The whole codebase is organized with a modular design; the basic idea looks like this:

# Forward pass
def layer_forward(x, w):
    """ Receive inputs x and weights w """
    # Do some computations ...
    z = # ... some intermediate value we need to keep around for the backward pass
    # Do some more computations ...
    out = # the output

    cache = (x, w, z, out)  # Values we need to compute gradients

    return out, cache


# Backward pass
def layer_backward(dout, cache):
    """
    Receive derivative of loss with respect to outputs and cache,
    and compute derivative with respect to inputs.
    """
    # Unpack cache values
    x, w, z, out = cache

    # Use values in cache to compute derivatives
    dx = # Derivative of loss with respect to x
    dw = # Derivative of loss with respect to w

    return dx, dw
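To make the template concrete, here is a tiny toy layer of my own (it is not part of the assignment) that follows the same forward/backward-plus-cache pattern; it computes an elementwise product out = x * w and returns the gradients with respect to both inputs:

import numpy as np

def scale_forward(x, w):
    # Forward pass: elementwise product; cache the inputs for the backward pass
    out = x * w
    cache = (x, w)
    return out, cache

def scale_backward(dout, cache):
    # Backward pass: chain rule through out = x * w
    x, w = cache
    dx = dout * w
    dw = dout * x
    return dx, dw

# The two halves plug together exactly the way the template promises
x = np.random.randn(3, 4)
w = np.random.randn(3, 4)
out, cache = scale_forward(x, w)
dout = np.random.randn(3, 4)      # pretend upstream gradient from the layer above
dx, dw = scale_backward(dout, cache)
print(dx.shape)                   # (3, 4), same shape as x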

Guided by this pattern, the assignment asks us to implement the following code:

def affine_forward(x, w, b):
    """
    Computes the forward pass for an affine (fully-connected) layer.

    x has shape (N, d_1, ..., d_k): the first dimension is the minibatch size N and
    the remaining dimensions are the shape of each example (e.g. an image), so on
    the way in we flatten everything after the first dimension into a single vector.

    Inputs:
    - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)
    - w: A numpy array of weights, of shape (D, M)
    - b: A numpy array of biases, of shape (M,)

    Returns a tuple of:
    - out: output, of shape (N, M)
    - cache: (x, w, b)
    """
    out = None
    N = x.shape[0]
    x_new = x.reshape(N, -1)      # flatten each example into a row vector
    out = np.dot(x_new, w) + b
    cache = (x, w, b)             # no need to store out
    return out, cache


def affine_backward(dout, cache):
    # Backward pass for the affine layer: dout has shape (N, M); dx and dw have the
    # same shapes as x and w, and db has shape (1, M) because of keepdims.
    x, w, b = cache
    dx, dw, db = None, None, None
    dx = np.dot(dout, w.T)
    dx = np.reshape(dx, x.shape)
    x_new = x.reshape(x.shape[0], -1)
    dw = np.dot(x_new.T, dout)
    db = np.sum(dout, axis=0, keepdims=True)
    return dx, dw, db


def relu_forward(x):
    """
    Computes the forward pass for a layer of rectified linear units (ReLUs).

    Input:
    - x: Inputs, of any shape

    Returns a tuple of:
    - out: Output, of the same shape as x
    - cache: x
    """
    out = None
    out = np.maximum(0, x)
    cache = x
    return out, cache


def relu_backward(dout, cache):
    dx, x = None, cache
    #############################################################################
    # TODO: Implement the ReLU backward pass.                                   #
    #############################################################################
    # Pass the gradient through only where the input was positive
    dx = dout
    dx[x <= 0] = 0
    #############################################################################
    #                            END OF YOUR CODE                               #
    #############################################################################
    return dx

The line worth discussing above is why db = np.sum(dout, axis=0, keepdims=True). At first glance it looks as if an averaging step is missing, but none is needed: b is broadcast onto every row of the minibatch, so its gradient is the sum of dout over the batch dimension, and the 1/N averaging over examples has already been folded into dout by the softmax loss. The gradient-check code does not need any special change for this.
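To convince myself that the plain sum is right, a quick numeric check helps. The sketch below is my own (it re-implements a centered-difference gradient in the spirit of the eval_numerical_gradient_array helper shipped with the assignment) and assumes the affine_forward / affine_backward functions above are in scope:

import numpy as np

def num_grad(f, x, df, h=1e-5):
    # Centered-difference numeric gradient of f at x, contracted with upstream df
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        pos = f(x).copy()
        x[ix] = old - h
        neg = f(x).copy()
        x[ix] = old
        grad[ix] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad

np.random.seed(0)
x = np.random.randn(4, 5)       # N = 4 examples, D = 5 features
w = np.random.randn(5, 3)
b = np.random.randn(3)
dout = np.random.randn(4, 3)    # stand-in for the upstream gradient

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

db_num = num_grad(lambda b: affine_forward(x, w, b)[0], b, dout)
print(np.max(np.abs(db.ravel() - db_num.ravel())))   # ~1e-10: the sum is correct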

With these two basic layers in place we can build a "sandwich" layer. Since the fc-relu combination is very common, it is provided directly:

def affine_relu_forward(x, w, b):
    """
    Convenience layer that performs an affine transform followed by a ReLU

    Inputs:
    - x: Input to the affine layer
    - w, b: Weights for the affine layer

    Returns a tuple of:
    - out: Output from the ReLU
    - cache: Object to give to the backward pass
    """
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    cache = (fc_cache, relu_cache)
    return out, cache


def affine_relu_backward(dout, cache):
    """
    Backward pass for the affine-relu convenience layer
    """
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db
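As a quick illustration of how the caches chain when layers are stacked (my own sketch, with made-up shapes and hypothetical parameter names, assuming the functions above are in scope): the forward pass stores one cache per layer, and the backward pass consumes them in reverse order.

N, D, H, C = 8, 20, 10, 5        # batch size, input dim, hidden dim, classes
X = np.random.randn(N, D)
W1, b1 = 0.01 * np.random.randn(D, H), np.zeros(H)
W2, b2 = 0.01 * np.random.randn(H, C), np.zeros(C)

# Forward: hidden sandwich layer, then a plain affine output layer
h1, cache1 = affine_relu_forward(X, W1, b1)
scores, cache2 = affine_forward(h1, W2, b2)

# Backward: unwind in reverse, feeding each layer the gradient from above
dscores = np.random.randn(N, C)  # stand-in for dloss/dscores from a loss function
dh1, dW2, db2 = affine_backward(dscores, cache2)
dX, dW1, db1 = affine_relu_backward(dh1, cache1)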

Next comes a simpler network class built on top of these layers, which I will skip; instead let's go straight to the most powerful class so far, FullyConnectedNet. Code and comments first:

class FullyConnectedNet(object):
    """
    A fully-connected neural network with an arbitrary number of hidden layers,
    ReLU nonlinearities, and a softmax loss function. This will also implement
    dropout and batch normalization as options. For a network with L layers,
    the architecture will be

    {affine - [batch norm] - relu - [dropout]} x (L - 1) - affine - softmax

    where batch normalization and dropout are optional, and the {...} block is
    repeated L - 1 times.

    Similar to the TwoLayerNet above, learnable parameters are stored in the
    self.params dictionary and will be learned using the Solver class.
    """

    def __init__(self, hidden_dims, input_dim=3*32*32, num_classes=10,
                 dropout=0, use_batchnorm=False, reg=0.0,
                 weight_scale=1e-2, dtype=np.float32, seed=None):
        """
        Initialize a new FullyConnectedNet.

        Inputs:
        - hidden_dims: A list of integers giving the size of each hidden layer.
        - input_dim: An integer giving the size of the input.
        - num_classes: An integer giving the number of classes to classify.
        - dropout: Scalar between 0 and 1 giving dropout strength. If dropout=0 then
          the network should not use dropout at all.
        - use_batchnorm: Whether or not the network should use batch normalization.
        - reg: Scalar giving L2 regularization strength.
        - weight_scale: Scalar giving the standard deviation for random
          initialization of the weights.
        - dtype: A numpy datatype object; all computations will be performed using
          this datatype. float32 is faster but less accurate, so you should use
          float64 for numeric gradient checking.
        - seed: If not None, then pass this random seed to the dropout layers. This
          will make the dropout layers deterministic so we can gradient check the
          model.
        """
        self.use_batchnorm = use_batchnorm
        self.use_dropout = dropout > 0
        self.reg = reg
        self.num_layers = 1 + len(hidden_dims)
        self.dtype = dtype
        self.params = {}

        ############################################################################
        # TODO: Initialize the parameters of the network, storing all values in    #
        # the self.params dictionary. Store weights and biases for the first layer #
        # in W1 and b1; for the second layer use W2 and b2, etc. Weights should be #
        # initialized from a normal distribution with standard deviation equal to  #
        # weight_scale and biases should be initialized to zero.                   #
        #                                                                          #
        # When using batch normalization, store scale and shift parameters for the #
        # first layer in gamma1 and beta1; for the second layer use gamma2 and     #
        # beta2, etc. Scale parameters should be initialized to one and shift      #
        # parameters should be initialized to zero.                                #
        ############################################################################
        # layers_dims holds the size of every layer; hidden_dims is already a list,
        # so the input and output sizes are wrapped in lists and concatenated on.
        layers_dims = [input_dim] + hidden_dims + [num_classes]
        for i in xrange(self.num_layers):
            self.params['W' + str(i + 1)] = weight_scale * np.random.randn(layers_dims[i], layers_dims[i + 1])
            self.params['b' + str(i + 1)] = np.zeros((1, layers_dims[i + 1]))
            if self.use_batchnorm and i < len(hidden_dims):  # the last layer has no batchnorm
                self.params['gamma' + str(i + 1)] = np.ones((1, layers_dims[i + 1]))
                self.params['beta' + str(i + 1)] = np.zeros((1, layers_dims[i + 1]))
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        # When using dropout we need to pass a dropout_param dictionary to each
        # dropout layer so that the layer knows the dropout probability and the mode
        # (train / test). You can pass the same dropout_param to each dropout layer.
        self.dropout_param = {}
        if self.use_dropout:
            self.dropout_param = {'mode': 'train', 'p': dropout}
            if seed is not None:
                self.dropout_param['seed'] = seed

        # With batch normalization we need to keep track of running means and
        # variances, so we need to pass a special bn_param object to each batch
        # normalization layer. You should pass self.bn_params[0] to the forward pass
        # of the first batch normalization layer, self.bn_params[1] to the forward
        # pass of the second batch normalization layer, etc.
        self.bn_params = []
        if self.use_batchnorm:
            self.bn_params = [{'mode': 'train'} for i in xrange(self.num_layers - 1)]

        # Cast all parameters to the correct datatype
        for k, v in self.params.iteritems():
            self.params[k] = v.astype(dtype)

    def loss(self, X, y=None):
        """
        Compute loss and gradient for the fully-connected net.

        Input / output: Same as TwoLayerNet above.
        """
        X = X.astype(self.dtype)
        mode = 'test' if y is None else 'train'

        # Set train/test mode for batchnorm params and dropout param since they
        # behave differently during training and testing.
        if self.dropout_param is not None:
            self.dropout_param['mode'] = mode
        if self.use_batchnorm:
            for bn_param in self.bn_params:
                bn_param['mode'] = mode

        scores = None
        ############################################################################
        # TODO: Implement the forward pass for the fully-connected net, computing  #
        # the class scores for X and storing them in the scores variable.          #
        #                                                                          #
        # When using dropout, you'll need to pass self.dropout_param to each       #
        # dropout forward pass.                                                    #
        #                                                                          #
        # When using batch normalization, you'll need to pass self.bn_params[0] to #
        # the forward pass for the first batch normalization layer, pass           #
        # self.bn_params[1] to the forward pass for the second batch normalization #
        # layer, etc.                                                              #
        ############################################################################
        h, cache1, cache2, cache3, cache4, bn, out = {}, {}, {}, {}, {}, {}, {}
        out[0] = X  # store every layer's output; by this convention X is out[0]

        # Forward pass: compute class scores
        for i in xrange(self.num_layers - 1):
            # Fetch this layer's parameters
            w, b = self.params['W' + str(i + 1)], self.params['b' + str(i + 1)]
            if self.use_batchnorm:
                gamma, beta = self.params['gamma' + str(i + 1)], self.params['beta' + str(i + 1)]
                h[i], cache1[i] = affine_forward(out[i], w, b)
                bn[i], cache2[i] = batchnorm_forward(h[i], gamma, beta, self.bn_params[i])
                out[i + 1], cache3[i] = relu_forward(bn[i])
                if self.use_dropout:
                    out[i + 1], cache4[i] = dropout_forward(out[i + 1], self.dropout_param)
            else:
                out[i + 1], cache3[i] = affine_relu_forward(out[i], w, b)
                if self.use_dropout:
                    out[i + 1], cache4[i] = dropout_forward(out[i + 1], self.dropout_param)

        W, b = self.params['W' + str(self.num_layers)], self.params['b' + str(self.num_layers)]
        scores, cache = affine_forward(out[self.num_layers - 1], W, b)  # the final affine layer
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        # If test mode return early
        if mode == 'test':
            return scores

        loss, grads = 0.0, {}
        ############################################################################
        # TODO: Implement the backward pass for the fully-connected net. Store the #
        # loss in the loss variable and gradients in the grads dictionary. Compute #
        # data loss using softmax, and make sure that grads[k] holds the gradients #
        # for self.params[k]. Don't forget to add L2 regularization!               #
        #                                                                          #
        # When using batch normalization, you don't need to regularize the scale   #
        # and shift parameters.                                                    #
        #                                                                          #
        # NOTE: To ensure that your implementation matches ours and you pass the   #
        # automated tests, make sure that your L2 regularization includes a factor #
        # of 0.5 to simplify the expression for the gradient.                      #
        ############################################################################
        data_loss, dscores = softmax_loss(scores, y)
        reg_loss = 0
        for i in xrange(self.num_layers):
            reg_loss += 0.5 * self.reg * np.sum(self.params['W' + str(i + 1)] * self.params['W' + str(i + 1)])
        loss = data_loss + reg_loss

        # Backward pass: compute gradients
        dout, dbn, dh, ddrop = {}, {}, {}, {}
        t = self.num_layers - 1
        # cache here is the one returned by the final affine_forward above
        dout[t], grads['W' + str(t + 1)], grads['b' + str(t + 1)] = affine_backward(dscores, cache)
        for i in xrange(t):
            if self.use_batchnorm:
                if self.use_dropout:
                    dout[t - i] = dropout_backward(dout[t - i], cache4[t - 1 - i])
                dbn[t - 1 - i] = relu_backward(dout[t - i], cache3[t - 1 - i])
                dh[t - 1 - i], grads['gamma' + str(t - i)], grads['beta' + str(t - i)] = \
                    batchnorm_backward(dbn[t - 1 - i], cache2[t - 1 - i])
                dout[t - 1 - i], grads['W' + str(t - i)], grads['b' + str(t - i)] = \
                    affine_backward(dh[t - 1 - i], cache1[t - 1 - i])
            else:
                if self.use_dropout:
                    dout[t - i] = dropout_backward(dout[t - i], cache4[t - 1 - i])
                dout[t - 1 - i], grads['W' + str(t - i)], grads['b' + str(t - i)] = \
                    affine_relu_backward(dout[t - i], cache3[t - 1 - i])

        # Add the regularization gradient contribution
        for i in xrange(self.num_layers):
            grads['W' + str(i + 1)] += self.reg * self.params['W' + str(i + 1)]
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################
        return loss, grads
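Before moving on, a sanity check I find useful (my own sketch with made-up data, assuming the class above and its CIFAR-10-sized defaults): with small random weights and reg=0, the initial softmax loss should be close to log(10) ≈ 2.3, and it should increase once regularization is turned on.

np.random.seed(231)
N, D, C = 3, 3 * 32 * 32, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)

model = FullyConnectedNet([100, 100], input_dim=D, num_classes=C,
                          reg=0.0, weight_scale=1e-2, dtype=np.float64)
loss, grads = model.loss(X, y)
print('initial loss with reg=0: %f' % loss)   # expect roughly 2.3

model.reg = 1.0
loss, grads = model.loss(X, y)
print('initial loss with reg=1: %f' % loss)   # should be strictly larger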

Since the code above is high-level code, it does not have to worry about how backpropagation is implemented (that was already done earlier), so it is fairly easy to follow. But we are not done yet: we still need the Solver to actually optimize the network.

import numpy as np

from cs231n import optim


class Solver(object):
    """
    A Solver encapsulates all the logic necessary for training classification
    models. The Solver performs stochastic gradient descent using different
    update rules defined in optim.py.

    The solver accepts both training and validation data and labels so it can
    periodically check classification accuracy on both training and validation
    data to watch out for overfitting.

    To train a model, you will first construct a Solver instance, passing the
    model, dataset, and various options (learning rate, batch size, etc) to the
    constructor. You will then call the train() method to run the optimization
    procedure and train the model.

    After the train() method returns, model.params will contain the parameters
    that performed best on the validation set over the course of training.
    In addition, the instance variable solver.loss_history will contain a list
    of all losses encountered during training and the instance variables
    solver.train_acc_history and solver.val_acc_history will be lists containing
    the accuracies of the model on the training and validation set at each epoch.

    Example usage might look something like this:

    data = {
      'X_train': # training data
      'y_train': # training labels
      'X_val': # validation data
      'y_val': # validation labels
    }
    model = MyAwesomeModel(hidden_size=100, reg=10)
    solver = Solver(model, data,
                    update_rule='sgd',
                    optim_config={
                      'learning_rate': 1e-3,
                    },
                    lr_decay=0.95,
                    num_epochs=10, batch_size=100,
                    print_every=100)
    solver.train()

    A Solver works on a model object that must conform to the following API:

    - model.params must be a dictionary mapping string parameter names to numpy
      arrays containing parameter values.

    - model.loss(X, y) must be a function that computes training-time loss and
      gradients, and test-time classification scores, with the following inputs
      and outputs:

      Inputs:
      - X: Array giving a minibatch of input data of shape (N, d_1, ..., d_k)
      - y: Array of labels, of shape (N,) giving labels for X where y[i] is the
        label for X[i].

      Returns:
      If y is None, run a test-time forward pass and return:
      - scores: Array of shape (N, C) giving classification scores for X where
        scores[i, c] gives the score of class c for X[i].

      If y is not None, run a training time forward and backward pass and return
      a tuple of:
      - loss: Scalar giving the loss
      - grads: Dictionary with the same keys as self.params mapping parameter
        names to gradients of the loss with respect to those parameters.
    """

    def __init__(self, model, data, **kwargs):
        """
        Construct a new Solver instance.

        Required arguments:
        - model: A model object conforming to the API described above
        - data: A dictionary of training and validation data with the following:
          'X_train': Array of shape (N_train, d_1, ..., d_k) giving training images
          'X_val': Array of shape (N_val, d_1, ..., d_k) giving validation images
          'y_train': Array of shape (N_train,) giving labels for training images
          'y_val': Array of shape (N_val,) giving labels for validation images

        Optional arguments:
        - update_rule: A string giving the name of an update rule in optim.py.
          Default is 'sgd'.
        - optim_config: A dictionary containing hyperparameters that will be
          passed to the chosen update rule. Each update rule requires different
          hyperparameters (see optim.py) but all update rules require a
          'learning_rate' parameter so that should always be present.
        - lr_decay: A scalar for learning rate decay; after each epoch the learning
          rate is multiplied by this value.
        - batch_size: Size of minibatches used to compute loss and gradient during
          training.
        - num_epochs: The number of epochs to run for during training.
        - print_every: Integer; training losses will be printed every print_every
          iterations.
        - verbose: Boolean; if set to false then no output will be printed during
          training.
        """
        self.model = model
        self.X_train = data['X_train']
        self.y_train = data['y_train']
        self.X_val = data['X_val']
        self.y_val = data['y_val']

        # Unpack keyword arguments
        self.update_rule = kwargs.pop('update_rule', 'sgd')
        self.optim_config = kwargs.pop('optim_config', {})
        self.lr_decay = kwargs.pop('lr_decay', 1.0)
        self.batch_size = kwargs.pop('batch_size', 100)
        self.num_epochs = kwargs.pop('num_epochs', 10)

        self.print_every = kwargs.pop('print_every', 10)
        self.verbose = kwargs.pop('verbose', True)

        # Throw an error if there are extra keyword arguments
        if len(kwargs) > 0:
            extra = ', '.join('"%s"' % k for k in kwargs.keys())
            raise ValueError('Unrecognized arguments %s' % extra)

        # Make sure the update rule exists, then replace the string
        # name with the actual function
        if not hasattr(optim, self.update_rule):
            raise ValueError('Invalid update_rule "%s"' % self.update_rule)
        self.update_rule = getattr(optim, self.update_rule)

        self._reset()

    def _reset(self):
        """
        Set up some book-keeping variables for optimization. Don't call this
        manually.
        """
        # Set up some variables for book-keeping
        self.epoch = 0
        self.best_val_acc = 0
        self.best_params = {}
        self.loss_history = []
        self.train_acc_history = []
        self.val_acc_history = []

        # Make a deep copy of the optim_config for each parameter
        self.optim_configs = {}
        for p in self.model.params:
            d = {k: v for k, v in self.optim_config.iteritems()}
            self.optim_configs[p] = d

    def _step(self):
        """
        Make a single gradient update. This is called by train() and should not
        be called manually.
        """
        # Make a minibatch of training data
        num_train = self.X_train.shape[0]
        batch_mask = np.random.choice(num_train, self.batch_size)
        X_batch = self.X_train[batch_mask]
        y_batch = self.y_train[batch_mask]

        # Compute loss and gradient
        loss, grads = self.model.loss(X_batch, y_batch)
        self.loss_history.append(loss)

        # Perform a parameter update
        for p, w in self.model.params.iteritems():
            dw = grads[p]
            config = self.optim_configs[p]
            next_w, next_config = self.update_rule(w, dw, config)  # the update rule is pluggable
            self.model.params[p] = next_w
            self.optim_configs[p] = next_config

    def check_accuracy(self, X, y, num_samples=None, batch_size=100):
        """
        Check accuracy of the model on the provided data.

        Inputs:
        - X: Array of data, of shape (N, d_1, ..., d_k)
        - y: Array of labels, of shape (N,)
        - num_samples: If not None, subsample the data and only test the model
          on num_samples datapoints.
        - batch_size: Split X and y into batches of this size to avoid using too
          much memory.

        Returns:
        - acc: Scalar giving the fraction of instances that were correctly
          classified by the model.
        """

        # Maybe subsample the data
        N = X.shape[0]
        if num_samples is not None and N > num_samples:
            mask = np.random.choice(N, num_samples)
            N = num_samples
            X = X[mask]
            y = y[mask]

        # Compute predictions in batches
        num_batches = N / batch_size
        if N % batch_size != 0:
            num_batches += 1
        y_pred = []
        for i in xrange(num_batches):
            start = i * batch_size
            end = (i + 1) * batch_size
            scores = self.model.loss(X[start:end])
            y_pred.append(np.argmax(scores, axis=1))
        y_pred = np.hstack(y_pred)
        acc = np.mean(y_pred == y)

        return acc

    def train(self):
        """
        Run optimization to train the model.
        """
        num_train = self.X_train.shape[0]
        iterations_per_epoch = max(num_train / self.batch_size, 1)
        num_iterations = self.num_epochs * iterations_per_epoch

        for t in xrange(num_iterations):
            self._step()

            # Maybe print training loss
            if self.verbose and t % self.print_every == 0:
                print '(Iteration %d / %d) loss: %f' % (
                    t + 1, num_iterations, self.loss_history[-1])

            # At the end of every epoch, increment the epoch counter and decay the
            # learning rate.
            epoch_end = (t + 1) % iterations_per_epoch == 0
            if epoch_end:
                self.epoch += 1
                for k in self.optim_configs:
                    self.optim_configs[k]['learning_rate'] *= self.lr_decay

            # Check train and val accuracy on the first iteration, the last
            # iteration, and at the end of each epoch.
            first_it = (t == 0)
            last_it = (t == num_iterations - 1)
            if first_it or last_it or epoch_end:
                train_acc = self.check_accuracy(self.X_train, self.y_train,
                                                num_samples=1000)
                val_acc = self.check_accuracy(self.X_val, self.y_val)
                self.train_acc_history.append(train_acc)
                self.val_acc_history.append(val_acc)

                if self.verbose:
                    print '(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                        self.epoch, self.num_epochs, train_acc, val_acc)

                # Keep track of the best model
                if val_acc > self.best_val_acc:
                    self.best_val_acc = val_acc
                    self.best_params = {}
                    for k, v in self.model.params.iteritems():
                        self.best_params[k] = v.copy()

        # At the end of training swap the best params into the model
        self.model.params = self.best_params

At this point we have essentially built a framework for fully-connected deep learning networks. Let's recap what was done:

1. Implement the forward and backward passes for the fully-connected (affine) layer and the ReLU layer.

2. Implement the sandwich functions, which simply compose the layers above.

3. Implement the FullyConnectedNet class: given the network's hyperparameters, it builds a corresponding model.

4. Implement the Solver class: given a model and the data, it runs the actual optimization.

A few points worth noting:

1. The forward pass needs to save intermediate values, so each layer simply returns both out and cache.

2. Stacking many layers takes careful bookkeeping of each layer's parameters: for layer i, the input is out[i], the output is out[i + 1], and its cached values live in cache[i].

3. The plain SGD update rule is still pretty naive; other rules are worth trying later (a momentum sketch follows below).
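For example, an SGD-with-momentum rule in the style that optim.py expects might look like the sketch below. This is my own version: the call signature — update(w, dw, config) returning (next_w, config) — matches how Solver invokes self.update_rule, but treat the hyperparameter names and defaults as assumptions rather than the assignment's reference code.

import numpy as np

def sgd_momentum(w, dw, config=None):
    """
    SGD with momentum. config is expected to hold:
    - learning_rate: scalar step size.
    - momentum: scalar in [0, 1]; 0 falls back to plain SGD.
    - velocity: running velocity, same shape as w.
    """
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('momentum', 0.9)
    v = config.get('velocity', np.zeros_like(w))

    # Keep a decaying running average of past gradients and step along it
    v = config['momentum'] * v - config['learning_rate'] * dw
    next_w = w + v

    config['velocity'] = v
    return next_w, config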

So the code above is written. What next?

There are a few very useful tricks worth remembering.

When you have built a neural network and are about to run it on your dataset, don't start with the full, raw dataset right away. The best approach is to first overfit a small subset to show that the network can actually learn; at that stage you can tune hyperparameters boldly. Personally I suggest a smaller learning rate, more iterations, and a weight scale chosen case by case.
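A minimal version of that check might look like the sketch below. It assumes the usual CIFAR-10 data dictionary from the assignment (data['X_train'], data['y_train'], ...) is already loaded, and the specific numbers (50 examples, 20 epochs, learning rate, batch size) are just starting points for tuning, not reference values.

num_train = 50   # a tiny subset that the network should be able to memorize
small_data = {
    'X_train': data['X_train'][:num_train],
    'y_train': data['y_train'][:num_train],
    'X_val': data['X_val'],
    'y_val': data['y_val'],
}

model = FullyConnectedNet([100, 100], weight_scale=1e-2, reg=0.0)
solver = Solver(model, small_data,
                update_rule='sgd',
                optim_config={'learning_rate': 1e-2},
                lr_decay=0.95,
                num_epochs=20, batch_size=25,
                print_every=10)
solver.train()
# Success here means training accuracy near 1.0 on these 50 examples; validation
# accuracy will stay low, which is expected and fine for this overfitting check.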

Summary: the second assignment covers a lot of material; that's all for this post, to be continued.
