Binary classification with one hidden layer; the assignment's default settings:

  • 1 output unit with a sigmoid activation (because the task is binary classification);
  • 4 hidden units with a tanh activation (apart from the output unit of a binary classifier, sigmoid is rarely chosen as an activation function);
  • n_x input units, where n_x is the dimensionality of the training data.

In total there are three layers: the input layer (n_x = X.shape[0]), the hidden layer (n_h = 4), and the output layer (n_y = 1); the corresponding parameter shapes are shown below.
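The parameter shapes follow directly from these layer sizes:

\[W^{[1]} \in \mathbb{R}^{n_h \times n_x}, \quad b^{[1]} \in \mathbb{R}^{n_h \times 1}, \quad W^{[2]} \in \mathbb{R}^{n_y \times n_h}, \quad b^{[2]} \in \mathbb{R}^{n_y \times 1}
\]

with \(n_h = 4\) and \(n_y = 1\) in this assignment.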

Imports and setup

    # Package imports
    import numpy as np
    import matplotlib.pyplot as plt
    from testCases import *
    import sklearn
    import sklearn.datasets
    import sklearn.linear_model
    from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets

    %matplotlib inline
    np.random.seed(1)  # set a seed so that the results are consistent
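
For reference, the planar dataset used in the rest of the post is loaded with the helper imported above; a minimal sketch (the exact number of examples m comes from planar_utils, so treat it as illustrative):

    X, Y = load_planar_dataset()
    print(X.shape)  # (2, m): two input features per example
    print(Y.shape)  # (1, m): one binary label per example (red: 0 / blue: 1)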

4 - Neural Network model

Here is our model (the notebook's figure of the 1-hidden-layer network is not reproduced here):

Mathematically:

For one example \(x^{(i)}\):

\[z^{[1] (i)} = W^{[1]} x^{(i)} + b^{[1] (i)}\tag{1}
\]

\[a^{[1] (i)} = \tanh(z^{[1] (i)})\tag{2}
\]

\[z^{[2] (i)} = W^{[2]} a^{[1] (i)} + b^{[2] (i)}\tag{3}
\]

\[\hat{y}^{(i)} = a^{[2] (i)} = \sigma(z^{ [2] (i)})\tag{4}
\]

\[y^{(i)}_{prediction} = \begin{cases} 1 & \mbox{if } a^{[2](i)} > 0.5 \\ 0 & \mbox{otherwise } \end{cases}\tag{5}
\]

Given the predictions on all the examples, you can also compute the cost \(J\) as follows:

\[J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \right) \tag{6}
\]

Reminder: The general methodology to build a Neural Network is to:

  1. Define the neural network structure ( # of input units, # of hidden units, etc).
  2. Initialize the model's parameters
  3. Loop:
     - Implement forward propagation
     - Compute loss
     - Implement backward propagation to get the gradients
     - Update parameters (gradient descent)

You often build helper functions to compute steps 1-3 and then merge them into one function we call nn_model(). Once you've built nn_model() and learnt the right parameters, you can make predictions on new data.

4.1 - Defining the neural network structure

    # GRADED FUNCTION: layer_sizes

    def layer_sizes(X, Y):
        """
        Arguments:
        X -- input dataset of shape (input size, number of examples)
        Y -- labels of shape (output size, number of examples)

        Returns:
        n_x -- the size of the input layer
        n_h -- the size of the hidden layer
        n_y -- the size of the output layer
        """
        ### START CODE HERE ### (≈ 3 lines of code)
        n_x = X.shape[0]  # size of input layer
        n_h = 4
        n_y = Y.shape[0]  # size of output layer
        ### END CODE HERE ###
        return (n_x, n_h, n_y)
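
A quick usage sketch (the notebook's own test cell pulls a prepared test case from testCases; the random arrays below are only illustrative):

    X_demo = np.random.randn(5, 3)      # 5 input features, 3 examples (made-up)
    Y_demo = np.random.randn(2, 3)      # 2 output units, 3 examples (made-up)
    print(layer_sizes(X_demo, Y_demo))  # (5, 4, 2) -- n_h is hard-coded to 4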

4.2 - Initialize the model's parameters

    # GRADED FUNCTION: initialize_parameters

    def initialize_parameters(n_x, n_h, n_y):
        """
        Argument:
        n_x -- size of the input layer
        n_h -- size of the hidden layer
        n_y -- size of the output layer

        Returns:
        params -- python dictionary containing your parameters:
                  W1 -- weight matrix of shape (n_h, n_x)
                  b1 -- bias vector of shape (n_h, 1)
                  W2 -- weight matrix of shape (n_y, n_h)
                  b2 -- bias vector of shape (n_y, 1)
        """
        np.random.seed(2)  # we set up a seed so that your output matches ours although the initialization is random.

        ### START CODE HERE ### (≈ 4 lines of code)
        W1 = np.random.randn(n_h, n_x) * 0.01
        b1 = np.zeros((n_h, 1))
        W2 = np.random.randn(n_y, n_h) * 0.01
        b2 = np.zeros((n_y, 1))
        ### END CODE HERE ###

        assert (W1.shape == (n_h, n_x))
        assert (b1.shape == (n_h, 1))
        assert (W2.shape == (n_y, n_h))
        assert (b2.shape == (n_y, 1))

        parameters = {"W1": W1,
                      "b1": b1,
                      "W2": W2,
                      "b2": b2}

        return parameters
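
Two details worth noting: the weights are drawn randomly (rather than set to zero) to break the symmetry between hidden units, and they are scaled by 0.01 so tanh starts in its near-linear region rather than its flat, saturated tails. A shape check (illustrative only, not part of the graded notebook):

    params = initialize_parameters(n_x=2, n_h=4, n_y=1)
    for name, value in params.items():
        print(name, value.shape)
    # W1 (4, 2), b1 (4, 1), W2 (1, 4), b2 (1, 1)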

4.3 - The Loop

Note: if you swap in a different activation function for the hidden layer, two places must change (a concrete sketch follows this list):

  1. the line A1 = np.tanh(Z1) in forward_propagation();
  2. the factor 1 - np.power(A1, 2) inside dZ1 in backward_propagation().

That factor is the tanh derivative expressed in terms of the activation itself: with \(a = \tanh(z)\),

\[\frac{d}{dz}\tanh(z) = 1 - \tanh^2(z) = 1 - a^2
\]
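
As a concrete illustration of that note (a hypothetical modification, not part of the graded assignment), here is what swapping tanh for ReLU would involve:

    import numpy as np

    def relu(z):
        # ReLU activation: elementwise max(0, z)
        return np.maximum(0, z)

    def relu_backward(dA, Z):
        # Derivative of ReLU: pass dA through where Z > 0, zero elsewhere
        return dA * (Z > 0)

    # In forward_propagation() one would use A1 = relu(Z1) instead of np.tanh(Z1);
    # in backward_propagation(), dZ1 = relu_backward(np.dot(W2.T, dZ2), cache["Z1"])
    # instead of multiplying by (1 - np.power(A1, 2)). Note the cache already stores Z1.
    # Tiny standalone check:
    Z = np.random.randn(4, 3)
    dA = np.random.randn(4, 3)
    print(relu(Z).shape, relu_backward(dA, Z).shape)  # (4, 3) (4, 3)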

    # GRADED FUNCTION: forward_propagation

    def forward_propagation(X, parameters):
        """
        Argument:
        X -- input data of size (n_x, m)
        parameters -- python dictionary containing your parameters (output of initialization function)

        Returns:
        A2 -- The sigmoid output of the second activation
        cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
        """
        # Retrieve each parameter from the dictionary "parameters"
        ### START CODE HERE ### (≈ 4 lines of code)
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]
        ### END CODE HERE ###

        # Implement Forward Propagation to calculate A2 (probabilities)
        ### START CODE HERE ### (≈ 4 lines of code)
        Z1 = np.dot(W1, X) + b1
        A1 = np.tanh(Z1)
        Z2 = np.dot(W2, A1) + b2
        A2 = sigmoid(Z2)
        ### END CODE HERE ###

        assert (A2.shape == (1, X.shape[1]))

        cache = {"Z1": Z1,
                 "A1": A1,
                 "Z2": Z2,
                 "A2": A2}

        return A2, cache
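
A quick shape trace with dummy inputs (illustrative only; the notebook uses a prepared test case instead):

    X_demo = np.random.randn(2, 3)               # 2 features, 3 examples (made-up)
    params_demo = initialize_parameters(2, 4, 1)
    A2_demo, cache_demo = forward_propagation(X_demo, params_demo)
    print(A2_demo.shape)           # (1, 3): one probability per example
    print(cache_demo["A1"].shape)  # (4, 3): one tanh activation per hidden unit and example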
    # GRADED FUNCTION: compute_cost

    def compute_cost(A2, Y, parameters):
        """
        Computes the cross-entropy cost given in equation (6)

        Arguments:
        A2 -- The sigmoid output of the second activation, of shape (1, number of examples)
        Y -- "true" labels vector of shape (1, number of examples)
        parameters -- python dictionary containing your parameters W1, b1, W2 and b2

        Returns:
        cost -- cross-entropy cost given equation (6)
        """
        m = Y.shape[1]  # number of examples

        # Compute the cross-entropy cost
        ### START CODE HERE ### (≈ 2 lines of code)
        logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1 - A2), 1 - Y)
        cost = -np.sum(logprobs) / m
        ### END CODE HERE ###

        cost = np.squeeze(cost)  # makes sure cost is the dimension we expect.
                                 # E.g., turns [[17]] into 17
        assert (isinstance(cost, float))

        return cost
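
One practical caveat: if A2 ever rounds to exactly 0 or 1 (possible after many iterations in float arithmetic), np.log produces -inf. A hedged variant that clips the probabilities first, not required by the assignment:

    def compute_cost_stable(A2, Y, eps=1e-12):
        # Same cross-entropy as compute_cost, but with probabilities clipped away from 0 and 1
        m = Y.shape[1]
        A2c = np.clip(A2, eps, 1 - eps)
        logprobs = np.multiply(np.log(A2c), Y) + np.multiply(np.log(1 - A2c), 1 - Y)
        return float(np.squeeze(-np.sum(logprobs) / m))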

Formulas used in backpropagation:
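
(The slide with the six vectorized equations is not reproduced here; they are restated from the code that follows.)

\[dZ^{[2]} = A^{[2]} - Y, \qquad dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]\,T}, \qquad db^{[2]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[2](i)}
\]

\[dZ^{[1]} = W^{[2]\,T} dZ^{[2]} \circ \left(1 - \left(A^{[1]}\right)^{2}\right), \qquad dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{T}, \qquad db^{[1]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[1](i)}
\]

where \(\circ\) denotes the elementwise product and the sums run over the examples (columns).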

    # GRADED FUNCTION: backward_propagation

    def backward_propagation(parameters, cache, X, Y):
        """
        Implement the backward propagation using the instructions above.

        Arguments:
        parameters -- python dictionary containing our parameters
        cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
        X -- input data of shape (2, number of examples)
        Y -- "true" labels vector of shape (1, number of examples)

        Returns:
        grads -- python dictionary containing your gradients with respect to different parameters
        """
        m = X.shape[1]

        # First, retrieve W1 and W2 from the dictionary "parameters".
        ### START CODE HERE ### (≈ 2 lines of code)
        W1 = parameters["W1"]
        W2 = parameters["W2"]
        ### END CODE HERE ###

        # Retrieve also A1 and A2 from dictionary "cache".
        ### START CODE HERE ### (≈ 2 lines of code)
        A1 = cache["A1"]
        A2 = cache["A2"]
        ### END CODE HERE ###

        # Backward propagation: calculate dW1, db1, dW2, db2.
        ### START CODE HERE ### (≈ 6 lines of code, corresponding to 6 equations on slide above)
        dZ2 = A2 - Y
        dW2 = np.dot(dZ2, A1.T) / m
        db2 = np.sum(dZ2, axis=1, keepdims=True) / m
        # tanh derivative expressed via the activation: 1 - A1**2.
        # If you swap the activation function, two places must change:
        # 1. A1 = np.tanh(Z1) in forward_propagation()
        # 2. the factor 1 - np.power(A1, 2) in dZ1 right here
        dZ1 = np.multiply(np.dot(W2.T, dZ2), 1 - np.power(A1, 2))  # <--
        dW1 = np.dot(dZ1, X.T) / m
        db1 = np.sum(dZ1, axis=1, keepdims=True) / m
        ### END CODE HERE ###

        grads = {"dW1": dW1,
                 "db1": db1,
                 "dW2": dW2,
                 "db2": db2}

        return grads
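
update_parameters below applies the gradient descent rule (shown only as a figure in the original notebook): for each parameter \(\theta \in \{W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}\}\),

\[\theta := \theta - \alpha \, \frac{\partial J}{\partial \theta}
\]

where \(\alpha\) is the learning rate (1.2 by default here).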
    # GRADED FUNCTION: update_parameters

    def update_parameters(parameters, grads, learning_rate=1.2):
        """
        Updates parameters using the gradient descent update rule given above

        Arguments:
        parameters -- python dictionary containing your parameters
        grads -- python dictionary containing your gradients

        Returns:
        parameters -- python dictionary containing your updated parameters
        """
        # Retrieve each parameter from the dictionary "parameters"
        ### START CODE HERE ### (≈ 4 lines of code)
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]
        ### END CODE HERE ###

        # Retrieve each gradient from the dictionary "grads"
        ### START CODE HERE ### (≈ 4 lines of code)
        dW1 = grads["dW1"]
        db1 = grads["db1"]
        dW2 = grads["dW2"]
        db2 = grads["db2"]
        ### END CODE HERE ###

        # Update rule for each parameter
        ### START CODE HERE ### (≈ 4 lines of code)
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
        ### END CODE HERE ###

        parameters = {"W1": W1,
                      "b1": b1,
                      "W2": W2,
                      "b2": b2}

        return parameters

4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model()

    # GRADED FUNCTION: nn_model

    def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False):
        """
        Arguments:
        X -- dataset of shape (2, number of examples)
        Y -- labels of shape (1, number of examples)
        n_h -- size of the hidden layer
        num_iterations -- Number of iterations in gradient descent loop
        print_cost -- if True, print the cost every 1000 iterations

        Returns:
        parameters -- parameters learnt by the model. They can then be used to predict.
        """
        np.random.seed(3)
        n_x = layer_sizes(X, Y)[0]
        n_y = layer_sizes(X, Y)[2]

        # Initialize parameters, then retrieve W1, b1, W2, b2. Inputs: "n_x, n_h, n_y". Outputs = "W1, b1, W2, b2, parameters".
        ### START CODE HERE ### (≈ 5 lines of code)
        parameters = initialize_parameters(n_x, n_h, n_y)
        W1 = parameters["W1"]
        b1 = parameters["b1"]
        W2 = parameters["W2"]
        b2 = parameters["b2"]
        ### END CODE HERE ###

        # Loop (gradient descent)
        for i in range(0, num_iterations):
            ### START CODE HERE ### (≈ 4 lines of code)
            # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
            A2, cache = forward_propagation(X, parameters)
            # Cost function. Inputs: "A2, Y, parameters". Outputs: "cost".
            cost = compute_cost(A2, Y, parameters)
            # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
            grads = backward_propagation(parameters, cache, X, Y)
            # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
            parameters = update_parameters(parameters, grads)
            ### END CODE HERE ###

            # Print the cost every 1000 iterations
            if print_cost and i % 1000 == 0:
                print("Cost after iteration %i: %f" % (i, cost))

        return parameters

4.5 - Predictions

Reminder:

\[y_{prediction} = \mathbb{1}\{\text{activation} > 0.5\} = \begin{cases}
1 & \text{if activation} > 0.5 \\
0 & \text{otherwise}
\end{cases}
\]

As an example, if you would like to set the entries of a matrix X to 0 and 1 based on a threshold, you would do: X_new = (X > threshold)
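
A tiny illustration of that idiom (illustrative values only):

    probs = np.array([[0.2, 0.7, 0.5, 0.9]])
    print(probs > 0.5)                # [[False  True False  True]]
    print((probs > 0.5).astype(int))  # [[0 1 0 1]]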

    # GRADED FUNCTION: predict

    def predict(parameters, X):
        """
        Using the learned parameters, predicts a class for each example in X

        Arguments:
        parameters -- python dictionary containing your parameters
        X -- input data of size (n_x, m)

        Returns
        predictions -- vector of predictions of our model (red: 0 / blue: 1)
        """
        # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
        ### START CODE HERE ### (≈ 2 lines of code)
        A2, cache = forward_propagation(X, parameters)
        predictions = (A2[0] > 0.5)  # a boolean vector such as [ True False True] rather than [1 0 1]
        ### END CODE HERE ###

        return predictions

Using the model

    # Build a model with a n_h-dimensional hidden layer
    # After training, nn_model() returns the learned parameters W1, b1, W2, b2
    parameters = nn_model(X, Y, n_h=4, num_iterations=10000, print_cost=True)

    # Plot the decision boundary
    # predict() runs one forward pass on the new data x with the trained parameters to get A2,
    # then thresholds A2 to obtain the predictions.
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)

    # Print accuracy
    predictions = predict(parameters, X)
    print('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')
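
The accuracy expression counts matches in two pieces: np.dot(Y, predictions.T) counts examples where both the label and the prediction are 1, and np.dot(1-Y, 1-predictions.T) counts examples where both are 0 (the boolean predictions are coerced to 0/1 by the arithmetic). An equivalent, arguably clearer form (a sketch, not the notebook's code):

    accuracy = float(100 * np.mean(predictions == Y))  # predictions (m,) broadcasts against Y (1, m)
    print('Accuracy: %.0f%%' % accuracy)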

4.6 - Tuning hidden layer size (optional/ungraded exercise)

    plt.figure(figsize=(16, 32))
    hidden_layer_sizes = [1, 2, 3, 4, 5, 10, 20]
    for i, n_h in enumerate(hidden_layer_sizes):
        plt.subplot(5, 2, i + 1)
        plt.title('Hidden Layer of size %d' % n_h)
        parameters = nn_model(X, Y, n_h, num_iterations=5000)
        plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
        predictions = predict(parameters, X)
        accuracy = float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100)
        print("Accuracy for {} hidden units: {} %".format(n_h, accuracy))

Finally, the assignment's two animations are also worth mentioning: one shows gradient descent diverging when the learning rate is set unsuitably high, and the other shows it converging with a suitable learning rate (the animations themselves are not reproduced here).
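
A minimal self-contained sketch of the same effect (not the assignment's animation code): plain gradient descent on the 1-D quadratic \(J(w) = w^2\), whose gradient is \(2w\). With a small step size the iterate shrinks toward 0; with a step size above 1 it overshoots and its magnitude grows without bound.

    import numpy as np

    def run_gd(lr, steps=50, w0=1.0):
        # Gradient descent on J(w) = w**2 (gradient 2*w)
        w = w0
        for _ in range(steps):
            w = w - lr * 2 * w
        return w

    print(run_gd(lr=0.1))  # on the order of 1e-5: converges toward the minimum at 0
    print(run_gd(lr=1.1))  # magnitude on the order of 1e4 and still growing: diverges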


