rnn_utils.py

```python
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
```

Import the required packages:

```python
import numpy as np
from rnn_utils import *
```
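
As a quick sanity check (a sketch, not part of the original notebook), softmax normalizes along axis=0, so every column of its output sums to 1, and sigmoid(0) = 0.5; this assumes softmax and sigmoid are in scope via the import above:

```python
# Sketch: verify the column-wise normalization of softmax and that sigmoid(0) = 0.5.
z = np.random.randn(4, 3)        # 4 "classes", 3 examples (one per column)
p = softmax(z)
print(p.sum(axis=0))             # expected: [1. 1. 1.]
print(sigmoid(np.array([0.0])))  # expected: [0.5]
```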

1 - Forward propagation for the basic Recurrent Neural Network

Here is how you can implement an RNN:

Steps:

  1. Implement the computations needed for a single time step of the RNN (the RNN cell).
  2. Loop over the T_x time steps, calling the RNN cell at each step.

1.1 - RNN cell
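
For reference (the assignment's figure is not reproduced here), the cell implemented below computes the standard RNN-cell update; these formulas simply restate what the code does:

$$a^{\langle t \rangle} = \tanh\left(W_{ax} x^{\langle t \rangle} + W_{aa} a^{\langle t-1 \rangle} + b_a\right)$$

$$\hat{y}^{\langle t \rangle} = \mathrm{softmax}\left(W_{ya} a^{\langle t \rangle} + b_y\right)$$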

```python
def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implement a single forward step of the RNN-cell.

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
        ba -- Bias, numpy array of shape (n_a, 1)
        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    # Compute the next activation state using the formula given above
    a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)
    # Compute the output of the current cell using the formula given above
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    # Store values needed for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)

    return a_next, yt_pred, cache
```
```python
np.random.seed(1)
xt = np.random.randn(3, 10)
a_prev = np.random.randn(5, 10)
Waa = np.random.randn(5, 5)
Wax = np.random.randn(5, 3)
Wya = np.random.randn(2, 5)
ba = np.random.randn(5, 1)
by = np.random.randn(2, 1)
parameters = {"Waa": Waa, "Wax": Wax, "Wya": Wya, "ba": ba, "by": by}

a_next, yt_pred, cache = rnn_cell_forward(xt, a_prev, parameters)
print("a_next[4] = ", a_next[4])
print("a_next.shape = ", a_next.shape)
print("yt_pred[1] =", yt_pred[1])
print("yt_pred.shape = ", yt_pred.shape)
```
```
a_next[4] = [ 0.59584544 0.18141802 0.61311866 0.99808218 0.85016201 0.99980978
 -0.18887155 0.99815551 0.6531151 0.82872037]
a_next.shape = (5, 10)
yt_pred[1] = [0.9888161 0.01682021 0.21140899 0.36817467 0.98988387 0.88945212
 0.36920224 0.9966312 0.9982559 0.17746526]
yt_pred.shape = (2, 10)
```

1.2 - RNN forward pass

```python
def rnn_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network.

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
        ba -- Bias, numpy array of shape (n_a, 1)
        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y_pred -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of caches, x)
    """
    # Initialize "caches", which will contain the list of all caches
    caches = []

    # Retrieve dimensions from the shapes of x and Wya
    n_x, m, T_x = x.shape
    n_y, n_a = parameters["Wya"].shape

    # Initialize "a" and "y_pred" with zeros
    a = np.zeros((n_a, m, T_x))
    y_pred = np.zeros((n_y, m, T_x))

    # Initialize a_next
    a_next = a0

    # Loop over all time-steps
    for t in range(T_x):
        # Update the next hidden state, compute the prediction, get the cache
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        # Save the value of the new "next" hidden state in a
        a[:, :, t] = a_next
        # Save the value of the prediction in y_pred
        y_pred[:, :, t] = yt_pred
        # Append "cache" to "caches"
        caches.append(cache)

    # Store values needed for backward propagation in caches
    caches = (caches, x)

    return a, y_pred, caches
```
```python
np.random.seed(1)
x = np.random.randn(3, 10, 4)
a0 = np.random.randn(5, 10)
Waa = np.random.randn(5, 5)
Wax = np.random.randn(5, 3)
Wya = np.random.randn(2, 5)
ba = np.random.randn(5, 1)
by = np.random.randn(2, 1)
parameters = {"Waa": Waa, "Wax": Wax, "Wya": Wya, "ba": ba, "by": by}

a, y_pred, caches = rnn_forward(x, a0, parameters)
print("a[4][1] = ", a[4][1])
print("a.shape = ", a.shape)
print("y_pred[1][3] =", y_pred[1][3])
print("y_pred.shape = ", y_pred.shape)
print("caches[1][1][3] =", caches[1][1][3])
print("len(caches) = ", len(caches))
```
```
a[4][1] = [-0.99999375 0.77911235 -0.99861469 -0.99833267]
a.shape = (5, 10, 4)
y_pred[1][3] = [0.79560373 0.86224861 0.11118257 0.81515947]
y_pred.shape = (2, 10, 4)
caches[1][1][3] = [-1.1425182 -0.34934272 -0.20889423 0.58662319]
len(caches) = 2
```

2 - Long Short-Term Memory (LSTM) network

2.1 - LSTM cell
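
For reference (the assignment's figure is not reproduced here), the gates and states computed by the cell below are, with $\odot$ denoting element-wise multiplication and $[a^{\langle t-1 \rangle}, x^{\langle t \rangle}]$ the vertical concatenation used in the code:

$$\Gamma_f^{\langle t \rangle} = \sigma\left(W_f [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f\right), \qquad \Gamma_i^{\langle t \rangle} = \sigma\left(W_i [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_i\right)$$

$$\tilde{c}^{\langle t \rangle} = \tanh\left(W_c [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\right), \qquad c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} \odot c^{\langle t-1 \rangle} + \Gamma_i^{\langle t \rangle} \odot \tilde{c}^{\langle t \rangle}$$

$$\Gamma_o^{\langle t \rangle} = \sigma\left(W_o [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o\right), \qquad a^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} \odot \tanh\left(c^{\langle t \rangle}\right), \qquad \hat{y}^{\langle t \rangle} = \mathrm{softmax}\left(W_y a^{\langle t \rangle} + b_y\right)$$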

```python
def lstm_cell_forward(xt, a_prev, c_prev, parameters):
    """
    Implement a single forward step of the LSTM-cell.

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    c_prev -- Memory state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
        Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
        bf -- Bias of the forget gate, numpy array of shape (n_a, 1)
        Wi -- Weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
        bi -- Bias of the update gate, numpy array of shape (n_a, 1)
        Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x)
        bc -- Bias of the first "tanh", numpy array of shape (n_a, 1)
        Wo -- Weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
        bo -- Bias of the output gate, numpy array of shape (n_a, 1)
        Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    c_next -- next memory state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains
             (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)

    Note: ft/it/ot stand for the forget/update/output gates, cct stands for the candidate value (c tilde),
    c stands for the memory value.
    """
    # Retrieve parameters from "parameters"
    Wf = parameters["Wf"]
    bf = parameters["bf"]
    Wi = parameters["Wi"]
    bi = parameters["bi"]
    Wc = parameters["Wc"]
    bc = parameters["bc"]
    Wo = parameters["Wo"]
    bo = parameters["bo"]
    Wy = parameters["Wy"]
    by = parameters["by"]

    # Retrieve dimensions from the shapes of xt and Wy
    n_x, m = xt.shape
    n_y, n_a = Wy.shape

    # Concatenate a_prev and xt
    concat = np.zeros((n_a + n_x, m))
    concat[:n_a, :] = a_prev
    concat[n_a:, :] = xt

    # Forget gate
    ft = sigmoid(np.dot(Wf, concat) + bf)
    # Update gate
    it = sigmoid(np.dot(Wi, concat) + bi)
    # Candidate value for the memory cell
    cct = np.tanh(np.dot(Wc, concat) + bc)
    # New memory cell state
    c_next = ft * c_prev + it * cct
    # Output gate
    ot = sigmoid(np.dot(Wo, concat) + bo)
    # New hidden state
    a_next = ot * np.tanh(c_next)
    # Prediction
    yt_pred = softmax(np.dot(Wy, a_next) + by)

    # Store values needed for backward propagation in cache
    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)

    return a_next, c_next, yt_pred, cache
```
```python
np.random.seed(1)
xt = np.random.randn(3, 10)
a_prev = np.random.randn(5, 10)
c_prev = np.random.randn(5, 10)
Wf = np.random.randn(5, 5 + 3)
bf = np.random.randn(5, 1)
Wi = np.random.randn(5, 5 + 3)
bi = np.random.randn(5, 1)
Wo = np.random.randn(5, 5 + 3)
bo = np.random.randn(5, 1)
Wc = np.random.randn(5, 5 + 3)
bc = np.random.randn(5, 1)
Wy = np.random.randn(2, 5)
by = np.random.randn(2, 1)
parameters = {"Wf": Wf, "Wi": Wi, "Wo": Wo, "Wc": Wc, "Wy": Wy, "bf": bf, "bi": bi, "bo": bo, "bc": bc, "by": by}

a_next, c_next, yt, cache = lstm_cell_forward(xt, a_prev, c_prev, parameters)
print("a_next[4] = ", a_next[4])
print("a_next.shape = ", a_next.shape)
print("c_next[2] = ", c_next[2])
print("c_next.shape = ", c_next.shape)
print("yt[1] =", yt[1])
print("yt.shape = ", yt.shape)
print("cache[1][3] =", cache[1][3])
print("len(cache) = ", len(cache))
```
```
a_next[4] = [-0.66408471 0.0036921 0.02088357 0.22834167 -0.85575339 0.00138482
 0.76566531 0.34631421 -0.00215674 0.43827275]
a_next.shape = (5, 10)
c_next[2] = [ 0.63267805 1.00570849 0.35504474 0.20690913 -1.64566718 0.11832942
 0.76449811 -0.0981561 -0.74348425 -0.26810932]
c_next.shape = (5, 10)
yt[1] = [0.79913913 0.15986619 0.22412122 0.15606108 0.97057211 0.31146381
 0.00943007 0.12666353 0.39380172 0.07828381]
yt.shape = (2, 10)
cache[1][3] = [-0.16263996 1.03729328 0.72938082 -0.54101719 0.02752074 -0.30821874
 0.07651101 -1.03752894 1.41219977 -0.37647422]
len(cache) = 10
```

2.2 - LSTM forward pass

```python
# GRADED FUNCTION: lstm_forward
def lstm_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network using an LSTM-cell.

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
        Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
        bf -- Bias of the forget gate, numpy array of shape (n_a, 1)
        Wi -- Weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
        bi -- Bias of the update gate, numpy array of shape (n_a, 1)
        Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x)
        bc -- Bias of the first "tanh", numpy array of shape (n_a, 1)
        Wo -- Weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
        bo -- Bias of the output gate, numpy array of shape (n_a, 1)
        Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    c -- Memory states for every time-step, numpy array of shape (n_a, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of all the caches, x)
    """
    # Initialize "caches", which will track the list of all the caches
    caches = []

    # Retrieve dimensions from the shapes of x and Wy
    n_x, m, T_x = x.shape
    n_y, n_a = parameters['Wy'].shape

    # Initialize "a", "c" and "y" with zeros
    a = np.zeros((n_a, m, T_x))
    c = np.zeros((n_a, m, T_x))
    y = np.zeros((n_y, m, T_x))

    # Initialize a_next and c_next
    a_next = a0
    c_next = np.zeros((n_a, m))

    # Loop over all time-steps
    for t in range(T_x):
        # Update the next hidden state and memory state, compute the prediction, get the cache
        a_next, c_next, yt, cache = lstm_cell_forward(x[:, :, t], a_next, c_next, parameters)
        # Save the value of the new "next" hidden state in a
        a[:, :, t] = a_next
        # Save the value of the prediction in y
        y[:, :, t] = yt
        # Save the value of the next cell state in c
        c[:, :, t] = c_next
        # Append the cache to caches
        caches.append(cache)

    # Store values needed for backward propagation in caches
    caches = (caches, x)

    return a, y, c, caches
```
```python
np.random.seed(1)
x = np.random.randn(3, 10, 7)
a0 = np.random.randn(5, 10)
Wf = np.random.randn(5, 5 + 3)
bf = np.random.randn(5, 1)
Wi = np.random.randn(5, 5 + 3)
bi = np.random.randn(5, 1)
Wo = np.random.randn(5, 5 + 3)
bo = np.random.randn(5, 1)
Wc = np.random.randn(5, 5 + 3)
bc = np.random.randn(5, 1)
Wy = np.random.randn(2, 5)
by = np.random.randn(2, 1)
parameters = {"Wf": Wf, "Wi": Wi, "Wo": Wo, "Wc": Wc, "Wy": Wy, "bf": bf, "bi": bi, "bo": bo, "bc": bc, "by": by}

a, y, c, caches = lstm_forward(x, a0, parameters)
print("a[4][3][6] = ", a[4][3][6])
print("a.shape = ", a.shape)
print("y[1][4][3] =", y[1][4][3])
print("y.shape = ", y.shape)
print("caches[1][1][1] =", caches[1][1][1])
print("c[1][2][1]", c[1][2][1])
print("len(caches) = ", len(caches))
```
```
a[4][3][6] = 0.17211776753291672
a.shape = (5, 10, 7)
y[1][4][3] = 0.9508734618501101
y.shape = (2, 10, 7)
caches[1][1][1] = [ 0.82797464 0.23009474 0.76201118 -0.22232814 -0.20075807 0.18656139
 0.41005165]
c[1][2][1] -0.8555449167181981
len(caches) = 2
```

3 - Backpropagation in recurrent neural networks

3.1 - Basic RNN backward pass
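
For reference, the single-step backward pass below differentiates $a^{\langle t \rangle} = \tanh(W_{ax} x^{\langle t \rangle} + W_{aa} a^{\langle t-1 \rangle} + b_a)$. Writing the element-wise intermediate as $d\tanh = (1 - a_{next}^2) \odot da_{next}$, the gradients it returns are:

$$dx^{\langle t \rangle} = W_{ax}^T \, d\tanh, \qquad dW_{ax} = d\tanh \, x^{\langle t \rangle T}$$

$$da_{prev} = W_{aa}^T \, d\tanh, \qquad dW_{aa} = d\tanh \, a_{prev}^T, \qquad db_a = \sum_{\text{batch}} d\tanh$$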

```python
def rnn_cell_backward(da_next, cache):
    """
    Implements the backward pass for the RNN-cell (single time-step).

    Arguments:
    da_next -- Gradient of loss with respect to the next hidden state
    cache -- python tuple containing useful values (output of rnn_cell_forward())

    Returns:
    gradients -- python dictionary containing:
        dxt -- Gradients of input data, of shape (n_x, m)
        da_prev -- Gradients of previous hidden state, of shape (n_a, m)
        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
        dba -- Gradients of bias vector, of shape (n_a, 1)
    """
    # Retrieve values from cache
    (a_next, a_prev, xt, parameters) = cache

    # Retrieve values from parameters
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    # Gradient of tanh with respect to its argument.
    # Note this is element-wise multiplication by da_next; dtanh is just an intermediate quantity.
    dtanh = (1 - a_next * a_next) * da_next

    # Gradients with respect to xt and Wax
    # (by the chain rule: dxt = Wax.T . dtanh and dWax = dtanh . xt.T, where . denotes np.dot)
    dxt = np.dot(Wax.T, dtanh)
    dWax = np.dot(dtanh, xt.T)

    # Gradients with respect to a_prev and Waa
    da_prev = np.dot(Waa.T, dtanh)
    dWaa = np.dot(dtanh, a_prev.T)

    # Gradient with respect to ba: sum over the batch (last axis); keepdims=True keeps the (n_a, 1) shape
    dba = np.sum(dtanh, keepdims=True, axis=-1)

    # Store the gradients in a python dictionary
    gradients = {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}

    return gradients
```
```python
np.random.seed(1)
xt = np.random.randn(3, 10)
a_prev = np.random.randn(5, 10)
Wax = np.random.randn(5, 3)
Waa = np.random.randn(5, 5)
Wya = np.random.randn(2, 5)
b = np.random.randn(5, 1)
by = np.random.randn(2, 1)
# Note: "ba" below reuses the value defined in an earlier cell; the freshly drawn b is not used.
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "ba": ba, "by": by}

a_next, yt, cache = rnn_cell_forward(xt, a_prev, parameters)

da_next = np.random.randn(5, 10)
gradients = rnn_cell_backward(da_next, cache)
print("gradients[\"dxt\"][1][2] =", gradients["dxt"][1][2])
print("gradients[\"dxt\"].shape =", gradients["dxt"].shape)
print("gradients[\"da_prev\"][2][3] =", gradients["da_prev"][2][3])
print("gradients[\"da_prev\"].shape =", gradients["da_prev"].shape)
print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
print("gradients[\"dWax\"].shape =", gradients["dWax"].shape)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("gradients[\"dWaa\"].shape =", gradients["dWaa"].shape)
print("gradients[\"dba\"][4] =", gradients["dba"][4])
print("gradients[\"dba\"].shape =", gradients["dba"].shape)
```
```
gradients["dxt"][1][2] = -0.4605641030588796
gradients["dxt"].shape = (3, 10)
gradients["da_prev"][2][3] = 0.08429686538067718
gradients["da_prev"].shape = (5, 10)
gradients["dWax"][3][1] = 0.3930818739219303
gradients["dWax"].shape = (5, 3)
gradients["dWaa"][1][2] = -0.2848395578696067
gradients["dWaa"].shape = (5, 5)
gradients["dba"][4] = [0.80517166]
gradients["dba"].shape = (5, 1)
```
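
As an optional sanity check (a sketch, not part of the original notebook), the analytic gradient can be compared with a finite-difference estimate. Treating $J = \sum (da_{next} \odot a_{next})$ as a scalar loss makes `gradients["dxt"]` the exact gradient of $J$ with respect to `xt`; the snippet reuses `xt`, `a_prev`, `parameters`, `da_next` and `gradients` from the test cell above:

```python
# Sketch: finite-difference check of rnn_cell_backward on one entry of xt.
eps = 1e-7
i, j = 1, 2  # an arbitrary entry of xt to perturb

def loss(xt_):
    a_, _, _ = rnn_cell_forward(xt_, a_prev, parameters)
    return np.sum(da_next * a_)

xt_plus = xt.copy()
xt_minus = xt.copy()
xt_plus[i, j] += eps
xt_minus[i, j] -= eps
numeric = (loss(xt_plus) - loss(xt_minus)) / (2 * eps)

print("numeric  =", numeric)
print("analytic =", gradients["dxt"][i][j])  # the two values should agree closely
```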

Backward pass through the RNN

```python
def rnn_backward(da, caches):
    """
    Implement the backward pass for a RNN over an entire sequence of input data.

    Arguments:
    da -- Upstream gradients of all hidden states, of shape (n_a, m, T_x)
    caches -- tuple containing information from the forward pass (rnn_forward)

    Returns:
    gradients -- python dictionary containing:
        dx -- Gradient w.r.t. the input data, numpy array of shape (n_x, m, T_x)
        da0 -- Gradient w.r.t. the initial hidden state, numpy array of shape (n_a, m)
        dWax -- Gradient w.r.t. the input's weight matrix, numpy array of shape (n_a, n_x)
        dWaa -- Gradient w.r.t. the hidden state's weight matrix, numpy array of shape (n_a, n_a)
        dba -- Gradient w.r.t. the bias, of shape (n_a, 1)
    """
    # Retrieve values from the first cache (t=1) of caches
    (caches, x) = caches
    (a1, a0, x1, parameters) = caches[0]  # values at the first time step

    # Retrieve dimensions from da's and x1's shapes
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # Initialize the gradients with the right sizes
    dx = np.zeros((n_x, m, T_x))
    dWax = np.zeros((n_a, n_x))
    dWaa = np.zeros((n_a, n_a))
    dba = np.zeros((n_a, 1))
    da0 = np.zeros((n_a, m))
    da_prevt = np.zeros((n_a, m))

    # Loop through all the time steps, from the last to the first
    for t in reversed(range(T_x)):
        # The gradient flowing into a at step t is da[:, :, t] (from the loss at t)
        # plus da_prevt (backpropagated from step t+1)
        gradients = rnn_cell_backward(da[:, :, t] + da_prevt, caches[t])
        # Retrieve derivatives from gradients
        dxt, da_prevt, dWaxt, dWaat, dbat = gradients["dxt"], gradients["da_prev"], gradients["dWax"], gradients["dWaa"], gradients["dba"]
        # Increment global derivatives w.r.t. parameters by adding their derivative at time-step t
        dx[:, :, t] = dxt
        dWax += dWaxt
        dWaa += dWaat
        dba += dbat

    # Set da0 to the gradient of a which has been backpropagated through all time-steps
    da0 = da_prevt

    # Store the gradients in a python dictionary
    gradients = {"dx": dx, "da0": da0, "dWax": dWax, "dWaa": dWaa, "dba": dba}

    return gradients
```
```python
np.random.seed(1)
x = np.random.randn(3, 10, 4)
a0 = np.random.randn(5, 10)
Wax = np.random.randn(5, 3)
Waa = np.random.randn(5, 5)
Wya = np.random.randn(2, 5)
ba = np.random.randn(5, 1)
by = np.random.randn(2, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "ba": ba, "by": by}

a, y, caches = rnn_forward(x, a0, parameters)
da = np.random.randn(5, 10, 4)
gradients = rnn_backward(da, caches)
print("gradients[\"dx\"][1][2] =", gradients["dx"][1][2])
print("gradients[\"dx\"].shape =", gradients["dx"].shape)
print("gradients[\"da0\"][2][3] =", gradients["da0"][2][3])
print("gradients[\"da0\"].shape =", gradients["da0"].shape)
print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
print("gradients[\"dWax\"].shape =", gradients["dWax"].shape)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("gradients[\"dWaa\"].shape =", gradients["dWaa"].shape)
print("gradients[\"dba\"][4] =", gradients["dba"][4])
print("gradients[\"dba\"].shape =", gradients["dba"].shape)
```
```
gradients["dx"][1][2] = [-2.07101689 -0.59255627 0.02466855 0.01483317]
gradients["dx"].shape = (3, 10, 4)
gradients["da0"][2][3] = -0.31494237512664996
gradients["da0"].shape = (5, 10)
gradients["dWax"][3][1] = 11.264104496527777
gradients["dWax"].shape = (5, 3)
gradients["dWaa"][1][2] = 2.303333126579893
gradients["dWaa"].shape = (5, 5)
gradients["dba"][4] = [-0.74747722]
gradients["dba"].shape = (5, 1)
```

3.2 - LSTM backward pass

3.2.1 - One step of the LSTM backward pass
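
For reference, the gate derivatives implemented by the cell below are (with $\odot$ denoting element-wise multiplication):

$$d\Gamma_o = da_{next} \odot \tanh(c_{next}) \odot \Gamma_o \odot (1 - \Gamma_o)$$

$$d\tilde{c} = \left(dc_{next} \odot \Gamma_i + \Gamma_o \odot (1 - \tanh^2(c_{next})) \odot \Gamma_i \odot da_{next}\right) \odot (1 - \tilde{c}^2)$$

$$d\Gamma_i = \left(dc_{next} \odot \tilde{c} + \Gamma_o \odot (1 - \tanh^2(c_{next})) \odot \tilde{c} \odot da_{next}\right) \odot \Gamma_i \odot (1 - \Gamma_i)$$

$$d\Gamma_f = \left(dc_{next} \odot c_{prev} + \Gamma_o \odot (1 - \tanh^2(c_{next})) \odot c_{prev} \odot da_{next}\right) \odot \Gamma_f \odot (1 - \Gamma_f)$$

Each weight gradient is the corresponding gate derivative times $[a_{prev}; x_t]^T$, each bias gradient sums the gate derivative over the batch, and $dc_{prev} = dc_{next} \odot \Gamma_f + \Gamma_o \odot (1 - \tanh^2(c_{next})) \odot \Gamma_f \odot da_{next}$; $da_{prev}$ and $dx_t$ combine the four gate derivatives through the $a$-columns and $x$-columns of $W_f, W_i, W_c, W_o$ respectively.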

```python
def lstm_cell_backward(da_next, dc_next, cache):
    """
    Implement the backward pass for the LSTM-cell (single time-step).

    Arguments:
    da_next -- Gradients of next hidden state, of shape (n_a, m)
    dc_next -- Gradients of next cell state, of shape (n_a, m)
    cache -- cache storing information from the forward pass

    Returns:
    gradients -- python dictionary containing:
        dxt -- Gradient of input data at time-step t, of shape (n_x, m)
        da_prev -- Gradient w.r.t. the previous hidden state, numpy array of shape (n_a, m)
        dc_prev -- Gradient w.r.t. the previous memory state, of shape (n_a, m)
        dWf -- Gradient w.r.t. the weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
        dWi -- Gradient w.r.t. the weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
        dWc -- Gradient w.r.t. the weight matrix of the memory gate, numpy array of shape (n_a, n_a + n_x)
        dWo -- Gradient w.r.t. the weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
        dbf -- Gradient w.r.t. biases of the forget gate, of shape (n_a, 1)
        dbi -- Gradient w.r.t. biases of the update gate, of shape (n_a, 1)
        dbc -- Gradient w.r.t. biases of the memory gate, of shape (n_a, 1)
        dbo -- Gradient w.r.t. biases of the output gate, of shape (n_a, 1)
    """
    # Retrieve information from "cache"
    (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters) = cache

    # Retrieve dimensions from xt's and a_next's shapes
    n_x, m = xt.shape
    n_a, m = a_next.shape

    # Compute the gate derivatives (see the formulas above)
    dot = da_next * np.tanh(c_next) * ot * (1 - ot)
    dcct = (dc_next * it + ot * (1 - np.square(np.tanh(c_next))) * it * da_next) * (1 - np.square(cct))
    dit = (dc_next * cct + ot * (1 - np.square(np.tanh(c_next))) * cct * da_next) * it * (1 - it)
    dft = (dc_next * c_prev + ot * (1 - np.square(np.tanh(c_next))) * c_prev * da_next) * ft * (1 - ft)

    # Compute the parameter gradients
    concat = np.concatenate((a_prev, xt), axis=0)
    dWf = np.dot(dft, concat.T)
    dWi = np.dot(dit, concat.T)
    dWc = np.dot(dcct, concat.T)
    dWo = np.dot(dot, concat.T)
    dbf = np.sum(dft, axis=1, keepdims=True)
    dbi = np.sum(dit, axis=1, keepdims=True)
    dbc = np.sum(dcct, axis=1, keepdims=True)
    dbo = np.sum(dot, axis=1, keepdims=True)

    # Compute derivatives w.r.t. the previous hidden state, previous memory state and input.
    # W[:, :n_a] holds the columns that multiply a_prev; W[:, n_a:] holds the columns that multiply xt.
    da_prev = (np.dot(parameters['Wf'][:, :n_a].T, dft) + np.dot(parameters['Wi'][:, :n_a].T, dit)
               + np.dot(parameters['Wc'][:, :n_a].T, dcct) + np.dot(parameters['Wo'][:, :n_a].T, dot))
    dc_prev = dc_next * ft + ot * (1 - np.square(np.tanh(c_next))) * ft * da_next
    dxt = (np.dot(parameters['Wf'][:, n_a:].T, dft) + np.dot(parameters['Wi'][:, n_a:].T, dit)
           + np.dot(parameters['Wc'][:, n_a:].T, dcct) + np.dot(parameters['Wo'][:, n_a:].T, dot))

    # Save gradients in dictionary
    gradients = {"dxt": dxt, "da_prev": da_prev, "dc_prev": dc_prev, "dWf": dWf, "dbf": dbf, "dWi": dWi, "dbi": dbi,
                 "dWc": dWc, "dbc": dbc, "dWo": dWo, "dbo": dbo}

    return gradients
```
```python
np.random.seed(1)
xt = np.random.randn(3, 10)
a_prev = np.random.randn(5, 10)
c_prev = np.random.randn(5, 10)
Wf = np.random.randn(5, 5 + 3)
bf = np.random.randn(5, 1)
Wi = np.random.randn(5, 5 + 3)
bi = np.random.randn(5, 1)
Wo = np.random.randn(5, 5 + 3)
bo = np.random.randn(5, 1)
Wc = np.random.randn(5, 5 + 3)
bc = np.random.randn(5, 1)
Wy = np.random.randn(2, 5)
by = np.random.randn(2, 1)
parameters = {"Wf": Wf, "Wi": Wi, "Wo": Wo, "Wc": Wc, "Wy": Wy, "bf": bf, "bi": bi, "bo": bo, "bc": bc, "by": by}

a_next, c_next, yt, cache = lstm_cell_forward(xt, a_prev, c_prev, parameters)

da_next = np.random.randn(5, 10)
dc_next = np.random.randn(5, 10)
gradients = lstm_cell_backward(da_next, dc_next, cache)
print("gradients[\"dxt\"][1][2] =", gradients["dxt"][1][2])
print("gradients[\"dxt\"].shape =", gradients["dxt"].shape)
print("gradients[\"da_prev\"][2][3] =", gradients["da_prev"][2][3])
print("gradients[\"da_prev\"].shape =", gradients["da_prev"].shape)
print("gradients[\"dc_prev\"][2][3] =", gradients["dc_prev"][2][3])
print("gradients[\"dc_prev\"].shape =", gradients["dc_prev"].shape)
print("gradients[\"dWf\"][3][1] =", gradients["dWf"][3][1])
print("gradients[\"dWf\"].shape =", gradients["dWf"].shape)
print("gradients[\"dWi\"][1][2] =", gradients["dWi"][1][2])
print("gradients[\"dWi\"].shape =", gradients["dWi"].shape)
print("gradients[\"dWc\"][3][1] =", gradients["dWc"][3][1])
print("gradients[\"dWc\"].shape =", gradients["dWc"].shape)
print("gradients[\"dWo\"][1][2] =", gradients["dWo"][1][2])
print("gradients[\"dWo\"].shape =", gradients["dWo"].shape)
print("gradients[\"dbf\"][4] =", gradients["dbf"][4])
print("gradients[\"dbf\"].shape =", gradients["dbf"].shape)
print("gradients[\"dbi\"][4] =", gradients["dbi"][4])
print("gradients[\"dbi\"].shape =", gradients["dbi"].shape)
print("gradients[\"dbc\"][4] =", gradients["dbc"][4])
print("gradients[\"dbc\"].shape =", gradients["dbc"].shape)
print("gradients[\"dbo\"][4] =", gradients["dbo"][4])
print("gradients[\"dbo\"].shape =", gradients["dbo"].shape)
```
```
gradients["dxt"][1][2] = 3.2305591151091884
gradients["dxt"].shape = (3, 10)
gradients["da_prev"][2][3] = -0.06396214197109241
gradients["da_prev"].shape = (5, 10)
gradients["dc_prev"][2][3] = 0.7975220387970015
gradients["dc_prev"].shape = (5, 10)
gradients["dWf"][3][1] = -0.1479548381644968
gradients["dWf"].shape = (5, 8)
gradients["dWi"][1][2] = 1.0574980552259903
gradients["dWi"].shape = (5, 8)
gradients["dWc"][3][1] = 2.3045621636876668
gradients["dWc"].shape = (5, 8)
gradients["dWo"][1][2] = 0.3313115952892109
gradients["dWo"].shape = (5, 8)
gradients["dbf"][4] = [0.18864637]
gradients["dbf"].shape = (5, 1)
gradients["dbi"][4] = [-0.40142491]
gradients["dbi"].shape = (5, 1)
gradients["dbc"][4] = [0.25587763]
gradients["dbc"].shape = (5, 1)
gradients["dbo"][4] = [0.13893342]
gradients["dbo"].shape = (5, 1)
```

3.3 - Backward pass through the RNN with an LSTM cell

```python
def lstm_backward(da, caches):
    """
    Implement the backward pass for the RNN with LSTM-cell (over a whole sequence).

    Arguments:
    da -- Gradients w.r.t the hidden states, numpy array of shape (n_a, m, T_x)
    caches -- cache storing information from the forward pass (lstm_forward)

    Returns:
    gradients -- python dictionary containing:
        dx -- Gradient of inputs, of shape (n_x, m, T_x)
        da0 -- Gradient w.r.t. the initial hidden state, numpy array of shape (n_a, m)
        dWf -- Gradient w.r.t. the weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
        dWi -- Gradient w.r.t. the weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
        dWc -- Gradient w.r.t. the weight matrix of the memory gate, numpy array of shape (n_a, n_a + n_x)
        dWo -- Gradient w.r.t. the weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
        dbf -- Gradient w.r.t. biases of the forget gate, of shape (n_a, 1)
        dbi -- Gradient w.r.t. biases of the update gate, of shape (n_a, 1)
        dbc -- Gradient w.r.t. biases of the memory gate, of shape (n_a, 1)
        dbo -- Gradient w.r.t. biases of the output gate, of shape (n_a, 1)
    """
    # Retrieve values from the first cache (t=1) of caches
    (caches, x) = caches
    (a1, c1, a0, c0, f1, i1, cc1, o1, x1, parameters) = caches[0]

    # Retrieve dimensions from da's and x1's shapes
    n_a, m, T_x = da.shape
    n_x, m = x1.shape

    # Initialize the gradients with the right sizes
    dx = np.zeros((n_x, m, T_x))
    da0 = np.zeros((n_a, m))
    da_prevt = np.zeros((n_a, m))
    dc_prevt = np.zeros((n_a, m))
    dWf = np.zeros((n_a, n_a + n_x))
    dWi = np.zeros((n_a, n_a + n_x))
    dWc = np.zeros((n_a, n_a + n_x))
    dWo = np.zeros((n_a, n_a + n_x))
    dbf = np.zeros((n_a, 1))
    dbi = np.zeros((n_a, 1))
    dbc = np.zeros((n_a, 1))
    dbo = np.zeros((n_a, 1))

    # Loop back over the whole sequence
    for t in reversed(range(T_x)):
        # Compute all gradients using lstm_cell_backward
        gradients = lstm_cell_backward(da[:, :, t] + da_prevt, dc_prevt, caches[t])
        # Note: a fully recurrent backward pass would also update da_prevt and dc_prevt here
        # (da_prevt = gradients['da_prev'], dc_prevt = gradients['dc_prev']); this version,
        # like the output shown below, leaves them at zero.
        # Store or add the gradient to the parameters' previous step's gradient
        dx[:, :, t] = gradients['dxt']
        dWf = dWf + gradients['dWf']
        dWi = dWi + gradients['dWi']
        dWc = dWc + gradients['dWc']
        dWo = dWo + gradients['dWo']
        dbf = dbf + gradients['dbf']
        dbi = dbi + gradients['dbi']
        dbc = dbc + gradients['dbc']
        dbo = dbo + gradients['dbo']

    # Set the first activation's gradient to the backpropagated gradient da_prev
    da0 = gradients['da_prev']

    # Store the gradients in a python dictionary
    gradients = {"dx": dx, "da0": da0, "dWf": dWf, "dbf": dbf, "dWi": dWi, "dbi": dbi,
                 "dWc": dWc, "dbc": dbc, "dWo": dWo, "dbo": dbo}

    return gradients
```
```python
np.random.seed(1)
x = np.random.randn(3, 10, 7)
a0 = np.random.randn(5, 10)
Wf = np.random.randn(5, 5 + 3)
bf = np.random.randn(5, 1)
Wi = np.random.randn(5, 5 + 3)
bi = np.random.randn(5, 1)
Wo = np.random.randn(5, 5 + 3)
bo = np.random.randn(5, 1)
Wc = np.random.randn(5, 5 + 3)
bc = np.random.randn(5, 1)
# Note: Wy and by are reused from the earlier LSTM test cells.
parameters = {"Wf": Wf, "Wi": Wi, "Wo": Wo, "Wc": Wc, "Wy": Wy, "bf": bf, "bi": bi, "bo": bo, "bc": bc, "by": by}

a, y, c, caches = lstm_forward(x, a0, parameters)

# Note: da covers only 4 time steps even though x has T_x = 7; the backward loop runs over da's T_x.
da = np.random.randn(5, 10, 4)
gradients = lstm_backward(da, caches)
print("gradients[\"dx\"][1][2] =", gradients["dx"][1][2])
print("gradients[\"dx\"].shape =", gradients["dx"].shape)
print("gradients[\"da0\"][2][3] =", gradients["da0"][2][3])
print("gradients[\"da0\"].shape =", gradients["da0"].shape)
print("gradients[\"dWf\"][3][1] =", gradients["dWf"][3][1])
print("gradients[\"dWf\"].shape =", gradients["dWf"].shape)
print("gradients[\"dWi\"][1][2] =", gradients["dWi"][1][2])
print("gradients[\"dWi\"].shape =", gradients["dWi"].shape)
print("gradients[\"dWc\"][3][1] =", gradients["dWc"][3][1])
print("gradients[\"dWc\"].shape =", gradients["dWc"].shape)
print("gradients[\"dWo\"][1][2] =", gradients["dWo"][1][2])
print("gradients[\"dWo\"].shape =", gradients["dWo"].shape)
print("gradients[\"dbf\"][4] =", gradients["dbf"][4])
print("gradients[\"dbf\"].shape =", gradients["dbf"].shape)
print("gradients[\"dbi\"][4] =", gradients["dbi"][4])
print("gradients[\"dbi\"].shape =", gradients["dbi"].shape)
print("gradients[\"dbc\"][4] =", gradients["dbc"][4])
print("gradients[\"dbc\"].shape =", gradients["dbc"].shape)
print("gradients[\"dbo\"][4] =", gradients["dbo"][4])
print("gradients[\"dbo\"].shape =", gradients["dbo"].shape)
```
```
gradients["dx"][1][2] = [-0.00173313 0.08287442 -0.30545663 -0.43281115]
gradients["dx"].shape = (3, 10, 4)
gradients["da0"][2][3] = -0.09591150195400465
gradients["da0"].shape = (5, 10)
gradients["dWf"][3][1] = -0.06981985612744009
gradients["dWf"].shape = (5, 8)
gradients["dWi"][1][2] = 0.10237182024854771
gradients["dWi"].shape = (5, 8)
gradients["dWc"][3][1] = -0.062498379492745226
gradients["dWc"].shape = (5, 8)
gradients["dWo"][1][2] = 0.04843891314443013
gradients["dWo"].shape = (5, 8)
gradients["dbf"][4] = [-0.0565788]
gradients["dbf"].shape = (5, 1)
gradients["dbi"][4] = [-0.15399065]
gradients["dbi"].shape = (5, 1)
gradients["dbc"][4] = [-0.29691142]
gradients["dbc"].shape = (5, 1)
gradients["dbo"][4] = [-0.29798344]
gradients["dbo"].shape = (5, 1)
```
