Character level language model - Dinosaurus land



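  • How to store text data so that it can be processed by an RNN.

  • How to synthesize data by sampling a prediction at each time step and passing it to the next RNN-cell unit.

  • How to build a character-level text-generation recurrent neural network (RNN).

  • Why clipping the gradients is important.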

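    import numpy as np
    import random
    import time
    import cllm_utils
    # The code below calls helpers such as softmax(), smooth(), print_sample(), initialize_parameters(),
    # get_initial_loss(), rnn_forward(), rnn_backward() and update_parameters() without a module prefix,
    # so also import them into the current namespace (assuming cllm_utils defines them)
    from cllm_utils import *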

1. Problem Statement

1.1 Dataset and Preprocessing

    with open('datasets/dinos.txt', 'r') as f:
        data = f.read()
    # Convert all characters to lowercase
    data = data.lower()
    # Unordered list of the unique characters in the text
    chars = list(set(data))
    print(chars)
    data_size, vocab_size = len(data), len(chars)
    print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))

['i', '\n', 'd', 'e', 'v', 'f', 'l', 'g', 'u', 'm', 'y', 'q', 'w', 's', 'k', 't', 'a', 'h', 'o', 'n', 'r', 'x', 'j', 'z', 'c', 'b', 'p']

There are 19909 total characters and 27 unique characters in your data.

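The characters are a-z (26 letters) plus "\n" (the newline character). Here the newline plays a role similar to the <EOS> (end of sentence) token discussed in the lecture videos, except that it marks the end of a dinosaur name rather than the end of a sentence. Below we create a dictionary that maps each character to an index from 0 to 26, and a second dictionary that maps each index back to its character; the second one helps us figure out which character corresponds to an index drawn from the probability distribution output by the softmax layer. The code below creates the char_to_ix and ix_to_char dictionaries.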

    char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }
    ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }
    print(ix_to_char)

{0: '\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}
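The two dictionaries let the model move between characters and indices in both directions. A minimal sketch of that round trip, assuming a tiny toy corpus in place of dinos.txt; one_hot and decode_most_likely are hypothetical helper names, not part of cllm_utils. The argmax decode only illustrates the index-to-character lookup: sample() later draws the index at random instead of taking the most likely one.

```python
import numpy as np

# Toy corpus standing in for dinos.txt (an assumption for illustration)
data = "trex\nraptor\n"
chars = sorted(set(data))          # '\n' sorts before the letters
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}
vocab_size = len(chars)

def one_hot(ch):
    """Hypothetical helper: (vocab_size, 1) column vector with a 1 at ch's index."""
    x = np.zeros((vocab_size, 1))
    x[char_to_ix[ch]] = 1
    return x

def decode_most_likely(probs):
    """Hypothetical helper: map a probability column vector to its most likely character."""
    return ix_to_char[int(np.argmax(probs))]

x = one_hot("r")
print(x.ravel())
print(decode_most_likely(x))  # a one-hot vector decodes back to 'r'
```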

1.2 Overview of the model


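  • Initialize the parameters

  • Run the optimization loop:

    • Forward propagation to compute the loss function

    • Backward propagation to compute the gradients of the loss function

    • Clip the gradients to avoid exploding gradients

    • Use the gradients to update the parameters with the gradient descent update rule

  • Return the learned parameters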

Figure 1: Recurrent Neural Network.

At each time step, the RNN tries to predict the next character given the previous characters. The dataset \(X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})\) is a list of characters from the training set, and \(Y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})\) is the list of targets, such that at every time step \(t\) we have \(y^{\langle t \rangle} = x^{\langle t+1 \rangle}\).
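The relation \(y^{\langle t \rangle} = x^{\langle t+1 \rangle}\) means the targets are simply the inputs shifted one step to the left. A small sketch for a single hypothetical name, using the same (X, Y) construction as the training loop in Section 3.2 and a vocabulary built from that one name only:

```python
# Hypothetical example name; in the notebook the names come from dinos.txt
name = "trex"
chars = sorted(set(name + "\n"))
char_to_ix = {ch: i for i, ch in enumerate(chars)}

# None stands for the all-zeros first input; "\n" marks the end of the name
X = [None] + [char_to_ix[ch] for ch in name]
Y = X[1:] + [char_to_ix["\n"]]

# The target at step t is the input at step t+1
for t in range(len(X) - 1):
    assert Y[t] == X[t + 1]

print(X)
print(Y)
```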

2. Building blocks of the model


  • Gradient clipping: to avoid exploding gradients

  • Sampling: a technique used to generate characters

2.1 Clipping the gradients in the optimization loop


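In this section you will implement the clip function, which takes a dictionary of gradients and returns a clipped version of it. There are different ways to clip gradients; we will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to lie in the range [-N, N]. More concretely, you provide a maxValue (say 10): if any component of the gradient is greater than 10, it is set to 10; if any component is less than -10, it is set to -10; and if it lies between -10 and 10, it is left unchanged.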
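Element-wise clipping maps directly onto NumPy's np.clip. A minimal sketch with a made-up 2x2 gradient and maxValue = 10, also showing the in-place form (out=...) that the clip() exercise below relies on:

```python
import numpy as np

max_value = 10
grad = np.array([[25.0, -3.0],
                 [-12.5, 9.9]])

# Element-wise clipping: every entry is clamped independently to [-max_value, max_value]
clipped = np.clip(grad, -max_value, max_value)
print(clipped)

# Passing out=grad writes the clipped values back into the same array (in place),
# as gradient.clip(-maxValue, maxValue, out=gradient) does in the exercise below
np.clip(grad, -max_value, max_value, out=grad)
```

Values already inside the range (-3.0 and 9.9 here) pass through unchanged; only the out-of-range entries are moved to the boundary.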

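Figure 2: Visualization of gradient descent with and without gradient clipping, in a case where the network is running into a slight "exploding gradient" problem.

Exercise: Implement the function below to return the clipped gradients of your dictionary gradients. The function takes a maximum threshold maxValue and returns the clipped gradients.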

    ### GRADED FUNCTION: clip
    def clip(gradients, maxValue):
        '''
        Clips the gradients' values between minimum and maximum.

        Arguments:
        gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
        maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue

        Returns:
        gradients -- a dictionary with the clipped gradients.
        '''
        dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']

        ### START CODE HERE ###
        # clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
        for gradient in [dWax, dWaa, dWya, db, dby]:
            # out=gradient clips each array in place, so the names above see the clipped values
            gradient.clip(-maxValue, maxValue, out=gradient)
        ### END CODE HERE ###

        gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}

        return gradients


    np.random.seed(3)
    dWax = np.random.randn(5,3)*10
    dWaa = np.random.randn(5,5)*10
    dWya = np.random.randn(2,5)*10
    db = np.random.randn(5,1)*10
    dby = np.random.randn(2,1)*10
    gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
    gradients = clip(gradients, 10)
    print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
    print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
    print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
    print("gradients[\"db\"][4] =", gradients["db"][4])
    print("gradients[\"dby\"][1] =", gradients["dby"][1])

gradients["dWaa"][1][2] = 10.0

gradients["dWax"][3][1] = -10.0

gradients["dWya"][1][2] = 0.2971381536101662

gradients["db"][4] = [10.]

gradients["dby"][1] = [8.45833407]

2.2 Sampling


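Figure 3: In this picture, we assume the model is already trained. We pass in \(x^{\langle 1\rangle} = \vec{0}\) at the first time step, and have the network sample one character at a time.

Exercise: Implement the sample function to sample characters. There are 4 steps:

  • Step 1: Pass the network the first "dummy" input \(x^{\langle 1 \rangle} = \vec{0}\) (the vector of zeros). This is the default input before any characters have been generated. Also set \(a^{\langle 0 \rangle} = \vec{0}\).

  • Step 2: Run one step of forward propagation to get \(a^{\langle 1 \rangle}\) and \(\hat{y}^{\langle 1 \rangle}\). Here are the equations: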

\[a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t \rangle} + W_{aa} a^{\langle t \rangle} + b) \tag{1}\]
\[z^{\langle t+1 \rangle} = W_{ya} a^{\langle t+1 \rangle} + b_y \tag{2}\]
\[\hat{y}^{\langle t+1 \rangle} = \mathrm{softmax}(z^{\langle t+1 \rangle}) \tag{3}\]

Note that \(\hat{y}^{\langle t+1 \rangle}\) is a (softmax) probability vector: its entries are between 0 and 1 and sum to 1. \(\hat{y}^{\langle t+1 \rangle}_i\) is the probability that the character with index i is the next character.
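Equations (1) to (3) amount to three lines of NumPy. A minimal sketch of one forward step from the dummy input, assuming randomly initialized parameters, a hidden size of 5 chosen only for illustration, and a locally defined softmax (the notebook uses the softmax() helper from cllm_utils instead):

```python
import numpy as np

def softmax(z):
    # Subtracting the max avoids overflow in exp and does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

rng = np.random.default_rng(0)
vocab_size, n_a = 27, 5  # n_a = 5 is an arbitrary illustrative choice
Wax = rng.standard_normal((n_a, vocab_size))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((vocab_size, n_a))
b = rng.standard_normal((n_a, 1))
by = rng.standard_normal((vocab_size, 1))

x = np.zeros((vocab_size, 1))   # dummy input x<1> = 0
a_prev = np.zeros((n_a, 1))     # a<0> = 0

a = np.tanh(Wax @ x + Waa @ a_prev + b)   # equation (1)
z = Wya @ a + by                          # equation (2)
y_hat = softmax(z)                        # equation (3)
print(y_hat.shape, float(y_hat.sum()))
```

With a zero input and zero previous state, only the biases influence the first step, which is why the very first sampled character depends on b and by alone.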

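  • Step 3: Carry out sampling: pick the index of the next character according to the probability distribution specified by \(\hat{y}^{\langle t+1 \rangle}\). For example, if \(\hat{y}^{\langle t+1 \rangle}_i = 0.16\), you pick index "i" with 16% probability. To implement this, you can use np.random.choice.

Here is an example of how to use np.random.choice():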

    np.random.seed(0)
    p = np.array([0.1, 0.0, 0.7, 0.2])
    index = np.random.choice([0, 1, 2, 3], p = p.ravel())


This means that the index is picked according to the distribution \(P(index = 0) = 0.1, P(index = 1) = 0.0, P(index = 2) = 0.7, P(index = 3) = 0.2\).
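To see that np.random.choice really follows p, one can draw many indices and compare empirical frequencies with p. A quick sanity-check sketch (the number of draws is an arbitrary choice):

```python
import numpy as np

np.random.seed(0)
p = np.array([0.1, 0.0, 0.7, 0.2])
n_draws = 100_000
draws = np.random.choice([0, 1, 2, 3], size=n_draws, p=p.ravel())

# Fraction of draws that landed on each index; should be close to p
freq = np.bincount(draws, minlength=4) / n_draws
print(freq)
```

Index 1 has probability 0 and is never drawn, while index 2 shows up roughly 70% of the time. This is why sample() can produce different names from the same model: it does not always pick the most likely character.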

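  • Step 4: The last step to implement in sample() is to overwrite the variable x, which currently stores \(x^{\langle t \rangle }\), with the value of \(x^{\langle t + 1 \rangle }\).

    • Represent \(x^{\langle t + 1 \rangle }\) by creating a one-hot vector corresponding to the character you chose as your prediction (for example [0, 1, 0, ...] if the chosen index is idx = 1).

    • Then forward propagate \(x^{\langle t + 1 \rangle }\) as in Step 2, and keep repeating the process until you get a "\n" character, indicating that you have reached the end of the dinosaur name.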

    # GRADED FUNCTION: sample
    def sample(parameters, char_to_ix, seed):
        """
        Sample a sequence of characters according to a sequence of probability distributions output of the RNN

        Arguments:
        parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b.
        char_to_ix -- python dictionary mapping each character to an index.
        seed -- used for grading purposes. Do not worry about it.

        Returns:
        indices -- a list of length n containing the indices of the sampled characters.
        """
        # Retrieve parameters and relevant shapes from "parameters" dictionary
        Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
        vocab_size = by.shape[0]
        n_a = Waa.shape[1]

        ### START CODE HERE ###
        # Step 1: Create the one-hot vector x for the first character (initializing the sequence generation). (≈1 line)
        x = np.zeros((vocab_size, 1))
        # Step 1': Initialize a_prev as zeros (≈1 line)
        a_prev = np.zeros((n_a, 1))

        # Empty list that will hold the indices of the generated characters
        indices = []

        # idx is the flag used to detect a newline character; initialize it to -1
        idx = -1

        # Loop over time-steps t. At each time-step, sample a character from a probability distribution
        # and append its index to "indices". We'll stop if we reach 50 characters
        # (which should be very unlikely with a well trained model),
        # which helps debugging and prevents entering an infinite loop.
        counter = 0
        newline_character = char_to_ix['\n']

        while (idx != newline_character and counter != 50):
            # Step 2: Forward propagate x using the equations (1), (2) and (3)
            a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
            z = np.dot(Wya, a) + by
            y = softmax(z)

            # for grading purposes
            np.random.seed(counter + seed)

            # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
            # (idx is drawn at random according to y, so it is not necessarily the most probable index)
            idx = np.random.choice(list(range(vocab_size)), p = y.ravel())

            # Append the index to "indices"
            indices.append(idx)

            # Step 4: Overwrite the input character as the one corresponding to the sampled index.
            x = np.zeros((vocab_size, 1))
            x[idx] = 1

            # Update "a_prev" to be "a"
            a_prev = a

            # for grading purposes
            seed += 1
            counter += 1
        ### END CODE HERE ###

        if (counter == 50):
            indices.append(char_to_ix['\n'])

        return indices


    np.random.seed(2)
    _, n_a = 20, 100
    Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
    b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
    parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}

    indices = sample(parameters, char_to_ix, 0)
    print("Sampling:")
    print("list of sampled indices:", indices)
    print("list of sampled characters:", [ix_to_char[i] for i in indices])


list of sampled indices: [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 17, 24, 12, 3, 1, 0]

list of sampled characters: ['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o', 'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o', 'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'q', 'x', 'l', 'c', 'a', '\n']

3. Building the language model

3.1 Gradient descent

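In this section you will implement a function that performs one step of stochastic gradient descent (with clipped gradients). You will go through the training examples one at a time, so the optimization algorithm is stochastic gradient descent. As a reminder, here are the steps of a common optimization loop for an RNN: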

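  • Forward propagate through the RNN to compute the loss.
  • Backward propagate through time to compute the gradients of the loss with respect to the parameters.
  • Clip the gradients.
  • Update the parameters using gradient descent.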
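The last two steps (clip, then update) can be illustrated on their own. A minimal sketch, assuming a toy two-parameter dictionary and the key convention used in this notebook, where the gradient of parameter "Waa" is stored under "dWaa". sgd_update is a hypothetical name; it is not the provided update_parameters implementation, whose body is not shown here.

```python
import numpy as np

def sgd_update(parameters, gradients, learning_rate):
    """Hypothetical sketch of the rule theta <- theta - learning_rate * dtheta.
    Assumes each gradient key is the parameter name prefixed with 'd'."""
    for name in parameters:
        parameters[name] -= learning_rate * gradients["d" + name]
    return parameters

params = {"Waa": np.ones((2, 2)), "b": np.zeros((2, 1))}
grads = {"dWaa": np.full((2, 2), 50.0), "db": np.array([[-3.0], [0.5]])}

# Clip element-wise to [-5, 5] first (as optimize() below does), then take one SGD step
for g in grads.values():
    np.clip(g, -5, 5, out=g)
params = sgd_update(params, grads, learning_rate=0.01)

print(params["Waa"])   # every entry is 1 - 0.01 * 5 = 0.95
print(params["b"])     # 0 - 0.01 * (-3) = 0.03 and 0 - 0.01 * 0.5 = -0.005
```

Without clipping, the 50.0 gradient would move Waa by 0.5 in a single step. That kind of oversized step is exactly what clipping is meant to prevent when gradients explode.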

Exercise: Implement this optimization process (one step of stochastic gradient descent). The following functions are provided:

    def rnn_forward(X, Y, a_prev, parameters):
        """ Performs the forward propagation through the RNN and computes the cross-entropy loss.
        It returns the loss' value as well as a "cache" storing values to be used in the backpropagation."""
        ...
        return loss, cache

    def rnn_backward(X, Y, parameters, cache):
        """ Performs the backward propagation through time to compute the gradients of the loss with respect
        to the parameters. It returns also all the hidden states."""
        ...
        return gradients, a

    def update_parameters(parameters, gradients, learning_rate):
        """ Updates parameters using the Gradient Descent Update Rule."""
        ...
        return parameters

    # GRADED FUNCTION: optimize
    def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
        """
        Execute one step of the optimization to train the model.

        Arguments:
        X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
        Y -- list of integers, exactly the same as X but shifted one index to the left.
        a_prev -- previous hidden state.
        parameters -- python dictionary containing:
                            Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                            Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                            Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                            b -- Bias, numpy array of shape (n_a, 1)
                            by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
        learning_rate -- learning rate for the model.

        Returns:
        loss -- value of the loss function (cross-entropy)
        gradients -- python dictionary containing:
                            dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                            dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                            dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
                            db -- Gradients of bias vector, of shape (n_a, 1)
                            dby -- Gradients of output bias vector, of shape (n_y, 1)
        a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
        """
        ### START CODE HERE ###
        # Forward propagate through time (≈1 line)
        loss, cache = rnn_forward(X, Y, a_prev, parameters)
        # Backpropagate through time (≈1 line)
        gradients, a = rnn_backward(X, Y, parameters, cache)
        # Clip your gradients between -5 (min) and 5 (max) (≈1 line)
        gradients = clip(gradients, 5)
        # Update parameters (≈1 line)
        parameters = update_parameters(parameters, gradients, learning_rate)
        ### END CODE HERE ###

        return loss, gradients, a[len(X)-1]


    np.random.seed(1)
    vocab_size, n_a = 27, 100
    a_prev = np.random.randn(n_a, 1)
    Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
    b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
    parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
    X = [12,3,5,11,22,3]
    Y = [4,14,11,22,25, 26]

    loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
    print("Loss =", loss)
    print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
    print("np.argmax(gradients[\"dWax\"]) =", np.argmax(gradients["dWax"]))
    print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
    print("gradients[\"db\"][4] =", gradients["db"][4])
    print("gradients[\"dby\"][1] =", gradients["dby"][1])
    print("a_last[4] =", a_last[4])

Loss = 126.503975722

gradients["dWaa"][1][2] = 0.194709315347

np.argmax(gradients["dWax"]) = 93

gradients["dWya"][1][2] = -0.007773876032

gradients["db"][4] = [-0.06809825]

gradients["dby"][1] = [ 0.01538192]

a_last[4] = [-1.]

3.2 Training the model

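Given the dataset of dinosaur names, we use each line of the dataset (one name) as one training example. Every 2000 steps of stochastic gradient descent, the code below samples dino_names (7 by default) names to see how the algorithm is doing. Remember to shuffle the dataset, so that stochastic gradient descent visits the examples in random order.

Exercise: Implement model().

examples[index] contains one dinosaur name (a string). To create an example (X, Y), you can use:

    index = j % len(examples)
    X = [None] + [char_to_ix[ch] for ch in examples[index]]
    Y = X[1:] + [char_to_ix["\n"]]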

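Note: we use index = j % len(examples), where j = 0, ..., num_iterations - 1, to make sure that examples[index] is always valid (index is always smaller than len(examples)).

rnn_forward() interprets the first value of X, None, as \(x^{\langle 0 \rangle} = \vec{0}\). In addition, Y is exactly X shifted one step to the left, with an extra "\n" appended to mark the end of the dinosaur name.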
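Two details from the note above can be illustrated in isolation: the modulo index cycles through the shuffled list and wraps around, and a None entry stands for the all-zeros input. A small sketch with a hypothetical three-name list. to_input_vector is a hypothetical helper that mirrors the convention described above; the actual rnn_forward code is not shown in this document.

```python
import numpy as np

def to_input_vector(ix, vocab_size):
    # Hypothetical helper: None -> all-zeros vector, otherwise a one-hot column vector
    x = np.zeros((vocab_size, 1))
    if ix is not None:
        x[ix] = 1
    return x

examples = ["aardonyx", "abelisaurus", "abrictosaurus"]  # hypothetical tiny name list
num_iterations = 7

# j % len(examples) is always a valid index and wraps around after the last name
visited = [examples[j % len(examples)] for j in range(num_iterations)]
print(visited)
```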

    # GRADED FUNCTION: model
    def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27):
        """
        Trains the model and generates dinosaur names.

        Arguments:
        data -- text corpus
        ix_to_char -- dictionary that maps the index to a character
        char_to_ix -- dictionary that maps a character to an index
        num_iterations -- number of iterations to train the model for
        n_a -- number of units of the RNN cell
        dino_names -- number of dinosaur names you want to sample at each iteration.
        vocab_size -- number of unique characters found in the text, size of the vocabulary

        Returns:
        parameters -- learned parameters
        """
        # Retrieve n_x and n_y from vocab_size
        n_x, n_y = vocab_size, vocab_size

        # Initialize parameters
        parameters = initialize_parameters(n_a, n_x, n_y)

        # Initialize loss (this is required because we want to smooth our loss, don't worry about it)
        loss = get_initial_loss(vocab_size, dino_names)

        # Build list of all dinosaur names (training examples).
        with open("./datasets/dinos.txt") as f:
            examples = f.readlines()
        examples = [x.lower().strip() for x in examples]  # list of names

        # Shuffle list of all dinosaur names
        np.random.seed(0)
        np.random.shuffle(examples)

        # Initialize the hidden state of your RNN
        a_prev = np.zeros((n_a, 1))

        # Optimization loop
        for j in range(num_iterations):
            ### START CODE HERE ###
            # Use the hint above to define one training example (X,Y) (≈ 2 lines)
            index = j % len(examples)
            X = [None] + [char_to_ix[ch] for ch in examples[index]]
            Y = X[1:] + [char_to_ix['\n']]

            # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
            # Choose a learning rate of 0.01
            curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
            ### END CODE HERE ###

            # Use a latency trick to keep the loss smooth. It happens here to accelerate the training.
            loss = smooth(loss, curr_loss)

            # Every 2000 iterations, generate dino_names names with sample() to check if the model is learning properly
            if j % 2000 == 0:
                print('Iteration: %d, Loss: %f' % (j, loss) + '\n')

                # The number of dinosaur names to print
                seed = 0
                for name in range(dino_names):
                    # Sample indices and print them
                    sampled_indices = sample(parameters, char_to_ix, seed)
                    print_sample(sampled_indices, ix_to_char)
                    seed += 1  # To get the same result for grading purposes, increment the seed by one.

                print('\n')

        return parameters
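The loop prints a smoothed loss instead of the raw per-example loss. The bodies of get_initial_loss() and smooth() are not shown in this document; a common choice, assumed here, is to start from the loss of a uniform guess (seq_length * ln(vocab_size), with model() passing dino_names = 7 as the length) and then apply an exponential moving average. smooth_ema is a hypothetical name used only for this sketch.

```python
import math

def smooth_ema(loss, cur_loss, beta=0.999):
    # Hypothetical exponential moving average: keep 99.9% of the running value,
    # blend in 0.1% of the newest per-example loss
    return beta * loss + (1 - beta) * cur_loss

# Assumed starting point: cross-entropy of a uniform guess over 27 characters, times 7
loss = 7 * math.log(27)
print(round(loss, 4))

# Noisy per-example losses barely move the smoothed value
for cur_loss in [40.0, 10.0, 30.0]:
    loss = smooth_ema(loss, cur_loss)
print(round(loss, 4))
```

Under this assumption the starting value is about 23.07, which is in line with the 23.087336 printed at iteration 0 in the log below. A beta this close to 1 also explains why the printed loss changes only slowly between checkpoints.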


    parameters = model(data, ix_to_char, char_to_ix)


Iteration: 0, Loss: 23.087336
Nkzxwtdmfqoeyhsqwasjkjvu
Kneb
Kzxwtdmfqoeyhsqwasjkjvu
Neb
Zxwtdmfqoeyhsqwasjkjvu
Eb
Xwtdmfqoeyhsqwasjkjvu

Iteration: 2000, Loss: 27.884160
Liusskeomnolxeros
Hmdaairus
Hytroligoraurus
Lecalosapaus
Xusicikoraurus
Abalpsamantisaurus
Tpraneronxeros

Iteration: 4000, Loss: 25.901815
Mivrosaurus
Inee
Ivtroplisaurus
Mbaaisaurus
Wusichisaurus
Cabaselachus
Toraperlethosdarenitochusthiamamumamaon

Iteration: 6000, Loss: 24.608779
Onwusceomosaurus
Lieeaerosaurus
Lxussaurus
Oma
Xusteonosaurus
Eeahosaurus
Toreonosaurus

Iteration: 8000, Loss: 24.070350
Onxusichepriuon
Kilabersaurus
Lutrodon
Omaaerosaurus
Xutrcheps
Edaksoje
Trodiktonus

Iteration: 10000, Loss: 23.844446
Onyusaurus
Klecalosaurus
Lustodon
Ola
Xusodonia
Eeaeosaurus
Troceosaurus

Iteration: 12000, Loss: 23.291971
Onyxosaurus
Kica
Lustrepiosaurus
Olaagrraiansaurus
Yuspangosaurus
Eealosaurus
Trognesaurus

Iteration: 14000, Loss: 23.382338
Meutromodromurus
Inda
Iutroinatorsaurus
Maca
Yusteratoptititan
Ca
Troclosaurus

Iteration: 16000, Loss: 23.268257
Mbutosaurus
Indaa
Iustolophulurus
Macagosaurus
Yusoclichaurus
Caahosaurus
Trodon

Iteration: 18000, Loss: 22.928870
Phytrogiaps
Mela
Mustrha
Pegamosaurus
Ytromacisaurus
Efanshie
Troma

Iteration: 20000, Loss: 23.008798
Onyusperchohychus
Lola
Lytrranfosaurus
Olaa
Ytrrcharomulus
Ehagosaurus
Trrcharonyhus

Iteration: 22000, Loss: 22.794515
Onyvus
Llecakosaurus
Mustodonosaurus
Ola
Yusodon
Eiadosaurus
Trodontorus

Iteration: 24000, Loss: 22.648635
Meutosaurus
Incaachudachus
Itntodon
Mecaessan
Yurong
Daadropachusaurus
Troenatheusaurosaurus

Iteration: 26000, Loss: 22.599152
Nixusehoenomulushapnelspanthuonathitalia
Jigaadroncansaurus
Kustodonis
Nedantrocantiteniupegyankuaeusalomarotimenmpangvin
Ytrodongoluctos
Eebdssaegoterichus
Trodolopiunsitarbilus

Iteration: 28000, Loss: 22.628455
Pnywrodilosaurus
Loca
Mustodonanethosaurus
Phabesceeatopsaurus
Ytrodonnoludosaurus
Elaishacaosaurus
Trrdilosaurus

Iteration: 30000, Loss: 22.587893
Piusosaurus
Locaadrus
Lutosaurus
Pacalosaurus
Yusochesaurus
Eg
Trraodon

Iteration: 32000, Loss: 22.314649
Nivosaurus
Jiacamisaurus
Kusplasaurus
Ncaadosaurus
Yusiandon
Eeaisilaanus
Trokalenator

Iteration: 34000, Loss: 22.445100
Mewsroengosaurus
Ilabafosaurus
Justoeomimavesaurus
Macaeosaurus
Yrosaurus
Eiaeosaurus
Trodondolus

4. Writing like Shakespeare


    from __future__ import print_function
    from keras.callbacks import LambdaCallback
    from keras.models import Model, load_model, Sequential
    from keras.layers import Dense, Activation, Dropout, Input, Masking
    from keras.layers import LSTM
    from keras.utils.data_utils import get_file
    from keras.preprocessing.sequence import pad_sequences
    from shakespeare_utils import *
    import sys
    import io

Loading text data...

Creating training set...

number of training examples: 31412

Vectorizing training set...

Loading model...



    print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

    model.fit(x, y, batch_size=128, epochs=1, callbacks=[print_callback])

Epoch 1/1

31412/31412 [==============================] - 27s 846us/step - loss: 2.7274

    # Run this cell to try with different inputs without having to re-train the model
    generate_output()

Write the beginning of your poem, the Shakespeare machine will complete it. Your input is: Forsooth this maketh no sense

Here is your poem:

Forsooth this maketh no sense.

to that i his bongy of sacu, or when thee grace.

to peirout i have sweet from thee, ald the will,

in this, as thy dealt besich whereor me hall thy dould,

and thee and creasts of the cantensed site,

my heart which that a form and ridcus forsed:

for thy coneloting thy where hors of sich,

that prow'st and thincior with with now,

as makted for thou best, and parking frank,

it place corsack thas


    # ------------ Optional: plot the model architecture ------------ #
    from IPython.display import SVG
    from keras.utils.vis_utils import model_to_dot
    from keras.utils import plot_model
    %matplotlib inline
    plot_model(model, to_file='shakespeare.png')
    SVG(model_to_dot(model).create(prog='dot', format='svg'))
    # --------------------------------------------------------------- #
  8. #------------------------------------------------#

