Character level language model - Dinosaurus Island

Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this assignment they are back. You are in charge of a special task. Leading biology researchers are creating new breeds of dinosaurs and bringing them to life on earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name, it might go berserk, so choose wisely!

Luckily you have learned some deep learning and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they could find, and compiled them into this dataset. (Feel free to take a look by clicking the previous link.) To create new dinosaur names, you will build a character level language model to generate new names. Your algorithm will learn the different name patterns, and randomly generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs' wrath!

By completing this assignment you will learn:

  • How to store text data for processing using an RNN
  • How to synthesize data, by sampling predictions at each time step and passing it to the next RNN-cell unit
  • How to build a character-level text generation recurrent neural network
  • Why clipping the gradients is important



We will begin by loading in some functions that we have provided for you in utils. Specifically, you have access to functions such as rnn_forward and rnn_backward, which are equivalent to those you've implemented in the previous assignment.


import numpy as np
from utils import *
import random

1 - Problem Statement

1.1 - Dataset and Preprocessing

Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary size.


data = open('dinos.txt', 'r').read()
data = data.lower()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))


There are 19909 total characters and 27 unique characters in your data.

The characters are a-z (26 characters) plus the "\n" (or newline character), which in this assignment plays a role similar to the <EOS> (or "End of sentence") token we discussed in lecture, only here it indicates the end of the dinosaur name rather than the end of a sentence. In the cell below, we create a python dictionary (i.e., a hash table) to map each character to an index from 0-26. We also create a second python dictionary that maps each index back to the corresponding character. This will help you figure out what index corresponds to what character in the probability distribution output of the softmax layer. Below, char_to_ix and ix_to_char are the python dictionaries.


char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }
ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }


{0: '\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}
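
As a quick illustration of how the two dictionaries are used, the sketch below encodes a name as a list of indices and decodes it back. It builds the vocabulary from a tiny hard-coded string rather than from dinos.txt, and toy_data, encode and decode are names made up here for illustration only.

```python
# Toy stand-in for the contents of dinos.txt (assumption: one lowercase name per line).
toy_data = "aachenosaurus\naardonyx\nabelisaurus\n"

# Same construction as in the cell above, applied to the toy data.
chars = sorted(set(toy_data))
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

def encode(name):
    """Map each character of a name to its integer index."""
    return [char_to_ix[ch] for ch in name]

def decode(indices):
    """Map a list of integer indices back to a string."""
    return "".join(ix_to_char[i] for i in indices)

encoded = encode("aardonyx\n")
print(encoded)
print(repr(decode(encoded)))
```

Because "\n" sorts before every letter, it always receives index 0, which matches the mapping printed above.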




1.2 - Overview of the model

Your model will have the following structure:

  • Initialize parameters
  • Run the optimization loop
    • Forward propagation to compute the loss function
    • Backward propagation to compute the gradients of the loss function with respect to the parameters
    • Clip the gradients to avoid exploding gradients
    • Using the gradients, update your parameters with the gradient descent update rule.
  • Return the learned parameters

Figure 1: Recurrent Neural Network, similar to what you had built in the previous notebook "Building a RNN - Step by Step".

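At each time step, the RNN tries to predict the next character given the previous characters. The dataset $X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, \ldots, x^{\langle T_x \rangle})$ is a list of characters in the training set, while $Y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, \ldots, y^{\langle T_x \rangle})$ is such that at every time step $t$, we have $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$.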


2 - Building blocks of the model

In this part, you will build two important blocks of the overall model:

  • Gradient clipping: to avoid exploding gradients
  • Sampling: a technique used to generate characters

You will then apply these two functions to build the model.


2.1 - Clipping the gradients in the optimization loop

In this section you will implement the clip function that you will call inside of your optimization loop. Recall that your overall loop structure usually consists of a forward pass, a cost computation, a backward pass, and a parameter update. Before updating the parameters, you will perform gradient clipping when needed to make sure that your gradients are not "exploding," meaning taking on overly large values.

In the exercise below, you will implement a function clip that takes in a dictionary of gradients and returns a clipped version of the gradients if needed. There are different ways to clip gradients; we will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to lie within some range [-N, N]. More generally, you will provide a maxValue (say 10). In this example, if any component of the gradient vector is greater than 10, it is set to 10; and if any component of the gradient vector is less than -10, it is set to -10. If it is between -10 and 10, it is left alone.
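
To make "different ways to clip" concrete, here is a minimal sketch that contrasts the element-wise clipping used in this assignment with norm-based rescaling, a different technique shown only for comparison. The array values and the names max_value and max_norm are chosen for illustration.

```python
import numpy as np

grad = np.array([[12.0, -3.5], [0.2, -40.0]])

# Element-wise clipping (the approach used in this assignment):
# every entry is forced into [-max_value, max_value] independently.
max_value = 10
clipped_elementwise = np.clip(grad, -max_value, max_value)

# Norm-based clipping (an alternative, not used in this assignment):
# the whole array is rescaled when its L2 norm exceeds max_norm,
# which preserves the direction of the gradient.
max_norm = 10.0
norm = np.linalg.norm(grad)
scale = min(1.0, max_norm / norm)
clipped_by_norm = grad * scale

print(clipped_elementwise)
print(np.linalg.norm(clipped_by_norm))
```

Element-wise clipping is simpler but can change the direction of the gradient, since large entries are cut while small ones are untouched.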


 Figure 2: Visualization of gradient descent with and without gradient clipping, in a case where the network is running into slight "exploding gradient" problems.
Exercise: Implement the function below to return the clipped gradients of your dictionary gradients. Your function takes in a maximum threshold and returns the clipped versions of your gradients. You can check out this hint for examples of how to clip in numpy. You will need to use the argument out = ....

def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.

    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue

    Returns:
    gradients -- a dictionary with the clipped gradients.
    '''

    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']

    ### START CODE HERE ###
    # Clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
    for gradient in [dWax, dWaa, dWya, db, dby]:
        # numpy.clip(a, a_min, a_max, out=None); out=gradient writes the result back in place
        np.clip(gradient, -maxValue, maxValue, out=gradient)
    ### END CODE HERE ###

    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}

    return gradients

dWax = np.random.randn(5,3)*10
dWaa = np.random.randn(5,5)*10
dWya = np.random.randn(2,5)*10
db = np.random.randn(5,1)*10
dby = np.random.randn(2,1)*10
gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
gradients = clip(gradients, 10)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])


gradients["dWaa"][1][2] = 10.0
gradients["dWax"][3][1] = -10.0
gradients["dWya"][1][2] = 0.29713815361
gradients["db"][4] = [ 10.]
gradients["dby"][1] = [ 8.45833407]

Expected output:

gradients["dWaa"][1][2]    10.0
gradients["dWax"][3][1]    -10.0
gradients["dWya"][1][2]    0.29713815361
gradients["db"][4]         [ 10.]
gradients["dby"][1]        [ 8.45833407]


2.2 - Sampling

Now assume that your model is trained. You would like to generate new text (characters). The process of generation is explained in the picture below:

Figure 3: In this picture, we assume the model is already trained. We pass in $x^{\langle 1 \rangle} = \vec{0}$ at the first time step, and have the network then sample one character at a time.


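Exercise: Implement the sample function below to sample characters. You need to carry out 4 steps:

Step 1: Pass the network the first "dummy" input $x^{\langle 1 \rangle} = \vec{0}$ (the vector of zeros). This is the default input before we've generated any characters. We also set $a^{\langle 0 \rangle} = \vec{0}$.

Step 2: Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$. Here are the equations:

$a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t+1 \rangle} + W_{aa} a^{\langle t \rangle} + b)$    (1)
$z^{\langle t+1 \rangle} = W_{ya} a^{\langle t+1 \rangle} + b_y$    (2)
$\hat{y}^{\langle t+1 \rangle} = \mathrm{softmax}(z^{\langle t+1 \rangle})$    (3)

Note that $\hat{y}^{\langle t+1 \rangle}$ is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1). $\hat{y}^{\langle t+1 \rangle}_i$ represents the probability that the character indexed by "i" is the next character. We have provided a softmax() function that you can use.

Step 3: Carry out sampling: Pick the next character's index according to the probability distribution specified by $\hat{y}^{\langle t+1 \rangle}$. This means that if $\hat{y}^{\langle t+1 \rangle}_i = 0.16$, you will pick the index "i" with 16% probability. To implement it, you can use np.random.choice.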

Here is an example of how to use np.random.choice():

p = np.array([0.1, 0.0, 0.7, 0.2])
index = np.random.choice([0, 1, 2, 3], p = p.ravel())

This means that you will pick the index according to the distribution: P(index=0)=0.1,P(index=1)=0.0,P(index=2)=0.7,P(index=3)=0.2.  
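
One way to convince yourself of this is to draw many samples and compare the observed frequencies with p. The sketch below does that; the seed and the number of draws are arbitrary choices made for the illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable
p = np.array([0.1, 0.0, 0.7, 0.2])

# Draw many indices at once and count how often each one appears.
draws = np.random.choice(len(p), size=10000, p=p)
frequencies = np.bincount(draws, minlength=len(p)) / draws.size

print(frequencies)
```

The observed frequencies should be close to p, and index 1 should never be drawn because its probability is 0.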

Step 4: The last step to implement in sample() is to overwrite the variable x, which currently stores $x^{\langle t \rangle}$, with the value of $x^{\langle t+1 \rangle}$. You will represent $x^{\langle t+1 \rangle}$ by creating a one-hot vector corresponding to the character you've chosen as your prediction. You will then feed $x^{\langle t+1 \rangle}$ back in as the next input, run forward propagation again as in Step 2, and keep repeating the process until you get a "\n" character, indicating you've reached the end of the dinosaur name.
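
The four steps can be sketched as a single pass through the loop. This is a minimal illustration under stated assumptions, not the graded solution: the softmax below is a stand-in for the provided softmax(), sample_step is a hypothetical helper name, and the weights are random with toy sizes.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a column vector (stand-in for the provided helper)."""
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

def sample_step(x, a_prev, Wax, Waa, Wya, b, by):
    """One pass through Steps 2-4: forward step, sample an index, build the next input."""
    a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)  # Step 2: hidden state
    y = softmax(np.dot(Wya, a) + by)                       # Step 2: probabilities over the vocabulary
    idx = np.random.choice(len(y), p=y.ravel())            # Step 3: sample an index
    x_next = np.zeros_like(x)                              # Step 4: one-hot vector for the sampled index
    x_next[idx] = 1
    return idx, x_next, a, y

np.random.seed(0)
vocab_size, n_a = 27, 5  # toy sizes chosen for illustration
Wax = np.random.randn(n_a, vocab_size)
Waa = np.random.randn(n_a, n_a)
Wya = np.random.randn(vocab_size, n_a)
b, by = np.zeros((n_a, 1)), np.zeros((vocab_size, 1))

x = np.zeros((vocab_size, 1))  # Step 1: dummy first input
a_prev = np.zeros((n_a, 1))    # Step 1: zero initial hidden state
idx, x, a_prev, y = sample_step(x, a_prev, Wax, Waa, Wya, b, by)
print(idx, float(y.sum()))
```

Calling sample_step repeatedly with the returned x and a_prev, until idx is the newline index, gives the full sampling loop.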



def sample(parameters, char_to_ix, seed):
    """
    Sample a sequence of characters according to a sequence of probability distributions output of the RNN

    Arguments:
    parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b.
    char_to_ix -- python dictionary mapping each character to an index.
    seed -- used for grading purposes. Do not worry about it.

    Returns:
    indices -- a list of length n containing the indices of the sampled characters.
    """

    # Retrieve parameters and relevant shapes from "parameters" dictionary
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]  # vocab_size is the size of the vocabulary
    n_a = Waa.shape[1]

    ### START CODE HERE ###
    # Step 1: Create the one-hot vector x for the first character (initializing the sequence generation). (≈1 line)
    x = np.zeros((vocab_size, 1))
    # Step 1': Initialize a_prev as zeros (≈1 line)
    a_prev = np.zeros((n_a, 1))  # a_prev is a vector of shape (n_a, 1)

    # Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate (≈1 line)
    indices = []

    # idx is a flag to detect a newline character, we initialize it to -1
    idx = -1

    # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append
    # its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well
    # trained model), which helps debugging and prevents entering an infinite loop.
    counter = 0
    newline_character = char_to_ix['\n']

    while (idx != newline_character and counter != 50):

        # Step 2: Forward propagate x using the equations (1), (2) and (3)
        a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
        z = np.dot(Wya, a) + by
        y = softmax(z)

        # for grading purposes
        np.random.seed(counter + seed)

        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        # np.random.choice(vocab_size, p=y.ravel()) is equivalent to np.random.choice(np.arange(vocab_size), p=y.ravel()),
        # i.e. it draws from [0, 1, ..., vocab_size - 1]
        idx = np.random.choice(vocab_size, p=y.ravel())

        # Append the index to "indices"
        indices.append(idx)

        # Step 4: Overwrite the input character as the one corresponding to the sampled index.
        # Build a fresh one-hot vector with a 1 at the sampled index
        x = np.zeros((vocab_size, 1))
        x[idx] = 1

        # Update "a_prev" to be "a"
        a_prev = a

        # for grading purposes
        seed += 1
        counter += 1
    ### END CODE HERE ###

    if (counter == 50):
        indices.append(char_to_ix['\n'])

    return indices

_, n_a = 20, 100
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}

indices = sample(parameters, char_to_ix, 0)
print("list of sampled indices:", indices)
print("list of sampled characters:", [ix_to_char[i] for i in indices])


list of sampled indices: [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 5, 6, 12, 25, 0, 0]
list of sampled characters: ['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o', 'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o', 'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'e', 'f', 'l', 'y', '\n', '\n']

Expected output:

list of sampled indices: [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 5, 6, 12, 25, 0, 0]
list of sampled characters: ['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o', 'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o', 'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'e', 'f', 'l', 'y', '\n', '\n']
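
A list of indices is not very readable, so it helps to turn it back into a name. The sketch below shows one way to do that; indices_to_name is a hypothetical helper written for this illustration (the training code later relies on a provided print_sample function whose implementation is not shown here).

```python
def indices_to_name(indices, ix_to_char):
    """Join sampled indices into a string, drop newline terminators, and capitalize."""
    text = "".join(ix_to_char[i] for i in indices)
    return text.replace("\n", "").capitalize()

# Index-to-character mapping as printed earlier: 0 -> "\n", 1..26 -> "a".."z".
ix_to_char = {0: "\n"}
ix_to_char.update({i + 1: chr(ord("a") + i) for i in range(26)})

print(indices_to_name([20, 18, 5, 24, 0], ix_to_char))  # prints "Trex"
```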


3 - Building the language model

It is time to build the character-level language model for text generation.

3.1 - Gradient descent

In this section you will implement a function performing one step of stochastic gradient descent (with clipped gradients). You will go through the training examples one at a time, so the optimization algorithm will be stochastic gradient descent. As a reminder, here are the steps of a common optimization loop for an RNN:

  • Forward propagate through the RNN to compute the loss
  • Backward propagate through time to compute the gradients of the loss with respect to the parameters
  • Clip the gradients if necessary
  • Update your parameters using gradient descent
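
The last step in the list above is the plain gradient descent update rule. The provided update_parameters function is not shown in this article, so the following is only a sketch of what such an update might look like; update_parameters_sketch and the toy values are assumptions made for illustration.

```python
import numpy as np

def update_parameters_sketch(parameters, gradients, learning_rate):
    """Apply theta <- theta - learning_rate * d_theta to every parameter, in place."""
    for name in ("Wax", "Waa", "Wya", "b", "by"):
        parameters[name] -= learning_rate * gradients["d" + name]
    return parameters

# Toy parameters and gradients: every entry is 1.0 and every gradient entry is 0.5.
parameters = {name: np.ones((2, 2)) for name in ("Wax", "Waa", "Wya", "b", "by")}
gradients = {"d" + name: np.full((2, 2), 0.5) for name in parameters}

update_parameters_sketch(parameters, gradients, learning_rate=0.1)
print(parameters["Wax"])  # every entry is now 1.0 - 0.1 * 0.5 = 0.95
```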



Exercise: Implement this optimization process (one step of stochastic gradient descent).  

# Signatures of the helper functions provided for you (bodies omitted)

def rnn_forward(X, Y, a_prev, parameters):
    """ Performs the forward propagation through the RNN and computes the cross-entropy loss.
    It returns the loss' value as well as a "cache" storing values to be used in the backpropagation."""
    ...
    return loss, cache

def rnn_backward(X, Y, parameters, cache):
    """ Performs the backward propagation through time to compute the gradients of the loss with respect
    to the parameters. It returns also all the hidden states."""
    ...
    return gradients, a

def update_parameters(parameters, gradients, learning_rate):
    """ Updates parameters using the Gradient Descent Update Rule."""
    ...
    return parameters




def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
    """
    Execute one step of the optimization to train the model.

    Arguments:
    X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
    Y -- list of integers, exactly the same as X but shifted one index to the left.
    a_prev -- previous hidden state.
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        b -- Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    learning_rate -- learning rate for the model.

    Returns:
    loss -- value of the loss function (cross-entropy)
    gradients -- python dictionary containing:
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
                        db -- Gradients of bias vector, of shape (n_a, 1)
                        dby -- Gradients of output bias vector, of shape (n_y, 1)
    a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
    """

    ### START CODE HERE ###
    # Forward propagate through time (≈1 line)
    loss, cache = rnn_forward(X, Y, a_prev, parameters)
    # Backpropagate through time (≈1 line)
    gradients, a = rnn_backward(X, Y, parameters, cache)
    # Clip your gradients between -5 (min) and 5 (max) (≈1 line)
    gradients = clip(gradients, 5)
    # Update parameters (≈1 line)
    parameters = update_parameters(parameters, gradients, learning_rate)
    ### END CODE HERE ###

    return loss, gradients, a[len(X)-1]

vocab_size, n_a = 27, 100
a_prev = np.random.randn(n_a, 1)
# Each input x is a one-hot vector of shape (vocab_size, 1), so Wax has shape (n_a, vocab_size)
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
X = [12,3,5,11,22,3]
Y = [4,14,11,22,25, 26]

loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
print("Loss =", loss)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("np.argmax(gradients[\"dWax\"]) =", np.argmax(gradients["dWax"]))
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
print("a_last[4] =", a_last[4])


Loss = 126.503975722
gradients["dWaa"][1][2] = 0.194709315347
np.argmax(gradients["dWax"]) = 93
gradients["dWya"][1][2] = -0.007773876032
gradients["db"][4] = [-0.06809825]
gradients["dby"][1] = [ 0.01538192]
a_last[4] = [-1.]

Expected output:

Loss                            126.503975722
gradients["dWaa"][1][2]         0.194709315347
np.argmax(gradients["dWax"])    93
gradients["dWya"][1][2]         -0.007773876032
gradients["db"][4]              [-0.06809825]
gradients["dby"][1]             [ 0.01538192]
a_last[4]                       [-1.]

3.2 - Training the model

Given the dataset of dinosaur names, we use each line of the dataset (one name) as one training example. Every 2000 steps of stochastic gradient descent, you will sample a handful of names (7 by default, set by dino_names) to see how the algorithm is doing. Remember to shuffle the dataset, so that stochastic gradient descent visits the examples in random order.



Exercise: Follow the instructions and implement model(). When examples[index] contains one dinosaur name (string), to create an example (X, Y), you can use this:  

index = j % len(examples)
X = [None] + [char_to_ix[ch] for ch in examples[index]]
Y = X[1:] + [char_to_ix["\n"]]

Note that we use index = j % len(examples), where j = 0, ..., num_iterations - 1, to make sure that examples[index] is always a valid statement (index is smaller than len(examples)). The first entry of X being None will be interpreted by rnn_forward() as setting $x^{\langle 0 \rangle} = \vec{0}$. Further, this ensures that Y is equal to X but shifted one step to the left, and with an additional "\n" appended to signify the end of the dinosaur name.
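
To see the shift concretely, the snippet below builds (X, Y) for a single short name. The name "rex" is a toy example chosen for illustration, and the mapping is rebuilt inline so the snippet runs on its own.

```python
# Same 27-character vocabulary as above: "\n" -> 0, "a".."z" -> 1..26.
char_to_ix = {"\n": 0}
char_to_ix.update({chr(ord("a") + i): i + 1 for i in range(26)})

name = "rex"  # toy example, not necessarily present in dinos.txt
X = [None] + [char_to_ix[ch] for ch in name]
Y = X[1:] + [char_to_ix["\n"]]

print(X)  # [None, 18, 5, 24]
print(Y)  # [18, 5, 24, 0]
```

At each position the target in Y is the character that follows the input in X, and the final target is the newline that ends the name.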



def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27):
    """
    Trains the model and generates dinosaur names.

    Arguments:
    data -- text corpus
    ix_to_char -- dictionary that maps the index to a character
    char_to_ix -- dictionary that maps a character to an index
    num_iterations -- number of iterations to train the model for
    n_a -- number of units of the RNN cell
    dino_names -- number of dinosaur names you want to sample at each iteration.
    vocab_size -- number of unique characters found in the text, size of the vocabulary

    Returns:
    parameters -- learned parameters
    """
    # Retrieve n_x and n_y from vocab_size
    n_x, n_y = vocab_size, vocab_size
    # Initialize parameters
    parameters = initialize_parameters(n_a, n_x, n_y)
    # Initialize loss (this is required because we want to smooth our loss, don't worry about it)
    loss = get_initial_loss(vocab_size, dino_names)

    # Build list of all dinosaur names (training examples).
    with open("dinos.txt") as f:
        examples = f.readlines()
    examples = [x.lower().strip() for x in examples]
    # Shuffle list of all dinosaur names
    np.random.shuffle(examples)
    # Initialize the hidden state of your RNN
    a_prev = np.zeros((n_a, 1))

    # Optimization loop
    for j in range(num_iterations):
        ### START CODE HERE ###
        # Use the hint above to define one training example (X,Y) (≈ 2 lines)
        index = j % len(examples)
        X = [None] + [char_to_ix[ch] for ch in examples[index]]
        Y = X[1:] + [char_to_ix["\n"]]
        # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
        # Choose a learning rate of 0.01
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
        ### END CODE HERE ###

        # Use a latency trick to keep the loss smooth. It happens here to accelerate the training.
        loss = smooth(loss, curr_loss)

        # Every 2000 iterations, generate "n" characters thanks to sample() to check if the model is learning properly
        if j % 2000 == 0:
            print('Iteration: %d, Loss: %f' % (j, loss) + '\n')
            # The number of dinosaur names to print
            seed = 0
            for name in range(dino_names):
                # Sample indices and print them; each call to sample() generates one dinosaur name
                sampled_indices = sample(parameters, char_to_ix, seed)
                print_sample(sampled_indices, ix_to_char)
                seed += 1  # To get the same result for grading purposes, increment the seed by one.
            print('\n')

    return parameters
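
The helpers get_initial_loss and smooth come from the provided utilities and are not shown in this article. A common way to implement this kind of smoothing is an exponential moving average, so the sketch below is an assumption about what such helpers might look like, not their actual code; the function names with the _sketch suffix and the decay value 0.999 are hypothetical.

```python
import numpy as np

def get_initial_loss_sketch(vocab_size, seq_length):
    """Cross-entropy of a uniform guess over vocab_size characters, summed over seq_length steps."""
    return -np.log(1.0 / vocab_size) * seq_length

def smooth_sketch(loss, curr_loss, decay=0.999):
    """Exponential moving average: keep most of the old value, mix in a little of the new one."""
    return decay * loss + (1 - decay) * curr_loss

loss = get_initial_loss_sketch(27, 7)
print(round(loss, 2))

for curr_loss in [30.0, 10.0, 25.0]:  # made-up per-example losses
    loss = smooth_sketch(loss, curr_loss)
print(round(loss, 2))
```

With a decay this close to 1, a single noisy example barely moves the reported loss, which is why the printed values change slowly from one report to the next.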

Run the following cell. You should observe your model outputting random-looking characters at the first iteration. After a few thousand iterations, your model should learn to generate reasonable-looking names.



parameters = model(data, ix_to_char, char_to_ix)


Iteration: 0, Loss: 23.087336
Xwtdmfqoeyhsqwasjkjvu

Iteration: 2000, Loss: 27.884160
Liusskeomnolxeros
Tpraneronxeros

Iteration: 4000, Loss: 25.901815
Mivrosaurus
Toraperlethosdarenitochusthiamamumamaon

Iteration: 6000, Loss: 24.608779
Onwusceomosaurus
Toreonosaurus

Iteration: 8000, Loss: 24.070350
Onxusichepriuon
Trodiktonus

Iteration: 10000, Loss: 23.844446
Onyusaurus
Troceosaurus

Iteration: 12000, Loss: 23.291971
Onyxosaurus
Trognesaurus

Iteration: 14000, Loss: 23.382339
Meutromodromurus
Troclosaurus

Iteration: 16000, Loss: 23.288447
Meuspsangosaurus
Trpandon

Iteration: 18000, Loss: 22.823526
Phytrolonhonyg
Trolomeehus

Iteration: 20000, Loss: 23.041871
Nousmofonosaurus
Troenchulunosaurus

Iteration: 22000, Loss: 22.728849
Piutyrangosaurus
Trodoniomusitocorces

Iteration: 24000, Loss: 22.683403
Meutromeisaurus
Trodonasaurus

Iteration: 26000, Loss: 22.554523
Phyusaurus
Trodontonsaurus

Iteration: 28000, Loss: 22.484472
Onyutimaerihus
Trofiashates

Iteration: 30000, Loss: 22.774404
Phytys
Trochesaurus

Iteration: 32000, Loss: 22.209473
Mawusaurus
Trnanatrax

Iteration: 34000, Loss: 22.396744
Mavptokekus



You can see that your algorithm has started to generate plausible dinosaur names towards the end of the training. At first, it was generating random characters, but towards the end you could see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with hyperparameters to see if you can get even better results. Our implementation generated some really cool names like maconucon, marloralus and macingsersaurus. Your model hopefully also learned that dinosaur names tend to end in saurus, don, aura, tor, etc.

If your model generates some non-cool names, don't blame the model entirely--not all actual dinosaur names sound cool. (For example, dromaeosauroides is an actual dinosaur name and is in the training set.) But this model should give you a set of candidates from which you can pick the coolest!

This assignment used a relatively small dataset, so that you could train an RNN quickly on a CPU. Training a model of the English language requires a much bigger dataset, usually needs much more computation, and could run for many hours on GPUs. We ran our dinosaur name generator for quite some time, and so far our favorite name is the great, undefeatable, and fierce: Mangosaurus!




4 - Writing like Shakespeare

The rest of this notebook is optional and is not graded, but we hope you'll do it anyway since it's quite fun and informative.

A similar (but more complicated) task is to generate Shakespeare poems. Instead of learning from a dataset of dinosaur names, you can use a collection of Shakespearian poems. Using LSTM cells, you can learn longer-term dependencies that span many characters in the text, e.g., where a character appearing somewhere in a sequence can influence what a different character should be much later in the sequence. These long-term dependencies were less important with dinosaur names, since the names were quite short.



We have implemented a Shakespeare poem generator with Keras. Run the following cell to load the required packages and models. This may take a few minutes.
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking
from keras.layers import LSTM
from keras.utils.data_utils import get_file
from keras.preprocessing.sequence import pad_sequences
from shakespeare_utils import *
import sys
import io


Loading text data...
Creating training set...
number of training examples: 31412
Vectorizing training set...
Loading model...


To save you some time, we have already trained a model for ~1000 epochs on a collection of Shakespearian poems called "The Sonnets".


Let's train the model for one more epoch. When it finishes training for an epoch (this will also take a few minutes), you can run generate_output, which will prompt you for an input (<40 characters). The poem will start with your sentence, and our RNN-Shakespeare will complete the rest of the poem for you! For example, try "Forsooth this maketh no sense " (don't enter the quotation marks). Depending on whether you include the space at the end, your results might also differ, so try it both ways, and try other inputs as well.


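print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

# model, x and y are assumed to be the objects created when shakespeare_utils was imported above
model.fit(x, y, batch_size=128, epochs=1, callbacks=[print_callback])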


Epoch 1/1
31412/31412 [==============================] - 213s - loss: 2.5632
<keras.callbacks.History at 0x7f5469add400>



# Run this cell to try with different inputs without having to re-train the model
generate_output()


Write the beginning of your poem, the Shakespeare machine will complete it. Your input is: Forsooth this maketh no sense

Here is your poem: 

Forsooth this maketh no sense,
phore sanrel maspy to danciging,
and make that woer oh (treased's from fro ly.
if least to me the suffertife of feer by caosed,
hid trolse fritce dedibe the word the miget,
buf my leass were comfoss that in thou hant'st gaod,
his shade the wilf thit whete spool my sade.
cince switt wat pen swalce on thee thee de to yout chasse?
bes it she might all most do thi ale agay.
but lose my 'stain shull


The RNN-Shakespeare model is very similar to the one you have built for dinosaur names. The only major differences are:

  • LSTMs instead of the basic RNN to capture longer-range dependencies
  • The model is a deeper, stacked LSTM model (2 layers)
  • Using Keras instead of plain Python and numpy to simplify the code

If you want to learn more, you can also check out the Keras Team's text generation implementation on GitHub:

Congratulations on finishing this notebook!




