Sequence Models and Long Short-Term Memory Networks
- LSTMs in PyTorch
- Example: An LSTM for Part-of-Speech Tagging
- Exercise: Augmenting the LSTM part-of-speech tagger with character-level features
Sequence models are central to NLP: they are models where there is some sort of dependence through time between your inputs. The classical example of a sequence model is the Hidden Markov Model for part-of-speech tagging. Another example is the conditional random field.
LSTMs in PyTorch
PyTorch's LSTM expects all of its inputs to be 3D tensors. The semantics of the axes of these tensors are important. The first axis is the sequence itself, the second indexes instances in the mini-batch, and the third indexes elements of the input. We haven't discussed mini-batching, so let's just ignore that and assume the second axis always has size 1. If we want to run the sequence model over the sentence “The cow jumped”, the input is the word vectors for “The”, “cow”, and “jumped” stacked along the first axis, i.e. a tensor of shape (3, 1, input_dim).
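For concreteness, here is a minimal sketch of that input tensor, assuming made-up 4-dimensional word vectors (this snippet is not part of the original tutorial):
import torch

# Hypothetical 4-dimensional embeddings for "The", "cow", "jumped".
q_the = torch.randn(1, 4)
q_cow = torch.randn(1, 4)
q_jumped = torch.randn(1, 4)

# Stack along the sequence axis and add the singleton mini-batch axis.
sentence = torch.cat([q_the, q_cow, q_jumped]).view(3, 1, -1)
print(sentence.shape)  # torch.Size([3, 1, 4]) -> (seq_len, batch, input_dim)
The example below runs an actual nn.LSTM over a random sequence, first one element at a time and then all at once.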
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(5)]  # make a sequence of length 5

# Initialize the hidden state.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))
for i in inputs:
    # Step through the sequence one element at a time;
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# Alternatively, process the whole sequence at once: the first return value is
# the hidden state at every step; the second is the final hidden (and cell) state.
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # clean out the hidden state
out, hidden = lstm(inputs, hidden)
print(out)
print(hidden)
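Continuing the snippet above, a quick shape check (not in the original code) makes the two return values concrete: out stacks the hidden state at every time step, while hidden holds the final hidden and cell states.
print(out.shape)        # torch.Size([5, 1, 3]) -> (seq_len, batch, hidden_dim)
print(hidden[0].shape)  # torch.Size([1, 1, 3]) -> final hidden state h_n
print(hidden[1].shape)  # torch.Size([1, 1, 3]) -> final cell state c_n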
Example: An LSTM for Part-of-Speech Tagging
Prepare data:
def prepare_sequence(seq, to_ix):
    idxs = [to_ix[w] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)

training_data = [
    ("The dog ate the apple".split(), ["DET", "NN", "V", "DET", "NN"]),
    ("Everybody read that book".split(), ["NN", "V", "DET", "NN"])
]
word_to_ix = {}
for sent, tags in training_data:
    for word in sent:
        if word not in word_to_ix:
            word_to_ix[word] = len(word_to_ix)
print(word_to_ix)
tag_to_ix = {"DET": 0, "NN": 1, "V": 2}

# These will usually be more like 32 or 64 dimensional.
# We will keep them small, so we can see how the weights change as we train.
EMBEDDING_DIM = 6
HIDDEN_DIM = 6
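As a quick check (not part of the tutorial code), prepare_sequence simply maps tokens to their integer indices; with the vocabulary built above, where words are numbered in order of first appearance, it behaves as follows:
print(word_to_ix["The"], word_to_ix["apple"])  # 0 4
print(prepare_sequence(training_data[0][0], word_to_ix))  # tensor([0, 1, 2, 3, 4])
print(prepare_sequence(training_data[0][1], tag_to_ix))   # tensor([0, 1, 2, 0, 1]) -- DET NN V DET NN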
Create the model:
class LSTMTagger(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim

        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)

        # The LSTM takes word embeddings as inputs, and outputs hidden states
        # with dimensionality hidden_dim.
        self.lstm = nn.LSTM(embedding_dim, hidden_dim)

        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # Before we've done anything, we don't have any hidden state.
        # Refer to the PyTorch documentation to see exactly
        # why they have this dimensionality.
        # The axes semantics are (num_layers, minibatch_size, hidden_dim)
        return (torch.zeros(1, 1, self.hidden_dim),
                torch.zeros(1, 1, self.hidden_dim))

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        lstm_out, self.hidden = self.lstm(
            embeds.view(len(sentence), 1, -1), self.hidden)
        tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
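Before training, it can help to sanity-check the shapes. The following quick check is not part of the tutorial and assumes the data-preparation code above has already been run; it confirms that the tagger maps a sentence of N word indices to an N x tagset_size matrix of log-probabilities (which is why NLLLoss is used below).
model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
with torch.no_grad():
    scores = model(prepare_sequence("The dog ate the apple".split(), word_to_ix))
print(scores.shape)             # torch.Size([5, 3]) -- one row per word, one column per tag
print(scores.exp().sum(dim=1))  # each row exponentiates to a distribution summing to ~1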
Train the model:
model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# See what the scores are before training.
# Note that element i,j of the output is the score for tag j for word i.
# Here we don't need to train, so the code is wrapped in torch.no_grad()
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)
    print(tag_scores)

for epoch in range(300):  # again, normally you would NOT do 300 epochs, it is toy data
    for sentence, tags in training_data:
        # Step 1. Remember that PyTorch accumulates gradients.
        # We need to clear them out before each instance.
        model.zero_grad()

        # Also, we need to clear out the hidden state of the LSTM,
        # detaching it from its history on the last instance.
        model.hidden = model.init_hidden()

        # Step 2. Get our inputs ready for the network, that is, turn them into
        # Tensors of word indices.
        sentence_in = prepare_sequence(sentence, word_to_ix)
        targets = prepare_sequence(tags, tag_to_ix)

        # Step 3. Run our forward pass.
        tag_scores = model(sentence_in)

        # Step 4. Compute the loss, gradients, and update the parameters by
        # calling optimizer.step()
        loss = loss_function(tag_scores, targets)
        loss.backward()
        optimizer.step()

# See what the scores are after training
with torch.no_grad():
    inputs = prepare_sequence(training_data[0][0], word_to_ix)
    tag_scores = model(inputs)

    # The sentence is "the dog ate the apple". Element i,j corresponds to the
    # score for tag j for word i. The predicted tag is the maximum-scoring tag.
    # Here, we can see the predicted sequence is 0 1 2 0 1,
    # since 0 is the index of the maximum value of row 1,
    # 1 is the index of the maximum value of row 2, etc.
    # Which is DET NOUN VERB DET NOUN, the correct sequence!
    print(tag_scores)
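To turn those log-probabilities into readable tags, take the argmax of each row and map the index back to its tag name. This small decoding helper is not part of the original tutorial:
ix_to_tag = {ix: tag for tag, ix in tag_to_ix.items()}
with torch.no_grad():
    tag_scores = model(prepare_sequence(training_data[0][0], word_to_ix))
    predicted = [ix_to_tag[ix.item()] for ix in tag_scores.argmax(dim=1)]
print(predicted)  # expected after training: ['DET', 'NN', 'V', 'DET', 'NN']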