PyTorch: Attention Mechanism

1. paper: Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

Encoder

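At each time step one word is fed in, and the hidden state is updated according to $h_t = f(h_{t-1}, x_t)$. The activation function $f$ can be sigmoid, tanh, ReLU, softplus, and so on.
After every word of the sequence has been read, the encoder yields a fixed-length vector $c = \tanh(V h_N)$, where $h_N$ is the hidden state after the last word.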
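
The encoder recurrence above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: $f$ is taken to be tanh of a linear map (the simplest choice, not the gated unit used in the paper), and the names `W`, `U`, `V`, `encode` and all sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_steps = 4, 3, 5  # illustrative sizes, not from the post

# Hypothetical parameters: W maps the input, U maps the previous state,
# V maps the last hidden state h_N to the summary vector c.
W = rng.normal(size=(n_hidden, n_in))
U = rng.normal(size=(n_hidden, n_hidden))
V = rng.normal(size=(n_hidden, n_hidden))

def encode(xs):
    """Run h_t = f(h_{t-1}, x_t) over the sequence and return c = tanh(V h_N)."""
    h = np.zeros(n_hidden)              # initial hidden state
    for x in xs:                        # one word vector per time step
        h = np.tanh(W @ x + U @ h)      # f chosen as tanh of a linear map
    return np.tanh(V @ h)               # fixed-length summary c

xs = rng.normal(size=(n_steps, n_in))   # a random stand-in for 5 word vectors
c = encode(xs)
print(c.shape)                          # (3,)
```

Whatever the sequence length, `c` has the same size, which is exactly the "fixed-length vector" property the text describes.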
Decoder

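As the structure diagram shows, the hidden state at time $t$, $h_t$, is determined by $h_{t-1}$, $y_{t-1}$ and $c$: $h_t = f(h_{t-1}, y_{t-1}, c)$, where $h_0 = \tanh(V' c)$.
The final output $y_t$ is determined by $h_t$, $y_{t-1}$ and $c$:
$P(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_1, c) = g(h_t, y_{t-1}, c)$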

In the above, $f$ and $g$ are both activation functions; $g$ is usually softmax.
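
A single decoder step can be sketched the same way. The code below is a hedged illustration, not the paper's implementation: $f$ is assumed to be tanh of a sum of linear maps, $g$ is softmax, the previous output $y_{t-1}$ is a one-hot vector, and every parameter name (`V_prime`, `W_h`, `W_y`, `W_c`, `O_h`, `O_y`, `O_c`) and size is hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
n_hidden, n_class = 3, 5  # illustrative sizes

# Hypothetical parameters: one matrix per argument of f and of g.
V_prime = rng.normal(size=(n_hidden, n_hidden))
W_h, W_c = rng.normal(size=(2, n_hidden, n_hidden))
W_y = rng.normal(size=(n_hidden, n_class))
O_h, O_c = rng.normal(size=(2, n_class, n_hidden))
O_y = rng.normal(size=(n_class, n_class))

def decoder_step(h_prev, y_prev, c):
    h = np.tanh(W_h @ h_prev + W_y @ y_prev + W_c @ c)  # h_t = f(h_{t-1}, y_{t-1}, c)
    p = softmax(O_h @ h + O_y @ y_prev + O_c @ c)       # P(y_t | ...) = g(h_t, y_{t-1}, c)
    return h, p

c = rng.normal(size=n_hidden)  # stand-in for the encoder summary vector
h = np.tanh(V_prime @ c)       # h_0 = tanh(V' c)
y = np.eye(n_class)[0]         # one-hot start symbol (index 0 is an assumption)

outputs = []
for _ in range(4):
    h, p = decoder_step(h, y, c)
    y = np.eye(n_class)[p.argmax()]  # greedy choice, fed back as y_{t-1}
    outputs.append(int(p.argmax()))
print(outputs)
```

Note that `c` enters every step unchanged; this is the bottleneck that attention later relaxes.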

Based on this, I implemented the original version of the seq2seq model in PyTorch:

(Reference: https://github.com/graykode/nlp-tutorial)

 import numpy as np
 import torch
 import torch.nn as nn
 from torch.autograd import Variable  # a no-op wrapper since PyTorch 0.4; kept for compatibility

 dtype = torch.FloatTensor

 # S: symbol that marks the start of the decoder input
 # E: symbol that marks the end of the decoder output
 # P: padding symbol that fills a sequence shorter than n_step
 char_arr = [c for c in 'SEPabcdefghijklmnopqrstuvwxyz']
 num_dic = {n: i for i, n in enumerate(char_arr)}
 seq_data = [['man', 'women'], ['black', 'white'], ['king', 'queen'], ['girl', 'boy'], ['up', 'down'], ['high', 'low']]

 # Seq2Seq parameters
 n_step = 5                  # maximum word length (time steps)
 n_hidden = 128              # size of the RNN hidden state
 n_class = len(num_dic)      # vocabulary size: 3 symbols + 26 letters = 29
 batch_size = len(seq_data)  # all 6 pairs form a single batch

 def make_batch(seq_data):
     input_batch, output_batch, target_batch = [], [], []
     for seq in seq_data:
         # pad both words to n_step characters without modifying seq_data
         src, tgt = [w + 'P' * (n_step - len(w)) for w in seq]
         input_ids = [num_dic[n] for n in src]
         output_ids = [num_dic[n] for n in ('S' + tgt)]  # decoder input starts with 'S'
         target_ids = [num_dic[n] for n in (tgt + 'E')]  # target ends with 'E'
         input_batch.append(np.eye(n_class)[input_ids])    # one-hot
         output_batch.append(np.eye(n_class)[output_ids])  # one-hot
         target_batch.append(target_ids)  # not one-hot
     # make tensors (stack into one ndarray first to avoid a slow list-of-arrays conversion)
     return (Variable(torch.Tensor(np.array(input_batch))),
             Variable(torch.Tensor(np.array(output_batch))),
             Variable(torch.LongTensor(target_batch)))

 # Model
 class Seq2Seq(nn.Module):
     def __init__(self):
         super(Seq2Seq, self).__init__()
         # note: with the default num_layers=1, dropout has no effect (PyTorch emits a warning)
         self.enc_cell = nn.RNN(input_size=n_class, hidden_size=n_hidden, dropout=0.5)
         self.dec_cell = nn.RNN(input_size=n_class, hidden_size=n_hidden, dropout=0.5)
         self.fc = nn.Linear(n_hidden, n_class)

     def forward(self, enc_input, enc_hidden, dec_input):
         enc_input = enc_input.transpose(0, 1)  # enc_input: [n_step, batch_size, n_class]
         dec_input = dec_input.transpose(0, 1)  # dec_input: [n_step+1, batch_size, n_class]
         # enc_states : [num_layers(=1) * num_directions(=1), batch_size, n_hidden]
         _, enc_states = self.enc_cell(enc_input, enc_hidden)
         # outputs : [n_step+1(=6), batch_size, num_directions(=1) * n_hidden(=128)]
         outputs, _ = self.dec_cell(dec_input, enc_states)
         logits = self.fc(outputs)  # logits : [n_step+1(=6), batch_size, n_class]
         return logits

 input_batch, output_batch, target_batch = make_batch(seq_data)

 model = Seq2Seq()
 criterion = nn.CrossEntropyLoss()
 optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

 for epoch in range(5000):
     # initial hidden state: [num_layers * num_directions, batch_size, n_hidden]
     hidden = Variable(torch.zeros(1, batch_size, n_hidden))
     # input_batch : [batch_size, n_step, n_class]
     # output_batch : [batch_size, n_step+1 (because of 'S'), n_class]
     # target_batch : [batch_size, n_step+1 (because of 'E')], not one-hot
     output = model(input_batch, hidden, output_batch)
     # output : [n_step+1, batch_size, n_class]
     output = output.transpose(0, 1)  # [batch_size, n_step+1(=6), n_class]
     loss = 0
     for i in range(0, len(target_batch)):
         # output[i] : [n_step+1, n_class], target_batch[i] : [n_step+1]
         loss += criterion(output[i], target_batch[i])
     if (epoch + 1) % 1000 == 0:
         print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss.item()))
     optimizer.zero_grad()
     loss.backward()
     optimizer.step()

 # Test
 def translate(word):
     # the decoder input is 'S' followed by padding only, so no part of the answer is given
     input_batch, output_batch, _ = make_batch([[word, 'P' * len(word)]])
     # initial hidden state: [num_layers * num_directions, batch_size(=1), n_hidden]
     hidden = Variable(torch.zeros(1, 1, n_hidden))
     output = model(input_batch, hidden, output_batch)
     # output : [n_step+1(=6), batch_size(=1), n_class]
     predict = output.data.max(2)[1].squeeze(1).tolist()  # argmax over the n_class dimension
     decoded = [char_arr[i] for i in predict]
     # cut at the first 'E'; keep everything if the model never emits 'E'
     end = decoded.index('E') if 'E' in decoded else len(decoded)
     translated = ''.join(decoded[:end])
     return translated.replace('P', '')

 print('test')
 print('man ->', translate('man'))
 print('mans ->', translate('mans'))
 print('king ->', translate('king'))
 print('black ->', translate('black'))
 print('upp ->', translate('upp'))
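
To make the data layout built by make_batch concrete, here is a small pure-Python sketch of the three strings it derives from one word pair before they are turned into indices. The helper names `pad` and `make_pair` are hypothetical and exist only for this illustration; `n_step = 5` is the value used in the listing.

```python
n_step = 5  # same value as in the listing

def pad(word, length=n_step):
    """Right-pad a word with 'P' up to the fixed length."""
    return word + 'P' * (length - len(word))

def make_pair(src, tgt):
    """Return (encoder input, decoder input, decoder target) as strings."""
    src, tgt = pad(src), pad(tgt)
    return src, 'S' + tgt, tgt + 'E'

print(make_pair('king', 'queen'))  # ('kingP', 'Squeen', 'queenE')
print(make_pair('up', 'down'))     # ('upPPP', 'SdownP', 'downPE')
```

The decoder input and the target are the same padded word shifted by one position, which is what lets the training loop feed the ground-truth previous character at every step. Because 'E' follows the padding in the target, the translate function first cuts at 'E' and then strips the remaining 'P' characters.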

Later, the attention mechanism was proposed on top of the seq2seq model.

Paper: Neural Machine Translation by Jointly Learning to Align and Translate
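
As a pointer to what that paper changes, here is a minimal NumPy sketch of additive attention: instead of a single fixed vector $c$, each decoder step scores every encoder state against the previous decoder state, normalizes the scores with softmax, and uses the weighted sum as its context. The parameter names `W_a`, `U_a`, `v_a`, the function name `additive_attention` and all sizes are assumptions for illustration, and the bidirectional encoder used in the paper is left out.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """s_prev: [n_dec] previous decoder state, H: [T, n_enc] encoder states."""
    scores = np.tanh(W_a @ s_prev + H @ U_a.T) @ v_a  # one score per source position, [T]
    weights = softmax(scores)                         # attention weights, sum to 1
    context = weights @ H                             # weighted sum of encoder states, [n_enc]
    return context, weights

rng = np.random.default_rng(0)
T, n_enc, n_dec, n_att = 5, 4, 3, 6  # illustrative sizes
H = rng.normal(size=(T, n_enc))
s_prev = rng.normal(size=n_dec)
W_a = rng.normal(size=(n_att, n_dec))
U_a = rng.normal(size=(n_att, n_enc))
v_a = rng.normal(size=n_att)

context, weights = additive_attention(s_prev, H, W_a, U_a, v_a)
print(weights.round(3), context.shape)
```

The context vector is recomputed at every decoder step, so different output positions can focus on different input positions.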
