Link of the Paper: https://arxiv.org/abs/1705.03122

Motivation:

  • Compared to recurrent layers, convolutions create representations for fixed-size contexts; however, the effective context size of the network can easily be made larger by stacking several layers on top of each other. This makes it possible to precisely control the maximum length of dependencies to be modeled. Convolutional networks do not depend on the computations of the previous time step and therefore allow parallelization over every element in a sequence. This contrasts with RNNs, which maintain a hidden state of the entire past that prevents parallel computation within a sequence.
  • Multi-layer convolutional neural networks create hierarchical representations over the input sequence in which nearby input elements interact at lower layers while distant elements interact at higher layers. Hierarchical structure provides a shorter path to capture long-range dependencies compared to the chain structure modeled by recurrent networks. Inputs to a convolutional network are fed through a constant number of kernels and non-linearities, whereas recurrent networks apply up to n operations and non-linearities to the first word and only a single set of operations to the last word. Fixing the number of nonlinearities applied to the inputs also eases learning.

Innovation:

  • An architecture for Seq2Seq modeling based entirely on convolutional neural networks. Both encoder and decoder networks share a simple block structure that computes intermediate states based on a fixed number of input elements. Each block contains a one-dimensional convolution followed by a non-linearity. For a decoder network with a single block and kernel width k, each resulting state h_i^1 contains information over k input elements. Stacking several blocks on top of each other increases the number of input elements represented in a state; for instance, stacking 6 blocks with k = 5 yields an input field of 25 elements. ( Stacking is similar to the pooling process. )
  • Position Embeddings: Input elements x = (x_1, . . . , x_m) are embedded in distributional space as w = (w_1, . . . , w_m), where w_j ∈ R^f is a column in an embedding matrix D ∈ R^{V×f}. The authors also equip the model with a sense of order by embedding the absolute positions of the input elements as p = (p_1, . . . , p_m), where p_j ∈ R^f. Both are combined to obtain the input element representations e = (w_1 + p_1, . . . , w_m + p_m) ( see the embedding sketch after this list ). Position embeddings are useful in this architecture because they give the model a sense of which portion of the input or output sequence it is currently dealing with.
  • The authors introduce a separate attention mechanism for each decoder layer ( multi-step attention ).
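
A minimal sketch of the word + position embedding combination described above, assuming PyTorch; `vocab_size`, `max_positions`, `embed_dim`, and the batch/sequence sizes are illustrative placeholders, not values from the paper:

```python
import torch
import torch.nn as nn

vocab_size, max_positions, embed_dim = 10000, 1024, 512    # illustrative sizes

word_embed = nn.Embedding(vocab_size, embed_dim)           # embedding matrix D ∈ R^{V×f}
pos_embed = nn.Embedding(max_positions, embed_dim)         # absolute-position embeddings

tokens = torch.randint(0, vocab_size, (2, 7))              # (batch, seq_len) token ids
positions = torch.arange(tokens.size(1)).expand_as(tokens) # 0, 1, ..., m-1 per sequence

# e = (w_1 + p_1, ..., w_m + p_m): sum of word and position embeddings
e = word_embed(tokens) + pos_embed(positions)              # (batch, seq_len, f)
```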

Improvement:

  • The model is equipped with gated linear units ( Language Modeling with Gated Linear Units - Dauphin et al., arXiv 2016 ) and residual connections ( Deep Residual Learning for Image Recognition - He et al., CVPR 2015a ).

    • The authors choose gated linear units as the non-linearity, which implement a simple gating mechanism over the output of the convolution Y = [A; B] ∈ R^{2d}: v([A; B]) = A ⊗ σ(B), where A, B ∈ R^d are the inputs to the non-linearity, ⊗ is point-wise multiplication, and the output v([A; B]) ∈ R^d is half the size of Y. The gates σ(B) control which inputs A of the current context are relevant. GLUs perform better than tanh in the context of language modelling.
    • To enable deep convolutional networks, the authors add residual connections from the input of each convolution to the output of the block: h_i^l = v( W^l [h_{i−k/2}^{l−1}, . . . , h_{i+k/2}^{l−1}] + b_w^l ) + h_i^{l−1}.
  • For encoder networks the authors ensure that the output of the convolutional layers matches the input length by padding the input at each layer. For decoder networks, however, they have to take care that no future information is available to the decoder: they pad the input by k − 1 zero vectors on both the left and right side and then remove k elements from the end of the convolution output ( see the decoder block sketch after this list ).
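
A minimal sketch of one decoder convolutional block, assuming PyTorch: a 1-D convolution producing 2d channels, the GLU non-linearity, and a residual connection. The hyperparameters are illustrative, and left-only padding is used here as one common way to keep the convolution causal, in place of the paper's pad-both-sides-then-trim scheme:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d, k = 512, 3                                # hidden size and kernel width (illustrative)

conv = nn.Conv1d(d, 2 * d, kernel_size=k)    # convolution output Y = [A; B] ∈ R^{2d}

def decoder_block(h):
    """One decoder block: causal padding -> conv -> GLU -> residual.
    h: (batch, d, seq_len) hidden states from layer l-1."""
    # pad k-1 zeros on the left only so state i never sees future elements
    x = F.pad(h, (k - 1, 0))
    y = conv(x)                              # (batch, 2d, seq_len)
    a, b = y.chunk(2, dim=1)                 # split Y into A and B
    out = a * torch.sigmoid(b)               # GLU: v([A; B]) = A ⊗ σ(B)
    return out + h                           # residual connection to the block input

h = torch.randn(2, d, 7)
print(decoder_block(h).shape)                # torch.Size([2, 512, 7])
```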

General Points:

  • Sequence to sequence modeling has largely been synonymous with recurrent neural network based encoder-decoder architectures. The encoder RNN processes an input sequence x = (x_1, . . . , x_m) of m elements and returns state representations z = (z_1, . . . , z_m). The decoder RNN takes z and generates the output sequence y = (y_1, . . . , y_n) left to right, one element at a time. To generate output y_{i+1}, the decoder computes a new hidden state h_{i+1} based on the previous state h_i, an embedding g_i of the previous target language word y_i, as well as a conditional input c_i derived from the encoder output z. Models without attention consider only the final encoder state z_m, either by setting c_i = z_m for all i or by simply initializing the first decoder state with z_m, in which case c_i is not used. Architectures with attention compute c_i as a weighted sum of (z_1, . . . , z_m) at each time step. The weights of the sum are referred to as attention scores and allow the network to focus on different parts of the input sequence as it generates the output sequence. Attention scores are computed by essentially comparing each encoder state z_j to a combination of the previous decoder state h_i and the last prediction y_i; the result is normalized to a distribution over input elements. A minimal sketch of this weighted-sum attention follows.
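
A minimal sketch of the generic weighted-sum attention described above, assuming PyTorch; dot-product scoring is used here as one simple choice of comparison function, and the sizes and the `query` vector (standing in for the combination of the previous decoder state and the last prediction) are illustrative:

```python
import torch
import torch.nn.functional as F

m, d = 7, 512                     # number of encoder states and hidden size (illustrative)
z = torch.randn(m, d)             # encoder states z_1 .. z_m
query = torch.randn(d)            # stands in for a combination of the previous decoder
                                  # state h_i and the embedding of the last prediction y_i

scores = z @ query                # one comparison score per encoder state, shape (m,)
alpha = F.softmax(scores, dim=0)  # normalized attention distribution over input elements
c_i = alpha @ z                   # conditional input c_i: weighted sum of encoder states
```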
