转自：https://github.com/andrewt3000/DL4NLP

Deep Learning for NLP resources

State of the art resources for NLP sequence modeling tasks such as machine translation, image captioning, and dialog.

Deep Learning for NLP

Stanford Natural Language Processing
Intro NLP course with videos. This has no deep learning. But it is a good primer for traditional nlp. Covers topics such as sentence segmentation, word tokenizing, word normalization, n-grams, named entity recognition, part of speech tagging.Currently not available

Stanford CS 224D: Deep Learning for NLP class
Richard Socher. (2016) Class with syllabus, and slides.
Videos: 2015 lectures / 2016 lectures

A Primer on Neural Network Models for Natural Language Processing
Yoav Goldberg. October 2015. No new info, 75 page summary of state of the art.

Oxford Deep Learning for NLP class
Phil Blunsom. (2017) Class by Deep Mind NLP Group.
Lecture slides, videos, and practicals: Github Repository
Currently ongoing

Word Vectors

Resources about word vectors, aka word embeddings, and distributed representations for words.
Word vectors are numeric representations of words where similar words have similar vectors. Word vectors are often used as input to deep learning systems. This process is sometimes called pretraining.

A neural probabilistic language model.
Bengio 2003. Seminal paper on word vectors.

Efficient Estimation of Word Representations in Vector Space
Mikolov et al. 2013. Word2Vec generates word vectors in an unsupervised way by attempting to predict words from a corpus. Describes Continuous Bag-of-Words (CBOW) and Continuous Skip-gram models for learning word vectors.
Skip-gram takes center word and predict outside words. Skip-gram is better for large datasets.
CBOW - takes outside words and predict the center word. CBOW is better for smaller datasets.

Distributed Representations of Words and Phrases and their Compositionality
Mikolov et al. 2013. Learns vectors for phrases such as "New York Times." Includes optimizations for skip-gram: heirachical softmax, and negative sampling. Subsampling frequent words. (i.e. frequent words like "the" are skipped periodically to speed things up and improve vector for less frequently used words)

Linguistic Regularities in Continuous Space Word Representations
Mikolov et al. 2013. Performs well on word similarity and analogy task. Expands on famous example: King – Man + Woman = Queen
Word2Vec source code
Word2Vec tutorial in TensorFlow

word2vec Parameter Learning Explained
Rong 2014

Articles explaining word2vec: Deep Learning, NLP, and Representations and The amazing power of word vectors

GloVe: Global vectors for word representation
Pennington, Socher, Manning. 2014. Creates word vectors and relates word2vec to matrix factorizations. Evalutaion section led to controversy by Yoav Goldberg
Glove source code and training data

Enriching Word Vectors with Subword Information
Bojanowski, Grave, Joulin, Mikolov 2016
FastText Code

Sentiment Analysis

Thought vectors are numeric representations for sentences, paragraphs, and documents. This concept is used for many text classification tasks such as sentiment analysis.

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Socher et al. 2013. Introduces Recursive Neural Tensor Network and dataset: "sentiment treebank." Includes demo site. Uses a parse tree.

Distributed Representations of Sentences and Documents
Le, Mikolov. 2014. Introduces Paragraph Vector. Concatenates and averages pretrained, fixed word vectors to create vectors for sentences, paragraphs and documents. Also known as paragraph2vec. Doesn't use a parse tree.
Implemented in gensim. See doc2vec tutorial

Deep Recursive Neural Networks for Compositionality in Language
Irsoy & Cardie. 2014. Uses Deep Recursive Neural Networks. Uses a parse tree.

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
Tai et al. 2015 Introduces Tree LSTM. Uses a parse tree.

Semi-supervised Sequence Learning
Dai, Le 2015
Approach: "We present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing. The second approach is to use a sequence autoencoder..."
Result: "With pretraining, we are able to train long short term memory recurrent networks up to a few hundred timesteps, thereby achieving strong performance in many text classification tasks, such as IMDB, DBpedia and 20 Newsgroups."

Bag of Tricks for Efficient Text Classification
Joulin, Grave, Bojanowski, Mikolov 2016 Facebook AI Research.
"Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation."
FastText blog
FastText Code

Neural Machine Translation

In 2014, neural machine translation (NMT) performance became comprable to state of the art statistical machine translation(SMT).

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (abstract)
Cho et al. 2014 Breakthrough deep learning paper on machine translation. Introduces basic sequence to sequence model which includes two rnns, an encoder for input and a decoder for output.

Neural Machine Translation by jointly learning to align and translate (abstract)
Bahdanau, Cho, Bengio 2014.
Implements attention mechanism. "Each time the proposed model generates a word in a translation, it (soft-)searches for a set of positions in a source sentence where the most relevant information is concentrated"
Result: "comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation."
English to French Demo

On Using Very Large Target Vocabulary for Neural Machine Translation
Jean, Cho, Memisevic, Bengio 2014.
"we try replacing each [UNK] token with the aligned source word or its most likely translation determined by another word alignment model."
Result: English -> German bleu score = 21.59 (target vocabulary of 50,000)

Sequence to Sequence Learning with Neural Networks
Sutskever, Vinyals, Le 2014. (nips presentation). Uses seq2seq to generate translations.
Result: English -> French bleu score = 34.8 (WMT’14 dataset)
A key contribution is improvements from reversing the source sentences.
seq2seq tutorial in TensorFlow.

Addressing the Rare Word Problem in Neural Machine Translation (abstract)
Luong, Sutskever, Le, Vinyals, Zaremba 2014
Replace UNK words with dictionary lookup.
Result: English -> French BLEU score = 37.5.

Effective Approaches to Attention-based Neural Machine Translation
Luong, Pham, Manning. 2015
2 models of attention: global and local.
Result: English -> German 25.9 BLEU points

Context-Dependent Word Representation for Neural Machine Translation
Choi, Cho, Bengio 2016
"we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence."
"we propose to represent special tokens (such as numbers, proper nouns and acronyms) with typed symbols to facilitate translating those words that are not well-suited to be translated via continuous vectors."

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Wu et al. 2016
blog post
"WMT’14 English-to-French, our single model scores 38.95 BLEU"
"WMT’14 English-to-German, our single model scores 24.17 BLEU"

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Johnson et al. 2016
blog post
Translations between untrained language pairs.

Google has started rolling out NMT to it's production system, and it's a significant improvement.

Image Captioning

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Xu et al. 2015 Creates captions by feeding image into a CNN which feeds into hidden state of an RNN that generates the caption. At each time step the RNN outputs next word and the next location to pay attention to via a probability over grid locations. Uses 2 types of attention soft and hard. Soft attention uses gradient descent and backprop and is deterministic. Hard attention selects the element with highest probability. Hard attention uses reinforcement learning, rather than backprop and is stochastic.

Open source implementation in TensorFlow

Conversation modeling / Dialog

Neural Responding Machine for Short-Text Conversation
Shang et al. 2015 Uses Neural Responding Machine. Trained on Weibo dataset. Achieves one round conversations with 75% appropriate responses.

A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
Sordoni et al. 2015. Generates responses to tweets.
Uses Recurrent Neural Network Language Model (RLM) architecture of (Mikolov et al., 2010). source code: RNNLM Toolkit

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
Serban, Sordoni, Bengio et al. 2015. Extends hierarchical recurrent encoder-decoder neural network (HRED).

Attention with Intention for a Neural Network Conversation Model
Yao et al. 2015 Architecture is three recurrent networks: an encoder, an intention network and a decoder.

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues
Serban, Sordoni, Lowe, Charlin, Pineau, Courville, Bengio 2016
Proposes novel architecture: VHRED. Latent Variable Hierarchical Recurrent Encoder-Decoder
Compares favorably against LSTM and HRED.

A Neural Conversation Model
Vinyals, Le 2015. Uses LSTM RNNs to generate conversational responses. Uses seq2seq framework. Seq2Seq was originally designed for machine translation and it "translates" a single sentence, up to around 79 words, to a single sentence response, and has no memory of previous dialog exchanges. Used in Google Smart Reply feature for Inbox

Incorporating Copying Mechanism in Sequence-to-Sequence Learning
Gu et al. 2016 Proposes CopyNet, builds on seq2seq.

A Persona-Based Neural Conversation Model
Li et al. 2016 Proposes persona-based models for handling the issue of speaker consistency in neural response generation. Builds on seq2seq.

Deep Reinforcement Learning for Dialogue Generation
Li et al. 2016. Uses reinforcement learing to generate diverse responses. Trains 2 agents to chat with each other. Builds on seq2seq.

Deep learning for chatbots
Article summary of state of the art, and challenges for chatbots.
Deep learning for chatbots. part 2
Implements a retrieval based dialog agent using dual encoder lstm with TensorFlow, based on the Ubuntu dataset [paper] includes source code

ParlAI A framework for training and evaluating AI models on a variety of openly available dialog datasets. Released by FaceBook.

Memory and Attention Models

Attention mechanisms allows the network to refer back to the input sequence, instead of forcing it to encode all information into one fixed-length vector. - Attention and Memory in Deep Learning and NLP

Memory Networks Weston et. al 2014, and End-To-End Memory Networks Sukhbaatar et. al 2015.
Memory networks are implemented in MemNN. Attempts to solve task of reason attention and memory.
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
Weston 2015. Classifies QA tasks like single factoid, yes/no etc. Extends memory networks.
Evaluating prerequisite qualities for learning end to end dialog systems
Dodge et. al 2015. Tests Memory Networks on 4 tasks including reddit dialog task.
See Jason Weston lecture on MemNN

Neural Turing Machines
Graves, Wayne, Danihelka 2014.
We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-toend, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples. Olah and Carter blog on NTM

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
Joulin, Mikolov 2015. Stack RNN source code and blog post

Reasoning, Attention and Memory RAM workshop at NIPS 2015. slides included

自然语言处理资源NLP的更多相关文章

自然语言处理（NLP）相关学习资料/资源
自然语言处理(NLP)相关学习资料/资源 1. 书籍推荐自然语言处理统计自然语言处理(第2版) 作者:宗成庆出版社:清华大学出版社:出版年:2013:页数:570 内容简介:系统地描述了神经网络 ...
聊天机器人（chatbot）终极指南：自然语言处理（NLP）和深度机器学习（Deep Machine Learning）
在过去的几个月中,我一直在收集自然语言处理(NLP)以及如何将NLP和深度学习(Deep Learning)应用到聊天机器人(Chatbots)方面的最好的资料. 时不时地我会发现一个出色的资源,因此 ...
注意力机制（Attention Mechanism）应用——自然语言处理（NLP）
近年来,深度学习的研究越来越深入,在各个领域也都获得了不少突破性的进展.基于注意力(attention)机制的神经网络成为了最近神经网络研究的一个热点,下面是一些基于attention机制的神经网络在 ...
自然语言处理（NLP）书籍资源清单
1. 书籍入门: <Speech and Language Processing>Dan Jurafsky ,James H. Martin 2. blog及项目
利用Tensorflow进行自然语言处理（NLP）系列之一Word2Vec
同步笔者CSDN博客(https://blog.csdn.net/qq_37608890/article/details/81513882). 一.概述本文将要讨论NLP的一个重要话题:Word2V ...
信息检索和自然语言处理 IR&NLP howto
课程: 6.891 (Fall 2003): Machine Learning Approaches for Natural Language Processing http://www.ai.mit ...
自然语言处理（NLP）常用开源工具总结（转）
..................................内容纯转发+收藏................................... 学习自然语言这一段时间以来接触和听说了好多开 ...
初学者如何查阅自然语言处理（NLP）领域学术资料
1. 国际学术组织.学术会议与学术论文自然语言处理(natural language processing,NLP)在很大程度上与计算语言学(computational linguistics,CL ...
自注意力机制（Self-attention Mechanism）——自然语言处理（NLP）
近年来,注意力(Attention)机制被广泛应用到基于深度学习的自然语言处理(NLP)各个任务中.随着注意力机制的深入研究,各式各样的attention被研究者们提出.在2017年6月google机 ...

随机推荐

题目1.A乘以B
看我没骗你吧 —— 这是一道你可以在10秒内完成的题:给定两个绝对值不超过100的整数A和B,输出A乘以B的值. 1实验代码 #include<stdio.h>int main(void) ...
Linux NTP服务器的搭建及client自动更新时间
Network Time Protocol(NTP)是用来使计算机时间同步化的一种协议,它可以使计算机对其服务器或时钟源(如石英钟,GPS等等)做同步化,它可以提供高精准度的时间校正(LAN上与标准间 ...
ffplay播放PCM裸流
ffplay -f s16le -ar 48000 -ac 2 d:\lei.pcm
sklearn—LinearRegression,Ridge,RidgeCV,Lasso线性回归模型简单使用
线性回归 import sklearnfrom sklearn.linear_model import LinearRegression X= [[0, 0], [1, 2], [2, 4]] y = ...
Python 笔试集（4）：True + True == ？
目录目录前文列表面试题True Ture 布尔值布尔类型是特殊的整数类型前文列表 Python 笔试集:什么时候 i = i + 1 并不等于 i += 1? Python 笔试集(1):关 ...
Maven使用WEB-INF/lib下面的jar编译和打包
在某些情况下,maven无法下载依赖的jar,或者依赖的m2会非常的大,上G那是随随便便的事.为了方便修改和编译,在打出的war包基础上,或者直接把tomcat的webapps下的项目拿出来,就可以用 ...
《Using Databases with Python》Week3 Data Models and Relational SQL 课堂笔记
Coursera课程<Using Databases with Python> 密歇根大学 Week3 Data Models and Relational SQL 15.4 Design ...
【WPF异常】在使用 ItemsSource 之前，项集合必须为空
<DataGrid x:Name=" AutoGenerateColumns="False" GridLinesVisibility="None" ...
手工设计神经MNIST使分类精度达到98%以上
设计了两个隐藏层,激活函数是tanh,使用Adam优化算法,学习率随着epoch的增大而调低 import tensorflow as tf from tensorflow.examples.tuto ...
elementUI 分页bug解决
在使用elementui的分页组件时,我发现当对表格数据进行删除时,而且是删除到该页最后一条数据时,当前页面currentPage并不能自动减1,也就是说,当前页currentPage只有你点击页码时 ...

自然语言处理资源NLP