On Using Very Large Target Vocabulary for Neural Machine Translation Candidate Sampling Sampled Softmax

【softmax分类器的加速器】

https://www.tensorflow.org/api_docs/python/tf/nn/sampled_softmax_loss

This is a faster way to train a softmax classifier over a huge number of classes.

【分类的结果集过大，选取子集】

https://www.tensorflow.org/api_guides/python/nn#Candidate_Sampling

Do you want to train a multiclass or multilabel model with thousands or millions of output classes (for example, a language model with a large vocabulary)? Training with a full Softmax is slow in this case, since all of the classes are evaluated for every training example. Candidate Sampling training algorithms can speed up your step times by only considering a small randomly-chosen subset of contrastive classes (called candidates) for each batch of training examples.

https://www.tensorflow.org/extras/candidate_sampling.pdf

【 compute F(x, y) for every class y ∈ L for every training example----耗时点，这是要解决的问题】

What is Candidate Sampling Say we have a multiclass or multilabel problem where each training example (x , ) consists of i Ti a context xi a small (multi)set of target classes Ti out of a large universe L of possible classes. For example, the problem might be to predicting the next word (or the set of future words) in a sentence given the previous words.

We wish to learn a compatibility function F(x, y) which says something about the compatibility of a class y with a context x . For example the probability of the class given the context.

“Exhaustive” training methods such as softmax and logistic regression require us to compute F(x, y) for every class y ∈ L for every training example. When |L| is very large, this can be prohibitively expensive.

【the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary：子集】

https://arxiv.org/pdf/1412.2007.pdf

Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method that allows us to use a very large target vocabulary without increasing training complexity, based on importance sampling. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary as well as the LSTM-based neural machine translation models. Furthermore, when we use the ensemble of a few models with very large target vocabularies, we achieve the state-of-the-art translation performance (measured by BLEU) on the English->German translation and almost as high performance as state-of-the-art English->French translation system.

On Using Very Large Target Vocabulary for Neural Machine Translation Candidate Sampling Sampled Softmax的更多相关文章

课程五(Sequence Models)，第三周（Sequence models & Attention mechanism） —— 1.Programming assignments：Neural Machine Translation with Attention
Neural Machine Translation Welcome to your first programming assignment for this week! You will buil ...
Sequence Models Week 3 Neural Machine Translation
Neural Machine Translation Welcome to your first programming assignment for this week! You will buil ...
神经机器翻译 - NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE
论文:NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE 综述背景及问题背景: 翻译: 翻译模型学习条件分布 ...
对Neural Machine Translation by Jointly Learning to Align and Translate论文的详解
读论文 Neural Machine Translation by Jointly Learning to Align and Translate 这个论文是在NLP中第一个使用attention机制 ...
Effective Approaches to Attention-based Neural Machine Translation(Global和Local attention)
这篇论文主要是提出了Global attention 和 Local attention 这个论文有一个译文,不过我没细看 Effective Approaches to Attention-base ...
【转载 | 翻译】Visualizing A Neural Machine Translation Model（神经机器翻译模型NMT的可视化）
转载并翻译Jay Alammar的一篇博文:Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models Wi ...
[笔记] encoder-decoder NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE
原文地址 :[1409.0473] Neural Machine Translation by Jointly Learning to Align and Translate (arxiv.org) ...
Introduction to Neural Machine Translation - part 1
The Noise Channel Model $p(e)$: the language Model $p(f|e)$: the translation model where, $e$: ...
论文阅读 | Robust Neural Machine Translation with Doubly Adversarial Inputs
(1)用对抗性的源实例攻击翻译模型; (2)使用对抗性目标输入来保护翻译模型,提高其对对抗性源输入的鲁棒性. 生成对抗输入:基于梯度 (平均损失) -> AdvGen 我们的工作处理由白盒N ...

随机推荐

Codeforces 632F Magic Matrix（bitset）
题目链接 Magic Matrix 考虑第三个条件,如果不符合的话说明$a[i][k] < a[i][j]$ 或 $a[j][k] < a[i][j]$ 于是我们把所有的$(a[i][j ...
洛谷—— P1609 最小回文数
https://www.luogu.org/problemnew/show/1609 题目描述回文数是从左向右读和从右向左读结果一样的数字串. 例如:121.44 和3是回文数,175和36不是. ...
2016北京集训测试赛（十六）Problem B: river
Solution 这题实际上并不是构造题, 而是一道网络流. 我们考虑题目要求的一条路径应该是什么样子的: 它是一个环, 并且满足每个点有且仅有一条出边, 一条入边, 同时这两条边的权值还必须不一样. ...
MyEclipse导入外部项目
1,File 2,Preferences 3,General----Existing----next 4,Browse选择要导入的项目---finash 5,导入后可能会出现很多error 检查项目的 ...
成长笔记--解决Eclipse 变量名的自动补全问题
大家使用eclipse敲代码的时候,是不是都被这样一个问题困扰着.就是键入一个变量名的时候,会自动提示补全:在你的变量名后面加上类型的名字!这个时候,你就必须键入Esc才不会自动补全你的变量,如果你键 ...
require_once(): Failed opening required '/var/www/config/config.php' (include_path='.:') in /var/www/vendor/forkiss/pharest/src/Pharest/Register/Register.php on line 10
环境 docker环境错误 [Tue Jun 18 18:43:26 2019] 127.0.0.1:53980 [500]: /index.php - require_once(): Failed ...
sublimetext3打造pythonIDE
虽然pycharm是非常好用的pythonIDE,用来开发项目很方便,但是修改调整单个或几个小程序就显得很笨重,这时候我们可以选择使用sublime. 一般来说要开发项目我都用pycharm,开发简单 ...
Hessian原理与程序设计
Hessian是比較经常使用的binary-rpc.性能较高,适合互联网应用.主要使用在普通的webservice 方法调用.交互数据较小的场景中.hessian的数据交互基于http协议,通常he ...
apue学习笔记（第十五章进程间通信）
本章将说明进程之间相互通信的其它技术----进程间通信(IPC) 管道管道只能在具有公共祖先的两个进程之间只用.通常,一个管道由一个进程创建,在进程调用fork后,这个管道就能在父进程和子进程之间使 ...
eclipse svn插件删除原账号信息重新登录
1.通过删除SVN客户端的账号配置文件 1)查看你的Eclipse中使用的是什么SVN Interface(中文:svn接口)windows > preference > Team ...

On Using Very Large Target Vocabulary for Neural Machine Translation Candidate Sampling Sampled Softmax

On Using Very Large Target Vocabulary for Neural Machine Translation Candidate Sampling Sampled Softmax的更多相关文章

随机推荐

热门专题