Perplexity Vs Cross-entropy

姜楠 2024-10-14 06:47:22 原文

Evaluating a Language Model: Perplexity

We have a serial of \(m\) sentences:
\[s_1,s_2,\cdots,s_m\]
We could look at the probability under our model \(\prod_{i=1}^m{p(s_i)}\). Or more conveniently, the log probability:
\[\log \prod_{i=1}^m{p(s_i)}=\sum_{i=1}^m{\log p(s_i)}\]
where \(p(s_i)\) is the probability of sentence \(s_i\).

In fact, the usual evaluation measure is perplexity:
\[PPL=2^{-l}\]
\[l=\frac{1}{M}\sum_{i=1}^m{\log p(s_i)}\]
and \(M\) is the total number of words in the test data.

Cross-Entropy

Given words \(x_1,\cdots,x_t\), a language model prdicts the following word \(x_{t+1}\) by modeling:
\[P(x_{t+1}=v_j|x_t\cdots,x_1)=\hat y_j^t\]
where \(v_j\) is a word in the vocabulary.

The predicted output vector \(\hat y^t\in \mathbb{R}^{|V|}\) is a probability distribution over the vocabulary, and we optimize the cross-entrpy loss:
\[\mathcal{L}^t(\theta)=CE(y^t,\hat y^t)=-\sum_{i=1}^{|V|}{y_i^t\log \hat y_i^t}\]
where \(y^t\) is the one-hot vector corresponding to the target word. This is a poiny-wise loss, and we sum the cross-ntropy loss across all examples in a sequence, across all sequences in the dataset in order to evaluate model performance.

The relationship between cross-entropy and ppl

\[PP^t=\frac{1}{P(x_{t+1}^{pred}=x_{t+1}|x_t\cdots,x_1)}=\frac{1}{\sum_{j=1}^V {y_j^t\cdot \hat y_j^t}}\]
which is the inverse probability of the correct word, according to the model distribution \(P\).

suppose \(y_i^t\) is the only nonzero element of \(y^t\). Then, note that:
\[CE(y^t,\hat y^t)=-\log \hat y_i^t=\log\frac{1}{\hat y_i^t}\]
\[PP(y^t,\hat y^t)=\frac{1}{\hat y_i^t}\]
Then, it follows that:
\[CE(y^t,\hat y^t)=\log PP(y^t,\hat y^t)\]

In fact, minizing the arthimic mean of the cross-entropy is identical to minimizing the geometric mean of the perplexity. If the model predictions are completely random, \(E[\hat y_i^t]=\frac{1}{|V|}\), and the expected cross-entropies are \(\log |V|\), (\(\log 10000\approx 9.21\))

Perplexity Vs Cross-entropy的更多相关文章

最大似然估计 (Maximum Likelihood Estimation), 交叉熵 (Cross Entropy) 与深度神经网络
最近在看深度学习的"花书" (也就是Ian Goodfellow那本了),第五章机器学习基础部分的解释很精华,对比PRML少了很多复杂的推理,比较适合闲暇的时候翻开看看.今天准备写 ...
卷积神经网络系列之softmax，softmax loss和cross entropy的讲解
我们知道卷积神经网络(CNN)在图像领域的应用已经非常广泛了,一般一个CNN网络主要包含卷积层,池化层(pooling),全连接层,损失层等.虽然现在已经开源了很多深度学习框架(比如MxNet,Caf ...
关于交叉熵（cross entropy），你了解哪些
二分~多分~Softmax~理预一.简介在二分类问题中,你可以根据神经网络节点的输出,通过一个激活函数如Sigmoid,将其转换为属于某一类的概率,为了给出具体的分类结果,你可以取0.5作为阈值, ...
softmax，softmax loss和cross entropy的区别
版权声明:本文为博主原创文章,未经博主允许不得转载. https://blog.csdn.net/u014380165/article/details/77284921 我们知道卷积神经网络(CNN ...
【转】TensorFlow四种Cross Entropy算法实现和应用
http://www.jianshu.com/p/75f7e60dae95 作者:陈迪豪来源:CSDNhttp://dataunion.org/26447.html 交叉熵介绍交叉熵(Cross ...
softmax，softmax loss和cross entropy的讲解
1 softmax 我们知道卷积神经网络(CNN)在图像领域的应用已经非常广泛了,一般一个CNN网络主要包含卷积层,池化层(pooling),全连接层,损失层等.这一篇主要介绍全连接层和损失层的内容, ...
一篇博客：分类模型的 Loss 为什么使用 cross entropy 而不是 classification error 或 squared error
https://zhuanlan.zhihu.com/p/26268559 分类问题的目标变量是离散的,而回归是连续的数值. 分类问题,都用 onehot + cross entropy traini ...
cross entropy与logistic regression
维基上corss entropy的一部分知乎上也有一个类似问题:https://www.zhihu.com/question/36307214 cross entropy有二分类和多分类的形式,分别 ...
交叉熵cross entropy和相对熵（kl散度）
交叉熵可在神经网络(机器学习)中作为损失函数,p表示真实标记的分布,q则为训练后的模型的预测标记分布,交叉熵损失函数可以衡量真实分布p与当前训练得到的概率分布q有多么大的差异. 相对熵(relativ ...
TensorFlow 实战（一）—— 交叉熵（cross entropy）的定义
对多分类问题(multi-class),通常使用 cross-entropy 作为 loss function.cross entropy 最早是信息论(information theory)中的概念 ...

随机推荐

HDFS --访问
Hdfs的访问方式有两种,第一:类似linux命令,hadoop shell.第二:java API方式. 先看第一种. FS Shell cat chgrp chmod chown copyFrom ...
Ignite 配置更新Oracle JDBC Drive
如果使用Oracle 12C 作为Ignite 的Repository的话,在Repository Createion Wizard的配置过程中,会出现ORA-28040:No matc ...
hibernate 注解唯一键约束 uniqueConstraints
@Table 注解包含一个schema和一个catelog 属性,使用@UniqueConstraints 可以定义表的唯一约束. 如果是联合约束就用下面这种 @Table(name="tb ...
Postgresql扩展及UUID
切换数据库 \connect $DBNAME 查看Postgresql的可用扩展 SELECT * FROM pg_available_extensions; 安装所需扩展 CREATE EXTENS ...
Codeforces Round #378 (Div. 2)
A: 思路: 水题,没啥意思; B: 思路: 暴力,也没啥意思; C: 思路: 思维,可以发现从前往后和为b[i]的分成一块,然后这一块里面如果都相同就没法开始吃,然后再暴力找到那个最大的且能一开始就 ...
Mybatis(综合案例)
MyBatis本是apache的一个开源项目iBatis,2010年这个项目有Apache software foundation 迁移到了Google code,并改名MyBatis.2013年11 ...
POJ1336 The K-League[最大流公平分配问题]
The K-League Time Limit: 1000MS Memory Limit: 10000K Total Submissions: 715 Accepted: 251 Descri ...
洛谷P1288 取数游戏II[博弈论]
题目描述有一个取数的游戏.初始时,给出一个环,环上的每条边上都有一个非负整数.这些整数中至少有一个0.然后,将一枚硬币放在环上的一个节点上.两个玩家就是以这个放硬币的节点为起点开始这个游戏,两人轮流 ...
python 类属性与方法
Python 类属性与方法标签(空格分隔): Python Python的访问限制 Python支持面向对象,其对属性的权限控制通过属性名来实现,如果一个属性有双下划线开头(__),该属性就无法被外 ...
spring mvc+ spring +mybatis
首先,修改web.xml,添加配置文件路由以及格式过滤 <?xml version="1.0" encoding="UTF-8"?> <web ...