Your Prediction Is Only As Good As Your Data

May 5, 2015 by Kazem

In the past, we have seen software engineers and data scientists assume that they can keep increasing their prediction accuracy by improving their machine learning algorithm. Here, we approach the classification problem from a different angle and recommend that data scientists first analyze the distribution of their data to measure how much information it contains. This approach gives us an upper bound on how far one can improve the accuracy of a predictive algorithm and makes sure our optimization efforts are not wasted!

Entropy and Information

In information theory, mathematicians have developed several useful tools, such as entropy, to measure the amount of information in data or in a random process. Let's think of a biased coin with a head probability of 1%.

If one flips such a coin, we get more information when we see the head event, since it is rare compared to the tail event, which is far more likely to happen. We can formulate the amount of information in an event as the negative logarithm of the event's probability, which captures this intuition. Mathematicians also formulated another measure, called entropy, which captures the average information in a random process, measured in bits. Below we show the entropy formula for a discrete random variable:

$$H(X) = -\sum_{i} P(x_i) \log_2 P(x_i)$$

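As a quick illustration, here is a minimal Python sketch of these two quantities (the function names `self_information` and `entropy` and the list-of-probabilities interface are our own choices for this post, not a standard API):

```python
import math

def self_information(p):
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum_i p_i * log2(p_i).
    Zero-probability outcomes are skipped (the 0 * log(0) = 0 convention)."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

print(self_information(0.01))  # rare head event of the 1% coin: ~6.64 bits
print(self_information(0.99))  # common tail event: ~0.014 bits
```
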
For the first example, let's assume we have a coin with P(H)=0% and P(T)=100%. We can compute the entropy of the coin as follows:

$$H = -0 \cdot \log_2 0 - 1 \cdot \log_2 1 = 0 \text{ bits}$$

(using the convention that $0 \cdot \log_2 0 = 0$)

For the second example, let's consider a coin where P(H)=1% and P(T)=1-P(H)=99%. Plugging in the numbers, one finds that the entropy of such a coin is:

$$H = -0.01 \cdot \log_2 0.01 - 0.99 \cdot \log_2 0.99 \approx 0.08 \text{ bits}$$

Finally, if the coin has P(H) = P(T) = 0.5 (i.e. a fair coin), its entropy is calculated as follows:

$$H = -0.5 \cdot \log_2 0.5 - 0.5 \cdot \log_2 0.5 = 1 \text{ bit}$$

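Continuing the toy sketch from above (it assumes the `entropy` helper defined earlier), the three worked examples can be reproduced directly:

```python
print(entropy([0.0, 1.0]))    # deterministic coin -> 0.0 bits
print(entropy([0.01, 0.99]))  # heavily biased coin -> ~0.08 bits
print(entropy([0.5, 0.5]))    # fair coin -> 1.0 bit
```
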
Entropy and Predictability

So, what do these examples tell us? If we have a coin with a head probability of zero, the coin's entropy is zero, meaning that the average information in the coin is zero. This makes sense because flipping such a coin always comes up tails, so the prediction accuracy is 100%. In other words, when the entropy is zero, we have maximum predictability.

In the second example, the head probability is not zero but still very close to zero, which again makes the coin very predictable and its entropy low.

Finally, in the last example we have a 50/50 chance of seeing the head/tail events, which maximizes the entropy and consequently minimizes the predictability. Indeed, one can show that a fair coin has the maximum entropy of 1 bit, making the prediction no better than a random guess.
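To make the inverse relationship concrete, a small sketch (again assuming the `entropy` helper from above) can sweep the head probability and compare the entropy against the accuracy of the best constant guess, i.e. always predicting the more likely side:

```python
for p_head in [0.0, 0.01, 0.1, 0.3, 0.5]:
    h = entropy([p_head, 1.0 - p_head])
    best_accuracy = max(p_head, 1.0 - p_head)  # always guess the more likely outcome
    print(f"P(H)={p_head:.2f}  entropy={h:.3f} bits  best-guess accuracy={best_accuracy:.2f}")
```

As the entropy climbs from 0 toward 1 bit, the best achievable accuracy of this simple guess falls from 100% toward 50%.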

Kullback–Leibler divergence

As a last example, it is worth showing how we can borrow ideas from information theory to measure the distance between two probability distributions. Let's assume we are modeling two random processes by their pmfs, P(.) and Q(.). One can use an entropy-like measure to compute the distance between the two pmfs as follows:

$$D_{KL}(P \| Q) = \sum_{i} P(x_i) \log_2 \frac{P(x_i)}{Q(x_i)}$$

The distance function above is known as the KL divergence, which measures how far Q's pmf is from P's pmf (note that it is not a true metric, since it is not symmetric). The KL divergence can come in handy in various problems, such as NLP problems where we'd like to measure the distance between two sets of data (e.g. bag-of-words representations).
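
As a toy illustration of this formula (our own sketch, not code from the post), one could compute the KL divergence between two normalized bag-of-words distributions over a shared vocabulary:

```python
import math

def kl_divergence(p, q):
    """D(P || Q) in bits for two discrete distributions over the same outcomes.
    Assumes q_i > 0 wherever p_i > 0 (in practice, smoothing is often applied)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy bag-of-words example: relative word frequencies over a shared 3-word vocabulary.
p = [0.5, 0.3, 0.2]   # word distribution for document A
q = [0.4, 0.4, 0.2]   # word distribution for document B
print(kl_divergence(p, q))  # ~0.036 bits; it is 0 only when the two distributions match
```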

Wrap-up

In this post, we showed that entropy, from information theory, provides a way to measure how much information exists in our data. We also highlighted the inverse relationship between entropy and predictability. This means we can use the entropy measure to calculate an upper bound on the accuracy of the prediction problem at hand.

Feel free to share any comments or questions in the comment section below.

You can also reach us at info@AIOptify.com
