Keyword: Reject Inference
Suppose a bank has provided a dataset with several attributes, including employment, credit history, and property. Each customer in the sample is labeled according to whether they paid off their loan on time: those who did are “good customers”, and those who did not are “bad customers”.
If Rick, an employee of the bank, analyzes this dataset directly, what will happen?
Take one of these attributes, the customer's job, as an example.

1 : unemployed
2 : skilled employee
3 : management / highly qualified employee / officer

Intuitively, which of these three groups should have the best credit? Most people would pick the second or the third. However, the data give us a different answer.

As the data shows, the first group of customers looks “better than” the third group. After looking at the data, Rick might conclude that lending to unemployed people is safer than lending to highly qualified employees, officers, or managers. Is that correct? Let’s think about it a little.
Let’s review the process of collecting data:

  1. A customer applies to Rick’s bank for a personal loan.
  2. If the application is approved, go to step 3. Otherwise, the customer is not recorded as a data point in Rick’s dataset.
  3. If the customer pays off the loan on time, they are labeled a “Good Customer”. Otherwise, they are labeled a “Bad Customer”.

Before any customer is labeled, there is a crucial step: Step 2. That is to say, the customers in Rick’s dataset have already been selected by the bank. Those who applied for a personal loan but were not approved are not in this dataset at all.
Here I would like to ask you a question: which has the greater risk, jumping from the 4th floor or the 70th floor? (Please do not try it, it is just an example.) You may reply immediately: “The 70th floor, of course!”

You are wrong. I am not asking about the probability of death. I am asking about risk. Suppose someone offers you 10 billion if you can jump from the 70th floor without dying: you probably won’t take the bet. However, suppose someone offers you 10 billion if you can jump from the 4th floor without dying: you might want to give it a shot, because you know you may not die. From the 70th floor the outcome is practically certain, so there is no real decision to make. The 4th floor is where the outcome is uncertain, and that is where you risk making the wrong decision.
The customers who make the bank feel like it is jumping from the 70th floor are most likely rejected from the beginning. The bank has a hard time deciding on the applications of customers who make it feel like it is jumping from the 4th floor.


“The 70th floor” customers are likely to be found in the first group. So if the bank approved one of their applications, it must have had some reason to believe that this customer would pay off the loan. If the bank approved every first-group customer’s application, the data might look quite different from what we see now.
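To make this concrete, here is a minimal sketch of how an approval filter alone can flip the ranking of the groups. Everything in it is an assumption made up for illustration, not the bank's data: the hidden "soundness" score (something the bank sees at approval time, such as savings or a guarantor, but that is not recorded in the dataset), the baseline repayment probabilities, the approval thresholds, and the names `GROUPS`, `HIDDEN_WEIGHT`, and `good_rates`. The calculation is deterministic: it averages repayment probabilities over an evenly spaced grid of applicants instead of drawing random samples.

```python
# Hypothetical numbers, chosen only to illustrate the selection effect.
# Each applicant has a hidden "soundness" score s in [0, 1] that the bank
# sees when it decides on the application but that the dataset omits.
GROUPS = {
    # job category: (baseline repayment probability, approval threshold on s)
    "unemployed": (0.20, 0.9),        # bank is very strict with this group
    "skilled employee": (0.30, 0.3),
    "management": (0.35, 0.2),        # bank approves most applicants
}
HIDDEN_WEIGHT = 0.6  # how strongly the hidden score drives repayment
N = 1000             # applicants per group, evenly spread over s


def repay_probability(base, s):
    """Probability that an applicant with hidden score s repays on time."""
    return base + HIDDEN_WEIGHT * s


def good_rates(base, threshold):
    """Return (good rate over all applicants, good rate over approved only)."""
    scores = [(i + 0.5) / N for i in range(N)]
    all_p = [repay_probability(base, s) for s in scores]
    # Step 2 of the data collection: only approved applicants are recorded.
    approved_p = [repay_probability(base, s) for s in scores if s > threshold]
    return sum(all_p) / len(all_p), sum(approved_p) / len(approved_p)


results = {job: good_rates(*params) for job, params in GROUPS.items()}
for job, (all_rate, approved_rate) in results.items():
    print(f"{job:18s} all applicants: {all_rate:.2f}   approved only: {approved_rate:.2f}")
```

Under these made-up numbers, the unemployed group has the lowest good rate among all applicants but the highest good rate among approved customers, which is the only thing Rick's dataset can show. The ranking flips purely because the bank was much stricter with that group.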
Analyzing data before you really understand what it means may leave you deceived by your own data.
Many factors should be taken into consideration in a real evaluation, but I have had to simplify the explanation here. If there are any mistakes, or anything that makes you uncomfortable, please let me know so that I can fix it.
