
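Because a model is developed on samples from a specific period, whether it still applies to populations outside the development sample can only be determined through stability testing. The population stability index (PSI) measures the difference between the score distributions of the test sample and the model development sample, and it is the most common metric for assessing model stability. In practice, PSI bins the scores and then checks whether the population distribution has changed between different samples, or between samples from different time periods; that is, whether the share of the total population falling in each score band has changed significantly. The formula is as follows:

PSI = Σ (Actual% − Expected%) × ln(Actual% / Expected%)

where the sum runs over all score bands, Actual% is the share of the test sample in a band, and Expected% is the share of the development sample in the same band.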

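Practical applications of PSI:

1) Out-of-sample testing

  Check the model's stability across different samples, for example the training set versus the test set. This also shows how well the model was trained; as I understand it, it reveals the model's variance.

2) Out-of-time testing

  The farther the test reference date is from the modeling reference date, the more the risk characteristics of the test sample are likely to differ from those of the modeling sample, so the PSI is usually higher. This also indicates when a model has become too old and should be rebuilt on fresh samples.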
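Both tests above come down to the same calculation on raw scores. The following is a minimal sketch, not the blogger's or the original article's code: the function name `psi_from_scores`, the choice of decile bands fixed on the development sample, the `eps` floor for empty bands, and the synthetic normal scores are all assumptions made for illustration.

```python
import numpy as np


def psi_from_scores(expected_scores, actual_scores, n_bins=10, eps=1e-6):
    """PSI between a development (expected) and a test (actual) score sample."""
    expected_scores = np.asarray(expected_scores, dtype=float)
    actual_scores = np.asarray(actual_scores, dtype=float)

    # Band edges are fixed on the development sample (quantiles, assumed here).
    edges = np.unique(np.quantile(expected_scores, np.linspace(0, 1, n_bins + 1)))
    inner = edges[1:-1]  # the first and last bands are open-ended
    n_bands = len(inner) + 1

    # Assign every score to a band; scores outside the development range
    # fall into the first or last band.
    exp_idx = np.searchsorted(inner, expected_scores, side="right")
    act_idx = np.searchsorted(inner, actual_scores, side="right")
    exp_pct = np.bincount(exp_idx, minlength=n_bands) / len(expected_scores)
    act_pct = np.bincount(act_idx, minlength=n_bands) / len(actual_scores)

    # Floor the shares so an empty band does not produce log(0).
    exp_pct = np.clip(exp_pct, eps, None)
    act_pct = np.clip(act_pct, eps, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))


# Synthetic scores, invented purely to exercise the function.
rng = np.random.default_rng(0)
dev = rng.normal(400, 80, 10_000)  # development sample
oos = rng.normal(400, 80, 10_000)  # out-of-sample: same population
oot = rng.normal(430, 80, 10_000)  # out-of-time: population has drifted

print("out-of-sample PSI:", psi_from_scores(dev, oos))
print("out-of-time PSI:  ", psi_from_scores(dev, oot))
```

Fixing the band edges on the development sample keeps the comparison anchored to the population the model was built on, so the drifted sample should show the larger PSI.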

http://ucanalytics.com/blogs/population-stability-index-psi-banking-case-study/

This is a continuation of the banking case study on creating application risk scorecards that we have discussed in previous articles. You can find the earlier parts of the series at the following links: (Part 1), (Part 2), (Part 3) and (Part 4).

In this article, we will discuss the Population Stability Index (PSI), an important metric for identifying a shift in population for retail credit scorecards. Before we delve deeper into the calculation of the PSI and its utility, let’s try to understand the overall purpose of the PSI and similar indexes by connecting a few dots between ...

Dictators and Credit Crisis

What is similar between Napoleon’s and then Hitler’s attempts to invade Russia and the financial crisis of 2007-08?

Napoleon tried to invade Russia in 1812, and Hitler repeated Napoleon’s misdeeds in 1941; both invasions ended in severe defeats for the dictators’ armies. The armies of both Napoleon and Hitler were far superior to the Russians. It was the conditions in which the battles were fought that resulted in these defeats. Russian winters are often held responsible for the fate of these armies. In reality, it was the ill-preparedness and bad judgment of both Napoleon’s and Hitler’s men that caused their humiliating defeats. They were very well-trained men, but they had been trained in the benevolent conditions of France and Germany. This time, the battle was fought in completely different and extreme conditions, and they could not cope.

The failure of credit risk models during the financial crisis of 2007-08 can be related to the fate of both the French and German armies. The models were built and trained in a benevolent economic environment and were ill-prepared to deal with the extreme economic conditions of the time. Additionally, a series of bad judgments by executives at the financial firms resulted in total economic collapse.

The moral of the above stories is that one has to keep a close watch on how the currently prevalent environment differs from the training environment. The Basel III accord pays significant attention to monitoring the portfolio on a regular basis, and for good reason. The population stability index (PSI) is one index that helps risk managers perform this task for retail credit scorecards.

Population Stability Index (PSI) – Our Banking Case Continues

You are the chief risk officer at CyndiCat bank. It’s been a couple of years since your team, under your supervision, built the auto-loans credit scorecard. Since then, the overall risk assessment process for the bank has improved significantly. Still, being a prudent risk manager, you have asked your team to regularly compare the population for which the scorecard was built with the existing through-the-door population (applicants for auto loans). A good place to start this comparison is by checking how the two populations are distributed across the risk bands created through the scorecard. The following is a representation of the latest quarterly comparison your team has performed against the benchmark sample. Here, ‘Actual %’ is the population distribution for the latest quarter and ‘Expected %’ is the population distribution for the validation sample (a.k.a. benchmark sample).

Comparing two populations visually is a good place to start. The current population seems to have shifted towards the right side of the graph. To a small extent, this is expected, since scorecards often influence the through-the-door population as the market starts reacting to the approval strategies of the bank. However, the question we need to ask is whether this is a major shift in the population. Essentially, you are comparing two different distributions and could use any goodness-of-fit measure, such as the Chi-square test. However, the population stability index is an industry-accepted metric that comes with some convenient rules of thumb for the same purpose. The population stability index (PSI) formula is displayed below (refer to ‘Credit Risk Scorecards’ by Naeem Siddiqi):

PSI = Σ (Actual % − Expected %) × ln(Actual % / Expected %), summed over all score bands
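One way to see why the PSI behaves like a distance between two distributions, sketched here as an aside rather than taken from the article: write A_i and E_i (notation assumed for this note) for the actual and expected shares in score band i, and assume both sets of shares sum to 1. The PSI then splits into two Kullback-Leibler divergences:

```latex
\begin{aligned}
\mathrm{PSI} &= \sum_i (A_i - E_i)\,\ln\frac{A_i}{E_i} \\
             &= \sum_i A_i \ln\frac{A_i}{E_i} + \sum_i E_i \ln\frac{E_i}{A_i} \\
             &= D_{\mathrm{KL}}(A \,\|\, E) + D_{\mathrm{KL}}(E \,\|\, A)
\end{aligned}
```

Every term is non-negative, because A_i − E_i and ln(A_i / E_i) always have the same sign. The PSI is therefore zero only when the two distributions coincide, and swapping the roles of the two samples leaves it unchanged.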

Again, like the weight of evidence and the information value, the PSI seems to have its roots in information theory. Let’s calculate the population stability index (PSI) for our population (we have already seen a histogram for this above).

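| Score band | Actual % | Expected % | Actual − Expected | ln(Actual / Expected) | Index |
|---|---|---|---|---|---|
| < 251 | 5% | 8% | -3% | -0.47 | 0.014 |
| 251–290 | 6% | 9% | -3% | -0.41 | 0.012 |
| 291–320 | 6% | 10% | -4% | -0.51 | 0.020 |
| 321–350 | 8% | 13% | -5% | -0.49 | 0.024 |
| 351–380 | 10% | 12% | -2% | -0.18 | 0.004 |
| 381–410 | 12% | 11% | 1% | 0.09 | 0.001 |
| 411–440 | 14% | 10% | 4% | 0.34 | 0.013 |
| 441–470 | 14% | 9% | 5% | 0.44 | 0.022 |
| 471–520 | 13% | 9% | 4% | 0.37 | 0.015 |
| > 520 | 9% | 8% | 1% | 0.12 | 0.001 |
| Population Stability Index (PSI) | | | | | 0.1269 |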

The last column in the above table is what we care about. Let us consider the score band 251–290 and calculate the index value for this row: Index = (6% − 9%) × ln(6% / 9%) = (−0.03) × (−0.41) ≈ 0.012.
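The whole table can be rebuilt from its two percentage columns. The sketch below uses only the figures printed in the table; the helper name `psi_contributions` is made up for illustration. Note that the percentages are rounded, so they sum to 97% and 99% rather than exactly 100%.

```python
import math

bands = ["< 251", "251-290", "291-320", "321-350", "351-380",
         "381-410", "411-440", "441-470", "471-520", "> 520"]
actual = [0.05, 0.06, 0.06, 0.08, 0.10, 0.12, 0.14, 0.14, 0.13, 0.09]
expected = [0.08, 0.09, 0.10, 0.13, 0.12, 0.11, 0.10, 0.09, 0.09, 0.08]


def psi_contributions(actual, expected):
    """Per-band index values: (actual - expected) * ln(actual / expected)."""
    return [(a - e) * math.log(a / e) for a, e in zip(actual, expected)]


index = psi_contributions(actual, expected)

# One line per score band: actual, expected, difference, log ratio, index.
for band, a, e, idx in zip(bands, actual, expected, index):
    print(f"{band:>9}  {a:4.0%}  {e:4.0%}  {a - e:+4.0%}  "
          f"{math.log(a / e):+.2f}  {idx:.3f}")

psi = sum(index)
print(f"PSI = {psi:.4f}")  # prints PSI = 0.1269, matching the table
```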

The final value of the PSI, 0.13, is the sum of all the values in the last column. Now the question is how to interpret this value. The rule of thumb for the PSI is displayed below.

| PSI value | Inference | Action |
|---|---|---|
| Less than 0.1 | Insignificant change | No action required |
| 0.1 – 0.25 | Some minor change | Check other scorecard monitoring metrics |
| Greater than 0.25 | Major shift in population | Need to delve deeper |
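In a monitoring report, the rule of thumb above can be applied automatically. This is a small sketch with a hypothetical function name, `interpret_psi`; how the exact boundary values 0.1 and 0.25 are assigned is an assumption, since the table does not say.

```python
def interpret_psi(psi):
    """Map a PSI value to the (inference, action) pair from the rule of thumb."""
    if psi < 0.1:
        return ("Insignificant change", "No action required")
    if psi <= 0.25:  # assumption: 0.1 and 0.25 both fall in the middle bucket
        return ("Some minor change", "Check other scorecard monitoring metrics")
    return ("Major shift in population", "Need to delve deeper")


inference, action = interpret_psi(0.1269)
print(f"{inference}: {action}")
```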

The value of 0.13 falls in the second bucket, which indicates a minor shift in population from the validation or benchmark sample. These are handy rules to have. However, one must ask how this population shift is going to make any difference to the scorecard. Actually, it may or may not make any difference. Each score band of a scorecard has an associated bad rate, or probability of customers not paying off their loans. For instance, score band 251-290 in our scorecard has a bad rate of 10%, i.e. one customer out of every 10 in this score band won’t service his/her loan. The population stability index simply indicates changes in the population of loan applicants. This may or may not result in a deterioration of the scorecard’s ability to predict risk. Nevertheless, the PSI indicates changes in the environment that need to be investigated further by analyzing the change in macroeconomic conditions and the overall lending policies of the bank.

Sign-off Note

The population stability index is one of the metrics for keeping a check on changing conditions. The broader idea is clear: one has to capture robust metrics to keep a close watch on the ever-changing economic winds and prevent a crash landing. On the other hand, Russian winters did change the history of the planet for the better; I guess change is not always for the worse.

This was a bit of a detour from our previous article on books to learn probability and Bayesian statistics. Hopefully, you have had a chance to check out some of the books mentioned in the earlier article. See you soon with the second part of that article.
