python信用评分卡(附代码,博主录制)

统计和数据挖掘中分类问题

Classification Problem in Statistics & Data Mining

I must say I was shocked when Amishi, a girl little over three years old, announced that going forward she is only friends with my wife and not me. Her reason for the breakup was that I am a boy and girls can only be friends with girls. She has learned this social norm from her friends at the preschool. I still remember the way she modeled for me in her swimsuit and umbrella just a few months ago. She was aware of the boy-girl difference even then, it is just she has learned this weird social norm now. The point over here is that toddlers can distinguish genders without much effort. Nature has given us a built-in equation to classify gender through a mere glance with a high degree of precision. Imagine a similar mechanism to distinguish between good and bad borrowers. You are talking about every banker’s dream. However, evolution has trained us to mate not to lend.

我必须说,当三十岁的女孩Amishi宣布前进时,她只是与我的妻子而不是我的朋友,我感到震惊。 分手的原因是我是男孩,女孩只能是女孩的朋友。 她从幼儿园的朋友那里学到了这种社会规范。 几个月前,我还记得她在泳衣和雨伞中为我塑造的方式。 即便如此,她也意识到了男女之间的差异,现在只是她已经学会了这种奇怪的社会规范。 这里的重点是,幼儿可以毫不费力地区分性别。 大自然给了我们一个内置的方程式,通过高度精确的一瞥来对性别进行分类。 想象一下类似的机制来区分好的和坏的借款人。 你在谈论每个银行家的梦想。 然而,进化训练我们交配不放贷。

Predictive Analytics: Classification Problem – by Roopam

As I have mentioned in the previous article, scorecards have their roots in the classification problem in statistics and data mining. The idea with most classification problems is to create a mathematical equation to distinguish dichotomous variables. These variables can only take two values such as

• Male/ Female
• Good / Bad
• Yes / No
• God / Devil
• Happy / Sad
• Sales / No Sales

The list can go on until eternity. The reason why most business problems try to model dichotomies is that it is easy to comprehend for us humans. We must appreciate that dichotomies are  never absolute and have degrees attached to them. For example, I am 80% good and 20% bad – at least I would like to believe this. I shall keep Pareto’s 80-20 principle away from this i.e. my 20% bad is responsible for my 80% of behavior.

正如我在上一篇文章中提到的,记分卡的根源在于统计和数据挖掘中的分类问题。 大多数分类问题的想法是创建一个数学方程来区分二分变量。 这些变量只能采用两个值,例如

•男/女
• 好坏
•是/否
•上帝/魔鬼
•快乐/悲伤
•销售/无销售

这份清单可以持续到永恒。 大多数商业问题试图模拟二分法的原因是它很容易理解我们人类。 我们必须明白,二分法从来都不是绝对的,是有度的。 例如,我80%好,20%坏 - 至少我想相信这一点。 我将保持帕累托的80-20原则远离这一点,即我的20%不好对我80%的行为负责。

Credit Scorecards Development – Problem Statement & Sampling(坏客户定义是灵活的)

In the case of credit scorecards, the problem statement is to distinguish analytically between the good and bad borrowers. Hence, the first task is to define a good and a bad borrower. For most loan products, good and bad credit is defined in the following way

1. Good loan: never or once missed on the EMI payment
2. Bad loan: ever missed 3 consecutive EMIs in a row (i.e. 90 days-past-due)

Additionally, for tagging someone good or bad, you need to observe his or her behavior for a significant length of time. This length of time varies from product to product based on the tenor of the loan. For home loans, with a tenor of 20 years, 2-3 years is a reasonable observation period.
However, there is nothing sacrosanct about the above definition and can be modified at the discretion of the analyst. Roll-rate analysis and vintage analysis are the two analytical tools you may want to consider while constructing the above definition.

信用记分卡开发 - 问题陈述和抽样
在信用记分卡的情况下,问题陈述是在好的和坏的借款人之间进行分析。因此,第一个任务是定义一个好的和坏的借款人。对于大多数贷款产品,信用良好和不良以下列方式定义

1.良好的贷款:永远或曾一次逾期
2.不良贷款:连续3次错过EMI(即90天过期)

此外,为了标记好人或坏人,你需要在很长一段时间内观察他或她的行为。根据贷款期限,这段时间因产品而异。对于房屋贷款,期限为20年,2 - 3年是合理的观察期。
但是,对于上述定义没有什么神圣不可侵犯的,可以由分析师自行决定修改。滚动率分析和复古分析是您在构建上述定义时可能需要考虑的两种分析工具。

Sampling Strategy for Credit Scorecards

A few years ago, I did a daylong workshop on Statistical Inference for a large German shipping & cargo company in Mumbai. At the time of Q&A session the Vice President of operations asked a tricky question, what is a good sample size to achieve good precision? He was looking for a one-size-fits-all answer and I wish it were that simple. The sample size depends on the degree of similarity or homogeneity of the population in question. For example, what do you think is a good sample size to answer the following two questions?

1. What is the salinity of the Pacific Ocean?
2. Is there another planet with intelligent life in the Universe?

In terms of population size, a number of drops in the ocean and planets in the Universe is similar. A couple of drops of water are enough to answer the first question since the salinity of oceans is fairly constant. On the other hand, the second question is a black swan problem. You may need to visit every single planet to rule our possibility of an intelligent form of life.

For credit scorecard development, the accepted rule of thumb for sample size is at least 1000 records of both good and bad loans. There is no reason why you cannot build a scorecard with a smaller sample size (say 500 records). However, the analyst needs to be cautious in doing so because a higher degree of randomness creeps in a small data sample. Additionally, it is also advisable to keep the sample window as short as possible i.e. a financial quarter or two while scorecard development. Further, the sample is divided into two pieces – usually, 70 % for development and remaining for validation sample. We discuss the development and validation sample in detail in the subsequent sections of this series.

信用记分卡的抽样策略
几年前,我为孟买的一家大型德国航运和货运公司举办了为期一天的统计推断研讨会。在问答环节时,运营副总裁提出了一个棘手的问题,即获得良好精度的样本量是多少?他正在寻找一个通用的答案,我希望它很简单。样本量取决于所讨论的群体的相似程度或同质性。例如,您认为回答以下两个问题的样本量是多少?

1.太平洋的盐度是多少?
2.宇宙中还有另一个拥有智慧生命的星球吗?

就人口规模而言,宇宙中海洋和行星的数量下降是相似的。由于海洋的盐度相当稳定,几滴水足以回答第一个问题。另一方面,第二个问题是黑天鹅问题。您可能需要访问每个星球来统治我们生活的智能生活的可能性。

对于信用记分卡开发,样本大小的公认经验法则是至少1000个好的和坏的贷款记录。没有理由不能建立样本量较小的记分卡(比如500条记录)。但是,分析师需要谨慎行事,因为较小程度的随机性会在小数据样本中蔓延。此外,还建议尽可能缩短样本窗口,即在记分卡开发时用一个或两个季度数据。此外,样品分为两部分 - 通常70%用于显影,剩余用于验证样品。我们将在本系列的后续章节中详细讨论开发和验证示例。

Credit Scorecard Development: Sampling Strategy – by Roopam

Sign-off Note

In the next article, we will discuss an important topic of variables classing and coarse classing for credit scorecards. See you soon

博主的Python视频教学中心: https://m.study.163.com/user/1135726305.htm?utm_campaign=share&utm_medium=iphoneShare&utm_source=weixin&utm_u=1015941113

信用评分卡 (part 2of 7)的更多相关文章

  1. 信用评分卡(A卡/B卡/C卡)的模型简介及开发流程|干货

    https://blog.csdn.net/varyall/article/details/81173326 如今在银行.消费金融公司等各种贷款业务机构,普遍使用信用评分,对客户实行打分制,以期对客户 ...

  2. 信用评分卡 (part 7 of 7)

    python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...

  3. 信用评分卡 (part 6 of 7)

    python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...

  4. 信用评分卡 (part 5 of 7)

    python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...

  5. 信用评分卡 (part 4 of 7)

    python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...

  6. 信用评分卡 (part 3of 7)

    python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...

  7. 信用评分卡 (part 1 of 7)

    python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...

  8. 信用评分卡Credit Scorecards (1-7)

      欢迎关注博主主页,学习python视频资源,还有大量免费python经典文章 python风控评分卡建模和风控常识 https://study.163.com/course/introductio ...

  9. python德国信用评分卡建模(附代码AAA推荐)

    欢迎关注博主主页,学习python视频资源,还有大量免费python经典文章 python信用评分卡建模视频系列教程(附代码)  博主录制 https://study.163.com/course/i ...

随机推荐

  1. pycharm中查看源码的快捷键

    将光标移动至要查看的方法处,按住ctrl  点击鼠标左键,即可查看该方法的源码

  2. python 脚本之 获取远程主机的hostname

    import sys, socket try: result = socket.gethostbyaddr("查询的IP") #查询完后获得一个元组 print (result) ...

  3. BZOJ3165[Heoi2013]Segment——李超线段树

    题目描述 要求在平面直角坐标系下维护两个操作: 1.在平面上加入一条线段.记第i条被插入的线段的标号为i. 2.给定一个数k,询问与直线 x = k相交的线段中,交点最靠上的线段的编号. 输入 第一行 ...

  4. Win10下创建Python3.7创建虚拟环境以及安装Flask框架

    鉴于现在看到的很多虚拟环境创建以及flask框架安装方式需要通过dos命令来做,虽然比较常用,但是每次运行都要激活虚拟环境,相对比较麻烦,而现在利用pycharm大可不必如此. 1.安装破解版pych ...

  5. SQLSERVER 维护计划无法删除

    数据对网站运营或者企业运营是至关重要的,所以,我们在使用数据库的时候,为了保证数据的安全可靠性,都会做数据库备份,很显然,这个备份,我们不可能每天都去手动备份,SQLServer 数据库就可以提供数据 ...

  6. 基准对象object中的基础类型----数字 (二)

    object有如下子类: CLASSES object basestring str unicode buffer bytearray classmethod complex dict enumera ...

  7. 大学jsp实验3include指令的使用

    1.include指令的使用 (1)编写一个名为includeCopyRight.jsp的页面,页面的浏览效果如下: 要求“2016”这个值可以实现动态更新.请写出页面代码: <%@ page ...

  8. linux下后台启动springboot项目

    linux下后台启动springboot项目 我们知道启动springboot的项目有三种方式: 运行主方法启动 使用命令 mvn spring-boot:run”在命令行启动该应用 运行“mvn p ...

  9. 一个有关FWT&FMT的东西

    这篇文章在讲什么 相信大家都会FWT和FMT. 如果你不会,推荐你去看一下VFK的2015国家集训队论文. 设全集为\(U=\{1,2,\ldots,n\}\),假设我们关心的\(f_S\)中的集合\ ...

  10. Tomcat控制台总是打印日志问题的解决办法

    问题 使用gradle启动项目,在tomcat控制台中不停地打印perf4j性能日志,导致开发过程很卡很慢.明明修改了logback.xml配置文件,让它输出到log文件中,而不是控制台,但是不起作用 ...