Python: computing the KS statistic in credit rating

# -*- coding: utf-8 -*-
import pandas as pd
# GridSearchCV lives in sklearn.model_selection (sklearn.grid_search was removed in scikit-learn 0.20)
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import shuffle
import numpy as np
from sklearn import metrics
from sklearn.metrics import log_loss, recall_score, precision_score, accuracy_score, f1_score
from sklearn.metrics import roc_curve, precision_recall_curve, roc_auc_score
# from sklearn.model_selection import cross_val_score
import lightgbm

def ks_statistic(Y, Y_hat):
    # KS statistic: the largest gap between the cumulative score
    # distributions of the two classes (Y == 0 and Y == 1).
    df = pd.DataFrame({"Y": Y, "Y_hat": Y_hat})
    # Bucket the predicted scores into ten bins; the first edge is below 0
    # so that a score of exactly 0 still falls into the first bin.
    bins = np.array([-0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
    df["bucket"] = pd.cut(df["Y_hat"], bins=bins)
    # Count the samples of each class per bucket (rows: buckets, columns: classes).
    counts = pd.crosstab(df["bucket"], df["Y"])
    # Cumulative share of each class, each normalized by its own total.
    cdf = counts.cumsum() / counts.sum()
    # KS is the maximum absolute gap between the two cumulative shares.
    return (cdf.iloc[:, 0] - cdf.iloc[:, 1]).abs().max()

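df = pd.read_csv('DC_ALL_20170217.csv', header=0)
# Use every column except the ID and the label as features; mark missing values with -999.
X = df[df.columns.drop(['user_id','overdue'])].fillna(-999)
# X = df[['count','time_stamp','credit_limit','credit_card_use_rate','credit_count_x','bank_count','sex','occupation','education','marriage','hukou']]
y = df['overdue']
# The first 55596 rows are used for training; the remaining 69495 - 55596 rows are kept as the test set.
train = X.head(55596)
test = X.tail(69495-55596)
# convert_objects() was removed from pandas; pd.to_numeric with errors='coerce' does the same conversion.
train_label = pd.to_numeric(y.head(55596), errors='coerce')
X_train, X_test, y_train, y_test = train_test_split(
    train.values, train_label, test_size=0.2, random_state=42)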

# LightGBM hyperparameters
max_depth = 5
subsample = 0.8
learning_rate = 0.01
n_estimators = 400
random_state = 3
nthread = 4
is_unbalance = True
objective = 'binary'
LGBM = lightgbm.LGBMClassifier(max_depth=max_depth, learning_rate=learning_rate,
                               n_estimators=n_estimators, objective=objective,
                               is_unbalance=is_unbalance, nthread=nthread, subsample=subsample)
LGBM.fit(X_train, y_train)
# Hard 0/1 predictions, used for precision, recall, accuracy and F1
y_test_v = LGBM.predict(X_test)
# Predicted probability of the positive class, used for AUC, log loss and KS
y_test_p = LGBM.predict_proba(X_test)[:, 1]

print('auc: ', roc_auc_score(y_test, y_test_p))
print('log_loss: ', log_loss(y_test, y_test_p))
print('precision: ', precision_score(y_test, y_test_v))
print('recall: ', recall_score(y_test, y_test_v))
print('accuracy: ', accuracy_score(y_test, y_test_v))
print('f1_score: ', f1_score(y_test, y_test_v))
# KS compares score distributions, so pass the probabilities rather than the 0/1 predictions.
print('ks_statistic: ', ks_statistic(y_test.values, y_test_p))
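
As a cross-check on the binned calculation, the KS statistic can also be read off the ROC curve, since it equals the largest gap between the true-positive rate and the false-positive rate across thresholds. Below is a minimal sketch using scikit-learn's `roc_curve`. The helper name `ks_from_roc` and the toy data are hypothetical and not part of the original script.

```python
import numpy as np
from sklearn.metrics import roc_curve


def ks_from_roc(y_true, y_score):
    """Return the KS statistic as the largest gap between TPR and FPR."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return float(np.max(tpr - fpr))


# Toy data, made up for illustration: 1 = overdue, 0 = not overdue.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9])

ks = ks_from_roc(y_true, y_score)
print("ks_statistic:", ks)
```

On the script's own data this would be called as `ks_from_roc(y_test, y_test_p)`. Because it treats every distinct score as a threshold instead of using ten fixed bins, its result should be equal to or slightly higher than the binned value.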
