Logistic Regression: A Bank Loan Application Approval Example

Problem Definition

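This is a loan-approval problem. Suppose you are a loan officer at a bank. Customers who want a loan of a certain amount have filled in their personal information (given in datas.txt), and you need to build a classification model from this information that decides whether each of them should be granted the loan.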

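Based on the given information, build a classification model and evaluate it, briefly describe how the model was built, and briefly explain each feature.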

Dataset

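Fields: user ID, age (Age), gender (Gender), application amount (AppAmount), occupation type (Occupation), education level (Education), marital status (Marital), housing type (Property), household registration (hukou) type (Residence), loan purpose (LoanUse), company type (Company), salary (Salary), and the loan label (Label): 0 = loan rejected, 1 = loan approved.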

Data preprocessing

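The user ID carries no information for modeling. Among the remaining features that describe a customer, age, application amount and salary are continuous; the rest are discrete (categorical).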
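Because the discrete features are category codes rather than quantities, one common option (not used in the original model code) is to one-hot encode them before fitting, so that the model does not treat category codes as ordered numbers. The minimal sketch below uses pandas.get_dummies on a tiny hypothetical frame; the column names follow the ones used in the model code, but the values are invented purely for illustration.

```python
import pandas as pd

# Tiny hypothetical sample: values are invented for illustration only
df = pd.DataFrame({
    "Age": [25, 40, 33],
    "Salary": [3000.0, 8000.0, 5000.0],
    "Education": ["high_school", "bachelor", "bachelor"],
    "Marital": ["single", "married", "single"],
})

# Discrete columns to expand; Age and Salary stay continuous
discrete = ["Education", "Marital"]

# One 0/1 indicator column per category value; other columns are kept as-is
encoded = pd.get_dummies(df, columns=discrete, dtype=int)
print(list(encoded.columns))
```

If the discrete columns already hold integer codes, get_dummies still expands them as long as they are listed in columns= explicitly. Encoding the whole table before the train/test split gives both parts the same set of columns.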

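Inspecting the data shows that some records have missing values. These must be handled, either by deleting the affected records or by imputing the missing values.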
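Both options can be sketched with pandas. This is a minimal example on a hypothetical frame: dropping incomplete rows with dropna() is what the model code does, while the imputation variant (median for continuous columns, most frequent value for discrete ones) is an assumed alternative, not the author's method.

```python
import numpy as np
import pandas as pd

# Hypothetical records with gaps; values are illustrative only
df = pd.DataFrame({
    "Age": [25, np.nan, 33, 51],
    "Salary": [3000.0, 8000.0, np.nan, 6000.0],
    "Marital": ["single", "married", None, "married"],
})

# Count missing values per column
print(df.isnull().sum())

# Option 1: drop every row that contains at least one missing value
dropped = df.dropna()

# Option 2: fill continuous columns with the median, the discrete one with the mode
imputed = df.copy()
for col in ["Age", "Salary"]:
    imputed[col] = imputed[col].fillna(imputed[col].median())
imputed["Marital"] = imputed["Marital"].fillna(imputed["Marital"].mode()[0])

print(len(dropped), len(imputed))
```

Dropping is simpler but discards whole records; imputation keeps every customer at the cost of inventing plausible values.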

The Logit Function
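For reference, the standard definitions: the logit maps a probability p in (0, 1) to the whole real line as the log-odds, and its inverse is the logistic (sigmoid) function, which maps any real score z back to a probability.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}, \qquad 0 < p < 1
\sigma(z) = \operatorname{logit}^{-1}(z) = \frac{1}{1 + e^{-z}}, \qquad z \in \mathbb{R}
```

Since sigma(0) = 0.5, a probability threshold of 0.5 corresponds to a score of z = 0.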

The Logistic Regression Model
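Logistic regression sets the log-odds of approval to a linear score of the features, logit(p) = w·x + b, so P(y = 1 | x) = sigmoid(w·x + b). In scikit-learn, w is coef_ and b is intercept_. Because each weight w_j is the change in log-odds per unit increase of feature j, exp(w_j) is an odds ratio, which is one way to explain what each feature contributes. The sketch below checks this on synthetic data generated with make_classification (an assumption standing in for the real loan table).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data standing in for the loan table (not the real datas file)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

model = LogisticRegression().fit(X, y)
w, b = model.coef_[0], model.intercept_[0]

# P(y = 1 | x) = sigmoid(w . x + b), computed by hand
z = X @ w + b
manual_probs = 1.0 / (1.0 + np.exp(-z))

# scikit-learn's predict_proba gives the same numbers (column 1 = class 1)
sk_probs = model.predict_proba(X)[:, 1]
print(np.allclose(manual_probs, sk_probs))

# exp(w_j): multiplicative change in the odds of y = 1 per unit increase of feature j
odds_ratios = np.exp(w)
print(odds_ratios)
```

Odds ratios are only comparable across features measured on similar scales, so continuous features such as salary are often standardized first.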

Model Data

 # Logistic regression model
 # Classify whether a bank customer should be granted a loan
 import pandas
 import numpy
 import matplotlib.pyplot as plt
 from sklearn.linear_model import LogisticRegression
 from sklearn.metrics import roc_curve, roc_auc_score

 data = pandas.read_csv("datas.csv")
 # Drop every record that has a missing value
 data = data.dropna()

 # Feature columns (the user ID is left out); 'Label' is the target
 features = ['Age', 'Gender', 'AppAmount', 'Occupation',
             'Education', 'Marital', 'Property', 'Residence',
             'LoanUse', 'Company', 'Salary']

 # Randomly shuffle the rows before splitting into training and test sets
 shuffled = data.loc[numpy.random.permutation(data.index)]

 # The first 14968 shuffled rows form the training set, the rest the test set
 num_train = 14968
 data_train = shuffled[:num_train]
 data_test = shuffled[num_train:]

 # Fit logistic regression on the training set
 # (max_iter raised because unscaled features such as Salary can slow convergence)
 logistic_model = LogisticRegression(max_iter=1000)
 logistic_model.fit(data_train[features], data_train['Label'])

 # Print the model coefficients (one weight per feature)
 print(logistic_model.coef_)

 # .predict() uses a probability threshold of 0.5 by default
 predicted = logistic_model.predict(data_train[features])
 # The mean of the boolean match array is the accuracy
 accuracy_train = (predicted == data_train['Label']).mean()
 print("Accuracy in Training Set = {s}".format(s=accuracy_train))

 # Accuracy on the held-out test set
 predicted = logistic_model.predict(data_test[features])
 accuracy_test = (predicted == data_test['Label']).mean()
 print("Accuracy in Test Set = {s}".format(s=accuracy_test))

 # Predicted probability of label 1 (loan approved) for every record
 train_probs = logistic_model.predict_proba(data_train[features])[:, 1]
 test_probs = logistic_model.predict_proba(data_test[features])[:, 1]

 # AUC on the training and test sets; a large gap suggests overfitting
 auc_train = roc_auc_score(data_train['Label'], train_probs)
 auc_test = roc_auc_score(data_test['Label'], test_probs)
 auc_diff = auc_train - auc_test
 print("AUC train = {0:.4f}, test = {1:.4f}, difference = {2:.4f}".format(
     auc_train, auc_test, auc_diff))

 # ROC curves: roc_curve returns (false positive rate, true positive rate, thresholds)
 roc_train = roc_curve(data_train['Label'], train_probs)
 roc_test = roc_curve(data_test['Label'], test_probs)

 # Plot true positive rate against false positive rate
 plt.plot(roc_train[0], roc_train[1], label='train')
 plt.plot(roc_test[0], roc_test[1], label='test')
 plt.plot([0, 1], [0, 1], linestyle='--', color='grey')  # random-guess baseline
 plt.xlabel('False positive rate')
 plt.ylabel('True positive rate')
 plt.legend()
 plt.show()
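The script above feeds the raw feature values straight into the model. One possible refinement, sketched here as an assumption rather than the author's method, wraps standardization of the continuous columns and one-hot encoding of the discrete ones in a scikit-learn Pipeline, and uses train_test_split with stratify so both sets keep the same approve/reject ratio. The data frame is randomly generated and uses only a subset of the columns; the synthetic label rule is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Randomly generated stand-in for the loan table (hypothetical values)
rng = np.random.RandomState(0)
n = 500
df = pd.DataFrame({
    "Age": rng.randint(20, 60, n),
    "AppAmount": rng.uniform(1e3, 5e4, n),
    "Salary": rng.uniform(2e3, 2e4, n),
    "Education": rng.choice(["a", "b", "c"], n),
    "Marital": rng.choice(["single", "married"], n),
})
# Hypothetical label rule: approval is likelier when salary is high relative to the amount
df["Label"] = (3 * df["Salary"] > df["AppAmount"] * rng.uniform(0.5, 1.5, n)).astype(int)

continuous = ["Age", "AppAmount", "Salary"]
discrete = ["Education", "Marital"]

# Scale continuous columns, one-hot encode discrete ones, then fit the classifier
preprocess = ColumnTransformer([
    ("num", StandardScaler(), continuous),
    ("cat", OneHotEncoder(handle_unknown="ignore"), discrete),
])
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])

# Stratified 75/25 split keeps the label ratio the same in both parts
X_train, X_test, y_train, y_test = train_test_split(
    df[continuous + discrete], df["Label"],
    test_size=0.25, stratify=df["Label"], random_state=0)

# The scaler and encoder are fit on the training part only, so no test data leaks in
model.fit(X_train, y_train)
auc_test = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print("Test AUC = {0:.3f}".format(auc_test))
```

Keeping preprocessing inside the Pipeline means the same transformations are applied automatically at prediction time for new applicants.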
