Loading data
  2. > w<-read.table("test.prn",header = T)
  3. > w
  4. X.. X...1
  5. 1 A 2
  6. 2 B 3
  7. 3 C 5
  8. 4 D 5
  9. > library(readxl)
  10. > dat<-read_excel("test.xlsx")
  11. > dat
  12. # A tibble: 4 x 2
  13. `商品` `价格`
  14. <chr> <dbl>
  15. 1 A 2
  16. 2 B 3
  17. 3 C 5
  18. 4 D 5
  19. > bank=read.table("bank-full.csv",header = TRUE,sep=",")
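The str() output that follows shows text columns as Factor. That is the default only in R versions before 4.0; from R 4.0 on, read.table() keeps them as character unless stringsAsFactors=TRUE is passed. A minimal sketch with a small temporary file (the file and its three rows are made up for illustration, they are not bank-full.csv):

```r
# Write a tiny CSV to a temporary file (illustrative data only)
path <- tempfile(fileext = ".csv")
writeLines(c("age,job,y",
             "56,housemaid,no",
             "57,services,no",
             "37,services,yes"), path)

# Ask for factors explicitly so the result does not depend on the R version
demo <- read.table(path, header = TRUE, sep = ",", stringsAsFactors = TRUE)
str(demo)
```

Passing the argument explicitly keeps later steps such as levels() and relevel() working the same way on any R version.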
Inspect the data structure
  21. > str(bank)
  22. 'data.frame': 41188 obs. of 21 variables:
  23. $ age : int 56 57 37 40 56 45 59 41 24 25 ...
  24. $ job : Factor w/ 12 levels "admin.","blue-collar",..: 4 8 8 1 8 8 1 2 10 8 ...
  25. $ marital : Factor w/ 4 levels "divorced","married",..: 2 2 2 2 2 2 2 2 3 3 ...
  26. $ education : Factor w/ 8 levels "basic.4y","basic.6y",..: 1 4 4 2 4 3 6 8 6 4 ...
  27. $ default : Factor w/ 3 levels "no","unknown",..: 1 2 1 1 1 2 1 2 1 1 ...
  28. $ housing : Factor w/ 3 levels "no","unknown",..: 1 1 3 1 1 1 1 1 3 3 ...
  29. $ loan : Factor w/ 3 levels "no","unknown",..: 1 1 1 1 3 1 1 1 1 1 ...
  30. $ contact : Factor w/ 2 levels "cellular","telephone": 2 2 2 2 2 2 2 2 2 2 ...
  31. $ month : Factor w/ 10 levels "apr","aug","dec",..: 7 7 7 7 7 7 7 7 7 7 ...
  32. $ day_of_week : Factor w/ 5 levels "fri","mon","thu",..: 2 2 2 2 2 2 2 2 2 2 ...
  33. $ duration : int 261 149 226 151 307 198 139 217 380 50 ...
  34. $ campaign : int 1 1 1 1 1 1 1 1 1 1 ...
  35. $ pdays : int 999 999 999 999 999 999 999 999 999 999 ...
  36. $ previous : int 0 0 0 0 0 0 0 0 0 0 ...
  37. $ poutcome : Factor w/ 3 levels "failure","nonexistent",..: 2 2 2 2 2 2 2 2 2 2 ...
  38. $ emp.var.rate : num 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
  39. $ cons.price.idx: num 94 94 94 94 94 ...
  40. $ cons.conf.idx : num -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 ...
  41. $ euribor3m : num 4.86 4.86 4.86 4.86 4.86 ...
  42. $ nr.employed : num 5191 5191 5191 5191 5191 ...
  43. $ y : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
View the minimum, maximum, median, mean, and quartiles of each variable
  45. > summary(bank)
  46. age job marital
  47. Min. :17.00 admin. :10422 divorced: 4612
  48. 1st Qu.:32.00 blue-collar: 9254 married :24928
  49. Median :38.00 technician : 6743 single :11568
  50. Mean :40.02 services : 3969 unknown : 80
  51. 3rd Qu.:47.00 management : 2924
  52. Max. :98.00 retired : 1720
  53. (Other) : 6156
  54. education default housing
  55. university.degree :12168 no :32588 no :18622
  56. high.school : 9515 unknown: 8597 unknown: 990
  57. basic.9y : 6045 yes : 3 yes :21576
  58. professional.course: 5243
  59. basic.4y : 4176
  60. basic.6y : 2292
  61. (Other) : 1749
  62. loan contact month day_of_week
  63. no :33950 cellular :26144 may :13769 fri:7827
  64. unknown: 990 telephone:15044 jul : 7174 mon:8514
  65. yes : 6248 aug : 6178 thu:8623
  66. jun : 5318 tue:8090
  67. nov : 4101 wed:8134
  68. apr : 2632
  69. (Other): 2016
  70. duration campaign pdays
  71. Min. : 0.0 Min. : 1.000 Min. : 0.0
  72. 1st Qu.: 102.0 1st Qu.: 1.000 1st Qu.:999.0
  73. Median : 180.0 Median : 2.000 Median :999.0
  74. Mean : 258.3 Mean : 2.568 Mean :962.5
  75. 3rd Qu.: 319.0 3rd Qu.: 3.000 3rd Qu.:999.0
  76. Max. :4918.0 Max. :56.000 Max. :999.0
  77.  
  78. previous poutcome emp.var.rate
  79. Min. :0.000 failure : 4252 Min. :-3.40000
  80. 1st Qu.:0.000 nonexistent:35563 1st Qu.:-1.80000
  81. Median :0.000 success : 1373 Median : 1.10000
  82. Mean :0.173 Mean : 0.08189
  83. 3rd Qu.:0.000 3rd Qu.: 1.40000
  84. Max. :7.000 Max. : 1.40000
  85.  
  86. cons.price.idx cons.conf.idx euribor3m
  87. Min. :92.20 Min. :-50.8 Min. :0.634
  88. 1st Qu.:93.08 1st Qu.:-42.7 1st Qu.:1.344
  89. Median :93.75 Median :-41.8 Median :4.857
  90. Mean :93.58 Mean :-40.5 Mean :3.621
  91. 3rd Qu.:93.99 3rd Qu.:-36.4 3rd Qu.:4.961
  92. Max. :94.77 Max. :-26.9 Max. :5.045
  93.  
  94. nr.employed y
  95. Min. :4964 no :36548
  96. 1st Qu.:5099 yes: 4640
  97. Median :5191
  98. Mean :5167
  99. 3rd Qu.:5228
  100. Max. :5228
  101.  
  102. > psych::describe(bank)
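Column meanings: vars = index of the variable (not the variance), n = number of observations, mean, sd = standard deviation, median, trimmed = mean after dropping the most extreme values, mad = median absolute deviation, min, max, range, skew = skewness, kurtosis, se = standard error of the mean.
Variables marked with * are factors that describe() converted to integer codes, so their statistics have no direct meaning.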
  108.  
  109. vars n mean sd median trimmed mad min max range skew kurtosis
  110. age 1 41188 40.02 10.42 38.00 39.30 10.38 17.00 98.00 81.00 0.78 0.79
  111. job* 2 41188 4.72 3.59 3.00 4.48 2.97 1.00 12.00 11.00 0.45 -1.39
  112. marital* 3 41188 2.17 0.61 2.00 2.21 0.00 1.00 4.00 3.00 -0.06 -0.34
  113. education* 4 41188 4.75 2.14 4.00 4.88 2.97 1.00 8.00 7.00 -0.24 -1.21
  114. default* 5 41188 1.21 0.41 1.00 1.14 0.00 1.00 3.00 2.00 1.44 0.07
  115. housing* 6 41188 2.07 0.99 3.00 2.09 0.00 1.00 3.00 2.00 -0.14 -1.95
  116. loan* 7 41188 1.33 0.72 1.00 1.16 0.00 1.00 3.00 2.00 1.82 1.38
  117. contact* 8 41188 1.37 0.48 1.00 1.33 0.00 1.00 2.00 1.00 0.56 -1.69
  118. month* 9 41188 5.23 2.32 5.00 5.31 2.97 1.00 10.00 9.00 -0.31 -1.03
  119. day_of_week* 10 41188 3.00 1.40 3.00 3.01 1.48 1.00 5.00 4.00 0.01 -1.27
  120. duration 11 41188 258.29 259.28 180.00 210.61 139.36 0.00 4918.00 4918.00 3.26 20.24
  121. campaign 12 41188 2.57 2.77 2.00 1.99 1.48 1.00 56.00 55.00 4.76 36.97
  122. pdays 13 41188 962.48 186.91 999.00 999.00 0.00 0.00 999.00 999.00 -4.92 22.23
  123. previous 14 41188 0.17 0.49 0.00 0.05 0.00 0.00 7.00 7.00 3.83 20.11
  124. poutcome* 15 41188 1.93 0.36 2.00 2.00 0.00 1.00 3.00 2.00 -0.88 3.98
  125. emp.var.rate 16 41188 0.08 1.57 1.10 0.27 0.44 -3.40 1.40 4.80 -0.72 -1.06
  126. cons.price.idx 17 41188 93.58 0.58 93.75 93.58 0.56 92.20 94.77 2.57 -0.23 -0.83
  127. cons.conf.idx 18 41188 -40.50 4.63 -41.80 -40.60 6.52 -50.80 -26.90 23.90 0.30 -0.36
  128. euribor3m 19 41188 3.62 1.73 4.86 3.81 0.16 0.63 5.04 4.41 -0.71 -1.41
  129. nr.employed 20 41188 5167.04 72.25 5191.00 5178.43 55.00 4963.60 5228.10 264.50 -1.04 0.00
  130. y* 21 41188 1.11 0.32 1.00 1.02 0.00 1.00 2.00 1.00 2.45 4.00
  131.  
  132. se
  133. age 0.05
  134. job* 0.02
  135. marital* 0.00
  136. education* 0.01
  137. default* 0.00
  138. housing* 0.00
  139. loan* 0.00
  140. contact* 0.00
  141. month* 0.01
  142. day_of_week* 0.01
  143. duration 1.28
  144. campaign 0.01
  145. pdays 0.92
  146. previous 0.00
  147. poutcome* 0.00
  148. emp.var.rate 0.01
  149. cons.price.idx 0.00
  150. cons.conf.idx 0.02
  151. euribor3m 0.01
  152. nr.employed 0.36
  153. y* 0.00
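The two less familiar columns, trimmed and mad, can be reproduced in base R. This is a small sketch on a made-up vector; the trim fraction of 0.2 is chosen only to make the arithmetic easy to follow (describe() uses its own default, 0.1 as far as I know).

```r
x <- c(1, 2, 3, 4, 100)

# Trimmed mean: drop 20% of the values from each end before averaging
trimmed <- mean(x, trim = 0.2)

# Median absolute deviation, scaled by 1.4826 (the default constant of mad())
mad_manual <- 1.4826 * median(abs(x - median(x)))

c(mean = mean(x), trimmed = trimmed, mad = mad(x), mad_manual = mad_manual)
```

Both statistics are much less sensitive to the single extreme value than the plain mean and standard deviation.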
  154.  
Check whether any variable contains missing values
  156. > sapply(bank,anyNA)
  157. age job marital education
  158. FALSE FALSE FALSE FALSE
  159. default housing loan contact
  160. FALSE FALSE FALSE FALSE
  161. month day_of_week duration campaign
  162. FALSE FALSE FALSE FALSE
  163. pdays previous poutcome emp.var.rate
  164. FALSE FALSE FALSE FALSE
  165. cons.price.idx cons.conf.idx euribor3m nr.employed
  166. FALSE FALSE FALSE FALSE
  167. y
  168. FALSE
  169.  
Counts of unsuccessful (no) and successful (yes) outcomes
  171. > table(bank$y)
  172.  
  173. no yes
  174. 36548 4640
  175.  
Cross-tabulate the outcome y against marital status
  178. > table(bank$y,bank$marital)
  179.  
  180. divorced married single unknown
  181. no 4136 22396 9948 68
  182. yes 476 2532 1620 12
  183.  
  184. > xtabs(~y+marital,data=bank)
  185. marital
  186. y divorced married single unknown
  187. no 4136 22396 9948 68
  188. yes 476 2532 1620 12
  189. > tab=table(bank$y,bank$marital)
  190. > tab
  191.  
  192. divorced married single unknown
  193. no 4136 22396 9948 68
  194. yes 476 2532 1620 12
  195.  
Marginal counts: by marital status (margin 2), then by outcome (margin 1)
  197. > margin.table(tab,2)
  198.  
  199. divorced married single unknown
  200. 4612 24928 11568 80
  201. > margin.table(tab,1)
  202.  
  203. no yes
  204. 36548 4640
  205.  
Row proportions: the distribution of marital status within each outcome
  207. > prop.table(tab,1)
  208.  
  209. divorced married single unknown
  210. no 0.113166247 0.612783189 0.272189997 0.001860567
  211. yes 0.102586207 0.545689655 0.349137931 0.002586207
Column proportions: the share of each outcome within each marital status
  213.  
  214. > prop.table(tab,2)
  215.  
  216. divorced married single unknown
  217. no 0.8967910 0.8984275 0.8599585 0.8500000
  218. yes 0.1032090 0.1015725 0.1400415 0.1500000
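The margin and proportion tables can be reproduced without the data file by rebuilding the contingency table from the counts printed above. A minimal sketch (only the counts come from the transcript; the object names are mine):

```r
# Counts copied from the table(bank$y, bank$marital) output above
tab <- as.table(matrix(c(4136, 22396, 9948, 68,
                          476,  2532, 1620, 12),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(y = c("no", "yes"),
                                       marital = c("divorced", "married",
                                                   "single", "unknown"))))

margin.table(tab, 1)            # row totals: counts per outcome
margin.table(tab, 2)            # column totals: counts per marital status

row_prop <- prop.table(tab, 1)  # each row sums to 1
col_prop <- prop.table(tab, 2)  # each column sums to 1
round(col_prop, 4)
```

The column proportions are the ones that answer "what share of each marital group said yes", which is usually the question of interest for a response model.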
  219.  
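Flat contingency table
row.vars = 1:2 groups the rows by the first two columns of the subset (marital and education), like GROUP BY 1, 2
col.vars names the variable whose values are counted across the columns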
  223. > ftable(bank[,c(3,4,21)],row.vars = 1:2,col.vars = "y")
  224. y no yes
  225. marital education
  226. divorced basic.4y 406 83
  227. basic.6y 169 13
  228. basic.9y 534 31
  229. high.school 1086 107
  230. illiterate 1 1
  231. professional.course 596 61
  232. university.degree 1177 160
  233. unknown 167 20
  234. married basic.4y 2915 313
  235. basic.6y 1628 139
  236. basic.9y 3858 298
  237. high.school 4683 475
  238. illiterate 12 3
  239. professional.course 2799 357
  240. university.degree 5573 821
  241. unknown 928 126
  242. single basic.4y 422 31
  243. basic.6y 301 36
  244. basic.9y 1174 142
  245. high.school 2702 448
  246. illiterate 1 0
  247. professional.course 1247 177
  248. university.degree 3723 683
  249. unknown 378 103
  250. unknown basic.4y 5 1
  251. basic.6y 6 0
  252. basic.9y 6 2
  253. high.school 13 1
  254. illiterate 0 0
  255. professional.course 6 0
  256. university.degree 25 6
  257. unknown 7 2
  258.  
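Chi-squared test of independence. The null hypothesis is that y and marital status are independent; the p-value is below 2.2e-16, so the null hypothesis is rejected and the outcome is taken to be associated with marital status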
  260. > chisq.test(tab)
  261.  
  262. Pearson's Chi-squared test
  263.  
  264. data: tab
  265. X-squared = 122.66, df = 3, p-value < 2.2e-16
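To make clear what the test measures, the statistic can be computed by hand from the same counts: the expected count of each cell under independence is (row total x column total) / grand total. A sketch, assuming only the counts printed earlier:

```r
tab <- matrix(c(4136, 22396, 9948, 68,
                 476,  2532, 1620, 12),
              nrow = 2, byrow = TRUE,
              dimnames = list(y = c("no", "yes"),
                              marital = c("divorced", "married",
                                          "single", "unknown")))

# Expected counts if y and marital were independent
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic, degrees of freedom, and upper-tail p-value
stat    <- sum((tab - expected)^2 / expected)
df      <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(stat, df, lower.tail = FALSE)

res <- chisq.test(tab)   # should agree with the manual calculation
c(manual = stat, builtin = unname(res$statistic), df = df)
```

A large statistic means the observed counts are far from what independence would predict; it says nothing about the data "following a chi-squared distribution".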
  266.  
Draw a histogram
  268. > hist(bank$age)
  269. > library(lattice)
  270.  
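Density plot of a continuous variable: a kernel density estimate, which can be read as a smoothed histogram
The formula ~age puts age on the horizontal axis (the vertical axis is the estimated density), groups=y draws one curve per outcome, data is bank, and auto.key controls whether a legend is shown
> densityplot(~age,groups = y,data=bank,plot.points=FALSE,auto.key = TRUE)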
  274.  
Draw a box plot
  276. > boxplot(age~y,data=bank)
  277.  
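Two-sample t-test (Welch version); reject the null hypothesis when the p-value is below 0.05
The null hypothesis here is that the mean age is the same in the two groups (y = no and y = yes)
The p-value is 1.805e-06, so the hypothesis of equal means is rejected
The mean age differs between the groups, although the difference is only about one year (39.9 vs 40.9)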
  282. > t.test(age~y,data=bank,alternative="two.sided",var.equal=FALSE)
  283.  
  284. Welch Two Sample t-test
  285.  
  286. data: age by y
  287. t = -4.7795, df = 5258.5, p-value = 1.805e-06
  288. alternative hypothesis: true difference in means is not equal to 0
  289. 95 percent confidence interval:
  290. -1.4129336 -0.5909889
  291. sample estimates:
  292. mean in group no mean in group yes
  293. 39.91119 40.91315
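The Welch statistic and its degrees of freedom can be reproduced by hand, which shows that the test compares means and does not measure correlation. A sketch on simulated data (the two samples are made up; they are not the bank ages):

```r
set.seed(1)
a <- rnorm(200, mean = 40, sd = 10)   # stand-in for one group
b <- rnorm(50,  mean = 42, sd = 14)   # stand-in for the other group

va <- var(a) / length(a)              # squared standard error of mean(a)
vb <- var(b) / length(b)              # squared standard error of mean(b)

t_stat  <- (mean(a) - mean(b)) / sqrt(va + vb)
# Welch-Satterthwaite approximation of the degrees of freedom
df      <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))
p_value <- 2 * pt(-abs(t_stat), df)

res <- t.test(a, b, var.equal = FALSE)
c(t = t_stat, df = df, p = p_value)
```

With tens of thousands of observations, as in the bank data, even a small difference in means produces a very small p-value, so the size of the difference should be reported alongside it.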
  294.  
Data visualization
Pie chart
  297. > tab=table(bank$marital)
  298. > pie(tab)
  299.  
Bar chart (marital is categorical, so this is a bar chart rather than a histogram)
  301. > tab=table(bank$marital)
  302. > barplot(tab)
  303.  
Mosaic plot (what plot() draws for a two-way table)
  305. > tab=table(bank$marital,bank$y)
  306. > plot(tab)
  307.  
Stacked bar chart
  309. > tab=table(bank$marital,bank$y)
  310. > lattice::barchart(tab,auto.key=TRUE)
  311.  
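Load dplyr and ggplot2, count the rows in each (marital, y) group, and draw a grouped bar chart
(ggplot2 must be attached first; calling ggplot2::ggplot() alone fails because aes() and geom_bar() are not found)
> library(dplyr)
> library(ggplot2)
> data=group_by(bank,marital,y)
> data=tally(data)
> ggplot(data=data,mapping=aes(marital,n))+geom_bar(mapping=aes(fill=y),position="dodge",stat="identity")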
  318.  
Data preprocessing
Bin age into groups, then plot
> labels=c('young','middle-aged','senior')
  322. > bank$age_group=cut(bank$age,breaks = c(0,35,55,100),right = FALSE,labels = labels)
  323. > library(ggplot2)
  324. > ggplot(data=bank,mapping = aes(age_group))+geom_bar(mapping = aes(fill=y),position="dodge",stat="count")
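The effect of right = FALSE is easiest to see on values that sit exactly on a break. A small sketch with made-up ages (the English labels are my own choice):

```r
ages <- c(17, 34, 35, 54, 55, 98)

# right = FALSE makes the intervals closed on the left: [0,35), [35,55), [55,100)
groups <- cut(ages, breaks = c(0, 35, 55, 100), right = FALSE,
              labels = c("young", "middle-aged", "senior"))

data.frame(ages, groups)
table(groups)
```

With these breaks an age of exactly 100 would fall outside the last interval and become NA unless include.lowest = TRUE is also passed.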
  325.  
Derived variables
Add a new variable to the data frame directly with $
  328. > bank$log.cons.price.idx=log(bank$cons.price.idx)
Add variables to the data frame with the transform function
  330. > bank<-transform(bank,log.cons.price.idx=log(cons.price.idx),log.nr.employed=log(nr.employed))
Add a variable with the mutate function from the dplyr package
  332. > bank<-dplyr::mutate(bank,log.cons.price.idx=log(cons.price.idx))
Use the transmute function from dplyr to keep only the newly created variables
  334. > bank2<-dplyr::transmute(bank,log.cons.price.idx=log(cons.price.idx),log.nr.employed=log(nr.employed))
  335.  
Centering
  337.  
  338. > v=1:10
  339. > v1=v-mean(v)
  340. > v2=scale(v,center=TRUE,scale = FALSE)
  341.  
Scaling without centering (divide by the root mean square)

> v1=v/sqrt(sum(v^2)/(length(v)-1))
> v2=scale(v,center=FALSE,scale=TRUE)
  346.  
Min-max normalization to the range [0, 1]
  348.  
  349. > v3=(v-min(v))/(max(v)-min(v))
  350.  
Standardization (z-scores: mean 0, standard deviation 1)
  352.  
  353. > v1=(v-mean(v))/sd(v)
  354. > v2=scale(v,center = TRUE,scale=TRUE)
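The manual formulas and the scale() calls above are meant to give the same numbers. The following sketch checks this for all four transformations; the only detail to be aware of is that scale() returns a one-column matrix with attributes, so it is converted back to a plain vector before comparing.

```r
v <- 1:10

centered <- v - mean(v)                          # centering
unit     <- v / sqrt(sum(v^2) / (length(v) - 1)) # divide by root mean square
minmax   <- (v - min(v)) / (max(v) - min(v))     # min-max normalization
z        <- (v - mean(v)) / sd(v)                # z-scores

# as.vector() drops the matrix shape and the attributes added by scale()
all.equal(centered, as.vector(scale(v, center = TRUE,  scale = FALSE)))
all.equal(unit,     as.vector(scale(v, center = FALSE, scale = TRUE)))
all.equal(z,        as.vector(scale(v, center = TRUE,  scale = TRUE)))
range(minmax)
```

Note that scale(center = FALSE, scale = TRUE) divides by the root mean square, which equals the standard deviation only when the data are already centered.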
  355.  
Box-Cox transformation
Using the boxCox function from the car package
  358. > install.packages("car")
  359. > library(car)
  360. > boxCox(age~.,data=bank)
  361.  
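Box-Cox transformation with the caret package (preProcess only estimates the parameter; predict applies the transformation)
> install.packages("caret")
> library(caret)
> dat<-subset(bank,select="age")
> trans<-preProcess(dat,method=c("BoxCox"))
> dat.bc<-predict(trans,dat)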
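For reference, the transformation itself is simple: (x^lambda - 1) / lambda for lambda different from 0, and log(x) for lambda equal to 0. The packages above mainly help with choosing lambda. A sketch of the formula in base R (box_cox is a hypothetical helper name, and the ages are made up):

```r
# Box-Cox transform for strictly positive data
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

ages <- c(17, 25, 38, 47, 98)
box_cox(ages, 0)     # same as log(ages)
box_cox(ages, 0.5)   # square-root-like compression of large values
box_cox(ages, 1)     # only shifts the data by 1, shape unchanged
```

Values of lambda below 1 pull in a long right tail, which is why the method is used on skewed variables such as duration or campaign.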
  367.  
Data preprocessing, part 2
Outliers that contradict common sense
Identifying outliers from the distribution of the data
> bank.dirty=read.csv("bank-dirty.csv")
> summary(bank.dirty)
  373.  
  374. age job marital education
  375. Min. : 17.00 admin. :10422 divorced: 4612 university.degree :12165
  376. 1st Qu.: 32.00 blue-collar: 9254 married :24928 high.school : 9515
  377. Median : 38.00 technician : 6743 single :11568 basic.9y : 6043
  378. Mean : 40.03 services : 3969 NA's : 80 professional.course: 5242
  379. 3rd Qu.: 47.00 management : 2924 basic.4y : 4175
  380. Max. :123.00 (Other) : 7546 (Other) : 2310
  381. NA's :2 NA's : 330 NA's : 1738
  382. default housing loan contact month
  383. no :32588 no :18622 no :33950 cellular :26144 may :13769
  384. yes : 3 yes :21576 yes : 6248 telephone:15044 jul : 7174
  385. NA's: 8597 NA's: 990 NA's: 990 aug : 6178
  386. jun : 5318
  387. nov : 4101
  388. apr : 2632
  389. (Other): 2016
  390. day_of_week duration campaign pdays previous
  391. fri:7827 Min. : 0.0 Min. : 1.000 Min. : 0.0 Min. :0.000
  392. mon:8514 1st Qu.: 102.0 1st Qu.: 1.000 1st Qu.:999.0 1st Qu.:0.000
  393. thu:8623 Median : 180.0 Median : 2.000 Median :999.0 Median :0.000
  394. tue:8090 Mean : 258.3 Mean : 2.568 Mean :962.5 Mean :0.173
  395. wed:8134 3rd Qu.: 319.0 3rd Qu.: 3.000 3rd Qu.:999.0 3rd Qu.:0.000
  396. Max. :4918.0 Max. :56.000 Max. :999.0 Max. :7.000
  397.  
  398. poutcome emp.var.rate cons.price.idx cons.conf.idx
  399. failure : 4252 Min. :-3.40000 Min. :92.20 Min. :-50.8
  400. nonexistent:35563 1st Qu.:-1.80000 1st Qu.:93.08 1st Qu.:-42.7
  401. success : 1373 Median : 1.10000 Median :93.75 Median :-41.8
  402. Mean : 0.08189 Mean :93.58 Mean :-40.5
  403. 3rd Qu.: 1.40000 3rd Qu.:93.99 3rd Qu.:-36.4
  404. Max. : 1.40000 Max. :94.77 Max. :-26.9
  405.  
  406. euribor3m nr.employed y
  407. Min. :0.634 Min. :4964 no :36548
  408. 1st Qu.:1.344 1st Qu.:5099 yes: 4640
  409. Median :4.857 Median :5191
  410. Mean :3.621 Mean :5167
  411. 3rd Qu.:4.961 3rd Qu.:5228
  412. Max. :5.045 Max. :5228
  413.  
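Common sense says that although people aged 123 may exist, this is extremely rare, and such a person is unlikely to be a bank customer
Find the highest and lowest values in the age column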
  416.  
  417. > head(bank.dirty[order(bank.dirty$age,decreasing = TRUE),'age',drop=FALSE],n=5)
  418. age
  419. 39494 123
  420. 38453 98
  421. 38456 98
  422. 27827 95
  423. 38922 94
  424. > tail(bank.dirty[order(bank.dirty$age,decreasing = TRUE),'age',drop=FALSE],n=5)
  425. age
  426. 37559 17
  427. 37580 17
  428. 38275 17
  429. 120 NA
  430. 156 NA
  431.  
Handling outliers
Treat them as missing values
> bank.dirty$age[which(bank.dirty$age>98)]<-NA
and then either delete the rows or impute the values
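The transcript sets the threshold of 98 by inspection. A common distribution-based alternative is the 1.5 x IQR rule used by box plots; the sketch below applies it to a made-up age vector and is not necessarily the rule the original analysis had in mind.

```r
ages <- c(17, 25, 32, 38, 40, 47, 55, 98, 123, NA)   # illustrative values

q     <- unname(quantile(ages, c(0.25, 0.75), na.rm = TRUE))
fence <- 1.5 * (q[2] - q[1])                          # 1.5 times the IQR

# Flag values beyond the fences; existing NAs are never flagged
is_outlier <- !is.na(ages) & (ages < q[1] - fence | ages > q[2] + fence)
ages[is_outlier]

# Treat flagged values as missing, as in the transcript
cleaned <- ages
cleaned[is_outlier] <- NA
```

A statistical flag is only a prompt to look at the value: a plausible but unusual age can be flagged too, so the final decision still needs the common-sense check described above.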
  436.  
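Recoding
job has 12 categories, which is awkward for later analysis; the categories other than unknown are recoded into 4 groups
month is recoded from individual months into quarters
education has 7 categories besides unknown; they are recoded into primary, secondary, and tertiary

Apply the recoding (each new label replaces the old level in the same position, with levels in alphabetical order)
> levels(bank.dirty$job) <- c( "management","services","entrepreneur","entrepreneur",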
  444. "management","unemployed", "entrepreneur","services",
  445. "unemployed","services","unemployed","unknown" )
  446. > levels(bank.dirty$month) <- c("Q2","Q3","Q4","Q3","Q2",
  447. "Q1","Q2","Q4","Q4","Q3")
  448. >
  449. > levels(bank.dirty$education) <- c( "primary","primary","primary","secondary",
  450. "primary","tertiary","tertiary","unknown")
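Assigning to levels() merges levels purely by position, so the replacement vector has to follow the alphabetical order of the existing levels. The sketch below shows this on a stand-alone month factor and compares it with a name-based lookup, which is a safer variant I am adding for illustration (quarter_of is a hypothetical name):

```r
raw   <- c("apr", "aug", "dec", "jul", "jun", "mar", "may", "nov", "oct", "sep")
month <- factor(raw)
levels(month)   # alphabetical; the new labels below must follow this order

# Positional recoding, as in the transcript: duplicated labels merge levels
levels(month) <- c("Q2", "Q3", "Q4", "Q3", "Q2", "Q1", "Q2", "Q4", "Q4", "Q3")

# Name-based recoding: does not depend on the order of the levels
quarter_of <- c(mar = "Q1", apr = "Q2", may = "Q2", jun = "Q2", jul = "Q3",
                aug = "Q3", sep = "Q3", oct = "Q4", nov = "Q4", dec = "Q4")
month2 <- factor(unname(quarter_of[raw]), levels = c("Q1", "Q2", "Q3", "Q4"))

table(month, month2)
```

The name-based version also fixes the level order to Q1..Q4, so Q1 is the reference level without a separate relevel() call.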
  451.  
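Missing values
Several variables have a category unknown, which carries no information; in bank.dirty it appears as NA
Some models, such as logistic regression, cannot handle missing values
Imputation methods for missing values
1 Impute with the median (numeric variables) or the mode (categorical variables)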
  457. > library(imputeMissings)
  458. > bank.clean<-impute(bank.dirty,object = compute(bank.dirty,method = "median/mode"))
2 k-nearest-neighbor (kNN) imputation
> library(DMwR)
> bank.clean=knnImputation(bank.dirty,k=5)

3 Random forest imputation
> library(missForest)
> Imp = missForest(bank.dirty)
> bank.clean = Imp$ximp

R packages for missing-value imputation
1 imputeMissings
2 DMwR
3 missForest
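The first method, median/mode imputation, is simple enough to write in base R when the packages above are not available. This is a minimal sketch of the idea, not the implementation used by imputeMissings; impute_simple and the toy data frame are hypothetical.

```r
# Replace NA with the median (numeric columns) or the most frequent level (others)
impute_simple <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
    } else {
      counts <- table(x)                          # table() ignores NA by default
      x[is.na(x)] <- names(counts)[which.max(counts)]
    }
    df[[col]] <- x
  }
  df
}

toy <- data.frame(age     = c(30, NA, 50, 40),
                  marital = factor(c("married", "single", NA, "married")))
clean <- impute_simple(toy)
clean
```

Median/mode imputation is fast but ignores relationships between variables, which is what the kNN and random forest methods try to exploit.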
  471.  
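Building a customer response model with logistic regression
1 Generalized linear models
Generalized linear models are suited to problems where the response variable is not continuous:
1) Y is a categorical variable
2) Y is an ordinal variable
3) Y takes discrete values
2 When Y is a binary 0-1 variable, the model is logistic regression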
  479.  
Logistic regression in R
Recode the response as 0/1
> bank$y=ifelse(bank$y=='yes',1,0)
Make Q1 the reference level of month
> bank$month<-relevel(bank$month,ref="Q1")
Fit the logistic regression model
  486. > model<-glm(y~.,data=bank,family = 'binomial')
  487. > summary(model)
  488.  
  489. Call:
  490. glm(formula = y ~ ., family = "binomial", data = bank)
  491.  
  492. Deviance Residuals:
  493. Min 1Q Median 3Q Max
  494. -5.9958 -0.3082 -0.1887 -0.1333 3.4283
  495.  
  496. Coefficients: (1 not defined because of singularities)
  497. Estimate Std. Error z value Pr(>|z|)
  498. (Intercept) -1.957e+02 1.935e+01 -10.116 < 2e-16 ***
  499. age 1.851e-03 2.415e-03 0.767 0.443289
  500. jobblue-collar -2.659e-01 7.942e-02 -3.348 0.000814 ***
  501. jobentrepreneur -2.029e-01 1.248e-01 -1.626 0.103924
  502. jobhousemaid -3.628e-02 1.475e-01 -0.246 0.805705
  503. jobmanagement -8.054e-02 8.501e-02 -0.947 0.343423
  504. jobretired 2.928e-01 1.067e-01 2.743 0.006092 **
  505. jobself-employed -1.680e-01 1.176e-01 -1.428 0.153332
  506. jobservices -1.497e-01 8.552e-02 -1.751 0.079969 .
  507. jobstudent 2.674e-01 1.106e-01 2.416 0.015680 *
  508. jobtechnician 3.462e-03 7.096e-02 0.049 0.961086
  509. jobunemployed 8.514e-03 1.273e-01 0.067 0.946686
  510. jobunknown -8.046e-02 2.390e-01 -0.337 0.736420
  511. maritalmarried 1.567e-02 6.824e-02 0.230 0.818420
  512. maritalsingle 6.620e-02 7.791e-02 0.850 0.395473
  513. maritalunknown 6.303e-02 4.113e-01 0.153 0.878211
  514. educationbasic.6y 9.647e-02 1.202e-01 0.803 0.422195
  515. educationbasic.9y -2.154e-02 9.494e-02 -0.227 0.820557
  516. educationhigh.school 3.381e-02 9.188e-02 0.368 0.712895
  517. educationilliterate 1.132e+00 7.395e-01 1.531 0.125887
  518. educationprofessional.course 1.136e-01 1.013e-01 1.121 0.262175
  519. educationuniversity.degree 2.134e-01 9.188e-02 2.322 0.020211 *
  520. educationunknown 1.361e-01 1.196e-01 1.138 0.255314
  521. defaultunknown -3.055e-01 6.712e-02 -4.552 5.32e-06 ***
  522. defaultyes -7.150e+00 1.135e+02 -0.063 0.949784
  523. housingunknown -7.385e-02 1.390e-01 -0.531 0.595260
  524. housingyes -3.740e-03 4.121e-02 -0.091 0.927695
  525. loanunknown NA NA NA NA
  526. loanyes -6.362e-02 5.725e-02 -1.111 0.266454
  527. contacttelephone -6.068e-01 7.124e-02 -8.518 < 2e-16 ***
  528. monthQ2 -2.192e+00 1.125e-01 -19.479 < 2e-16 ***
  529. monthQ3 -1.463e+00 1.148e-01 -12.747 < 2e-16 ***
  530. monthQ4 -1.995e+00 1.240e-01 -16.088 < 2e-16 ***
  531. day_of_weekmon -1.216e-01 6.588e-02 -1.846 0.064887 .
  532. day_of_weekthu 6.375e-02 6.382e-02 0.999 0.317842
  533. day_of_weektue 6.867e-02 6.545e-02 1.049 0.294118
  534. day_of_weekwed 1.436e-01 6.530e-02 2.199 0.027911 *
  535. duration 4.667e-03 7.397e-05 63.092 < 2e-16 ***
  536. campaign -4.543e-02 1.158e-02 -3.922 8.77e-05 ***
  537. pdays -9.627e-04 2.162e-04 -4.452 8.50e-06 ***
  538. previous -5.806e-02 5.879e-02 -0.988 0.323369
  539. poutcomenonexistent 4.507e-01 9.372e-02 4.809 1.51e-06 ***
  540. poutcomesuccess 9.371e-01 2.106e-01 4.451 8.56e-06 ***
  541. emp.var.rate -1.389e+00 7.693e-02 -18.057 < 2e-16 ***
  542. cons.price.idx 1.815e+00 1.193e-01 15.218 < 2e-16 ***
  543. cons.conf.idx 3.353e-02 6.664e-03 5.033 4.84e-07 ***
  544. euribor3m 6.054e-02 1.126e-01 0.537 0.590987
  545. nr.employed 4.937e-03 1.873e-03 2.635 0.008413 **
  546. ---
  547. Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 1
  548.  
  549. (Dispersion parameter for binomial family taken to be 1)
  550.  
  551. Null deviance: 28999 on 41187 degrees of freedom
  552. Residual deviance: 17199 on 41141 degrees of freedom
  553. AIC: 17293
  554.  
  555. Number of Fisher Scoring iterations: 10
  556.  
  557. > exp(coef(model))
  558. (Intercept) age jobblue-collar
  559. 9.856544e-86 1.001853e+00 7.665077e-01
  560. jobentrepreneur jobhousemaid jobmanagement
  561. 8.163314e-01 9.643733e-01 9.226187e-01
  562. jobretired jobself-employed jobservices
  563. 1.340142e+00 8.453874e-01 8.609387e-01
  564. jobstudent jobtechnician jobunemployed
  565. 1.306514e+00 1.003468e+00 1.008550e+00
  566. jobunknown maritalmarried maritalsingle
  567. 9.226922e-01 1.015789e+00 1.068445e+00
  568. maritalunknown educationbasic.6y educationbasic.9y
  569. 1.065061e+00 1.101276e+00 9.786948e-01
  570. educationhigh.school educationilliterate educationprofessional.course
  571. 1.034388e+00 3.101297e+00 1.120248e+00
  572. educationuniversity.degree educationunknown defaultunknown
  573. 1.237856e+00 1.145744e+00 7.367445e-01
  574. defaultyes housingunknown housingyes
  575. 7.851906e-04 9.288126e-01 9.962671e-01
  576. loanunknown loanyes contacttelephone
  577. NA 9.383587e-01 5.450980e-01
  578. monthQ2 monthQ3 monthQ4
  579. 1.116739e-01 2.314802e-01 1.360620e-01
  580. day_of_weekmon day_of_weekthu day_of_weektue
  581. 8.854888e-01 1.065828e+00 1.071082e+00
  582. day_of_weekwed duration campaign
  583. 1.154380e+00 1.004678e+00 9.555850e-01
  584. pdays previous poutcomenonexistent
  585. 9.990378e-01 9.435960e-01 1.569466e+00
  586. poutcomesuccess emp.var.rate cons.price.idx
  587. 2.552531e+00 2.493091e-01 6.140533e+00
  588. cons.conf.idx euribor3m nr.employed
  589. 1.034103e+00 1.062408e+00 1.004949e+00
  590.  
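The interpretation depends on the reference level of job. In the output above job has not been recoded, so the reference level is admin. (the level without a coefficient): for example, the odds that a retired customer buys the product are 1.34 times those of an admin. customer, and for a blue-collar customer 0.77 times. When job is first recoded into four groups with management as the reference level, the original notes report that the odds for service workers and the self-employed are 0.88 times those of managers, and the odds for the unemployed group are 1.25 times.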
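The step from coefficients to odds ratios is just exponentiation. The sketch below uses three coefficients copied from the summary(model) output above (rounded as printed) and shows how an odds ratio changes a probability; the 10% baseline is an arbitrary value chosen for illustration.

```r
# Coefficients on the log-odds scale, copied from summary(model)
b <- c(jobretired = 0.2928, contacttelephone = -0.6068, duration = 0.004667)

odds_ratio <- exp(b)
round(odds_ratio, 3)

# Effect on the odds of 60 additional seconds of call duration
exp(60 * b["duration"])

# Odds ratios multiply odds, not probabilities:
p        <- 0.10                                   # assumed baseline probability
odds     <- p / (1 - p)
new_odds <- odds * odds_ratio["jobretired"]
new_p    <- new_odds / (1 + new_odds)
unname(new_p)
```

An odds ratio above 1 raises the probability and one below 1 lowers it, but the size of the change in probability depends on the baseline.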
  592.  
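Backward stepwise selection
> model.step = step(model, direction = "backward")
Forward stepwise selection (starting from the full model there is nothing left to add, so this returns the model unchanged)
> model.step = step(model, direction = "forward")
Stepwise selection in both directions
> model.step = step(model, direction = "both")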
  600. > summary(model.step)
  601.  
  602. Call:
  603. glm(formula = y ~ job + education + default + contact + month +
  604. day_of_week + duration + campaign + pdays + poutcome + emp.var.rate +
  605. cons.price.idx + cons.conf.idx + nr.employed, family = "binomial",
  606. data = bank)
  607.  
  608. Deviance Residuals:
  609. Min 1Q Median 3Q Max
  610. -5.9884 -0.3088 -0.1887 -0.1332 3.4026
  611.  
  612. Coefficients:
  613. Estimate Std. Error z value Pr(>|z|)
  614. (Intercept) -2.031e+02 1.426e+01 -14.246 < 2e-16 ***
  615. jobblue-collar -2.700e-01 7.917e-02 -3.411 0.000648 ***
  616. jobentrepreneur -2.043e-01 1.242e-01 -1.645 0.100003
  617. jobhousemaid -2.832e-02 1.464e-01 -0.193 0.846590
  618. jobmanagement -8.368e-02 8.409e-02 -0.995 0.319670
  619. jobretired 3.234e-01 9.130e-02 3.542 0.000397 ***
  620. jobself-employed -1.670e-01 1.176e-01 -1.421 0.155435
  621. jobservices -1.528e-01 8.545e-02 -1.789 0.073666 .
  622. jobstudent 2.682e-01 1.046e-01 2.565 0.010316 *
  623. jobtechnician 4.389e-03 7.093e-02 0.062 0.950665
  624. jobunemployed 8.975e-03 1.271e-01 0.071 0.943715
  625. jobunknown -6.363e-02 2.378e-01 -0.268 0.789057
  626. educationbasic.6y 8.993e-02 1.196e-01 0.752 0.452024
  627. educationbasic.9y -2.716e-02 9.416e-02 -0.288 0.772992
  628. educationhigh.school 2.890e-02 9.053e-02 0.319 0.749573
  629. educationilliterate 1.118e+00 7.398e-01 1.511 0.130744
  630. educationprofessional.course 1.084e-01 1.004e-01 1.079 0.280686
  631. educationuniversity.degree 2.103e-01 9.017e-02 2.332 0.019678 *
  632. educationunknown 1.363e-01 1.195e-01 1.140 0.254110
  633. defaultunknown -3.017e-01 6.666e-02 -4.526 6.02e-06 ***
  634. defaultyes -7.141e+00 1.135e+02 -0.063 0.949831
  635. contacttelephone -6.011e-01 7.069e-02 -8.504 < 2e-16 ***
  636. monthQ2 -2.210e+00 1.108e-01 -19.939 < 2e-16 ***
  637. monthQ3 -1.475e+00 1.146e-01 -12.869 < 2e-16 ***
  638. monthQ4 -1.982e+00 1.183e-01 -16.755 < 2e-16 ***
  639. day_of_weekmon -1.210e-01 6.584e-02 -1.837 0.066174 .
  640. day_of_weekthu 6.208e-02 6.374e-02 0.974 0.330066
  641. day_of_weektue 6.851e-02 6.538e-02 1.048 0.294651
  642. day_of_weekwed 1.420e-01 6.525e-02 2.176 0.029592 *
  643. duration 4.667e-03 7.396e-05 63.099 < 2e-16 ***
  644. campaign -4.587e-02 1.158e-02 -3.960 7.49e-05 ***
  645. pdays -8.822e-04 2.024e-04 -4.358 1.31e-05 ***
  646. poutcomenonexistent 5.219e-01 6.356e-02 8.211 < 2e-16 ***
  647. poutcomesuccess 9.996e-01 2.028e-01 4.928 8.31e-07 ***
  648. emp.var.rate -1.376e+00 6.885e-02 -19.980 < 2e-16 ***
  649. cons.price.idx 1.845e+00 1.041e-01 17.725 < 2e-16 ***
  650. cons.conf.idx 3.622e-02 4.853e-03 7.464 8.42e-14 ***
  651. nr.employed 5.883e-03 9.765e-04 6.024 1.70e-09 ***
  652. ---
  653. Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 1
  654.  
  655. (Dispersion parameter for binomial family taken to be 1)
  656.  
  657. Null deviance: 28999 on 41187 degrees of freedom
  658. Residual deviance: 17203 on 41150 degrees of freedom
  659. AIC: 17279
  660.  
  661. Number of Fisher Scoring iterations: 10
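Forward selection only does something useful when it starts from a small model and is told how far it may grow through the scope argument. A sketch on simulated data (the variables x1 to x3 and the effect sizes are made up; only x1 truly affects y):

```r
set.seed(42)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-1 + 1.5 * d$x1))   # only x1 has a real effect

# Start from the intercept-only model and allow terms up to the full model
null_model <- glm(y ~ 1, data = d, family = binomial)
fwd <- step(null_model,
            scope = list(lower = ~ 1, upper = ~ x1 + x2 + x3),
            direction = "forward", trace = 0)

formula(fwd)
```

Backward and both-direction selection can start from the full model, as in the transcript, because the default scope lets them drop terms.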
  662.  
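Model prediction
The predict function with type='response' returns predicted probabilities
The newdata argument is the data set to predict on; when it is omitted, the training data is used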
  666.  
  667. > prob<-predict(model.step,type = 'response')
  668. > head(prob)
  669. 1 2 3 4 5 6
  670. 0.015029328 0.006044212 0.011640349 0.010173952 0.016897254 0.007174804
  671.  
Use 0.5 as the classification threshold
  673. > pre<-ifelse(prob>0.5,1,0)
  674. > table(pre,bank$y)
  675.  
  676. pre 0 1
  677. 0 35596 2667
  678. 1 952 1973
  679.  
Prediction accuracy (using the counts in the table above)
> (35596+1973)/(35596+2667+952+1973)
[1] 0.9121346

Recall: the share of customers who actually responded that the model identified
> 1973/(1973+2667)
[1] 0.4252155
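The same measures can be computed from the table(pre, bank$y) counts with named cells, which avoids retyping the numbers in each formula. A sketch assuming only the four counts printed in that table:

```r
# Rows are predictions, columns are actual values, as in table(pre, bank$y)
cm <- matrix(c(35596, 2667,
                 952, 1973),
             nrow = 2, byrow = TRUE,
             dimnames = list(pre = c("0", "1"), actual = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)          # correct predictions / all
recall      <- cm["1", "1"] / sum(cm[, "1"])    # found responders / actual responders
precision   <- cm["1", "1"] / sum(cm["1", ])    # found responders / predicted responders
specificity <- cm["0", "0"] / sum(cm[, "0"])    # correct non-responders / actual non-responders

round(c(accuracy = accuracy, recall = recall,
        precision = precision, specificity = specificity), 4)
```

Because only about 11% of customers respond, accuracy is dominated by the non-responders; recall and precision say more about how useful the model is for targeting.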
  689.  
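Model evaluation
Note: confusionMatrix() expects the predictions first and the actual values second. The call below passes them the other way round, so in its output the rows are the actual classes, "Sensitivity" (0.675) is really the precision, and "Pos Pred Value" (0.425) is really the recall.
> confusionMatrix(factor(bank$y),factor(pre),positive='1')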
  693. Confusion Matrix and Statistics
  694.  
  695. Reference
  696. Prediction 0 1
  697. 0 35596 952
  698. 1 2667 1973
  699.  
  700. Accuracy : 0.9121
  701. 95% CI : (0.9094, 0.9149)
  702. No Information Rate : 0.929
  703. P-Value [Acc > NIR] : 1
  704.  
  705. Kappa : 0.476
  706. Mcnemar's Test P-Value : <2e-16
  707.  
  708. Sensitivity : 0.67453
  709. Specificity : 0.93030
  710. Pos Pred Value : 0.42522
  711. Neg Pred Value : 0.97395
  712. Prevalence : 0.07102
  713. Detection Rate : 0.04790
  714. Detection Prevalence : 0.11265
  715. Balanced Accuracy : 0.80241
  716.  
  717. 'Positive' Class : 1
  718.  
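Kappa statistic
Measures how much the agreement between the classifier and the actual classes exceeds the agreement expected by chance
A common rule of thumb for reading Kappa:
Poor: below 0.20
Fair: 0.20 to 0.40
Moderate: 0.40 to 0.60
Good: 0.60 to 0.80
Very good: 0.80 to 1.00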
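Kappa compares the observed agreement with the agreement two independent raters with the same marginal totals would reach by chance. The sketch below recomputes it from the confusion counts printed earlier and should land on the 0.476 reported by confusionMatrix (kappa_hat is my own variable name):

```r
cm <- matrix(c(35596, 2667,
                 952, 1973),
             nrow = 2, byrow = TRUE,
             dimnames = list(pre = c("0", "1"), actual = c("0", "1")))

n          <- sum(cm)
p_observed <- sum(diag(cm)) / n                       # observed agreement (accuracy)
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
kappa_hat  <- (p_observed - p_expected) / (1 - p_expected)

round(c(observed = p_observed, expected = p_expected, kappa = kappa_hat), 4)
```

Here chance agreement is already above 0.83 because most customers do not respond, which is why an accuracy of 0.91 translates into only a moderate Kappa.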
  727.  
ROC curve
> library(ROCR)
> pred<-prediction(prob,bank$y)
> perf<-performance(pred,measure = "tpr",x.measure = "fpr")
> plot(perf)
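The ROC curve is usually summarized by the area under it (AUC), which equals the probability that a randomly chosen responder gets a higher score than a randomly chosen non-responder. A base R sketch using the rank formulation (auc_rank is a hypothetical helper, and the scores are made up):

```r
# AUC via the Mann-Whitney rank-sum formula; label must be coded 0/1
auc_rank <- function(score, label) {
  r  <- rank(score)                  # average ranks handle ties
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

score <- c(0.10, 0.40, 0.35, 0.80)
label <- c(0,    0,    1,    1)
auc_rank(score, label)               # one of four pos/neg pairs is misordered

auc_rank(c(0.1, 0.2, 0.8, 0.9), c(0, 0, 1, 1))   # perfectly separated scores
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation; unlike accuracy, it does not depend on the 0.5 threshold chosen earlier.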
  732.  
Random forest
Load the data
  735.  
  736. > data=read.table("input.txt",header = TRUE)
  737. > str(data)
  738. 'data.frame': 222 obs. of 23 variables:
  739. $ Acti_Profile : num 0 0 0 0 0 0 0 0 0 0 ...
  740. $ Activity : num 1.25 0 0.938 6.562 0 ...
  741. $ Diastolic_PTT : num 256 240 253 0 241 ...
  742. $ Diastolic : num 73.2 78.6 74 0 78.4 ...
  743. $ Heart_Rate_Curve : num 81.2 69.7 77.6 95 83.6 ...
  744. $ Heart_Rate_Variability_HF: num 131 250 135 144 141 ...
  745. $ Heart_Rate_Variability_LF: num 311 218 203 301 244 ...
  746. $ MAP : num 86 93.5 86.9 0 91.7 ...
  747. $ Position : num 0 0 0 1 0 0 0 0 0 0 ...
  748. $ PTT_Raw : num 308 288 308 0 295 ...
  749. $ RR_Interval : num 734 878 773 632 714 ...
  750. $ Sleep_Wake : num 1 1 1 1 1 0 1 1 0 0 ...
  751. $ SpO2 : num 0 0 99 0 98.4 ...
  752. $ Sympatho_Vagal_Balance : num 23 8.17 14.5 20.4 16.88 ...
  753. $ Systolic_PTT : num 308 288 307 0 295 ...
  754. $ Systolic : num 113 124 113 0 119 ...
  755. $ Autonomic_arousals : num 0 0 0 0 0 0 0 0 0 0 ...
  756. $ Cardio_complex : num 0 0 0 1 0 0 0 0 0 0 ...
  757. $ Cardio_rhythm : num 0 0 2 0 0 0 0 0 0 0 ...
  758. $ Classification_Arousal : num 0 0 0 0 0 0 0 0 0 0 ...
  759. $ PTT_Events : num 1 0 2 0 0 0 0 0 0 0 ...
  760. $ Systolic_Events : num 1 0 1 0 0 0 0 0 0 0 ...
  761. $ y : num 1 0 1 0 0 0 0 0 0 0 ...
Load the randomForest package
  763. > library(randomForest)
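Train the model with y as the response and all other columns as predictors (y is numeric here, so a regression forest is fitted)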
  765. > rf <- randomForest(y ~ ., data=data, ntree=100, proximity=TRUE,importance=TRUE)
  766. > plot(rf)
  767.  
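Variable importance
type=1 measures how much the prediction accuracy drops (here the percentage increase in MSE) when the values of a variable are randomly permuted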
  770. > importance(rf,type=1)
  771. %IncMSE
  772. Acti_Profile 0.00000000
  773. Activity 0.99353251
  774. Diastolic_PTT 0.32193611
  775. Diastolic 1.99891809
  776. Heart_Rate_Curve 0.92001352
  777. Heart_Rate_Variability_HF 2.07870722
  778. Heart_Rate_Variability_LF -0.24957163
  779. MAP 0.48142975
  780. Position 1.86876751
  781. PTT_Raw 1.94648914
  782. RR_Interval 0.60557964
  783. Sleep_Wake 1.00503782
  784. SpO2 0.25396165
  785. Sympatho_Vagal_Balance 1.42906765
  786. Systolic_PTT 1.27965813
  787. Systolic 0.77382673
  788. Autonomic_arousals 0.00000000
  789. Cardio_complex 1.00503782
  790. Cardio_rhythm 1.14283152
  791. Classification_Arousal -0.04383997
  792. PTT_Events 4.63980680
  793. Systolic_Events 33.29461169
  794.  
Print the random forest model
  796. > print(rf)
  797.  
  798. Call:
  799. randomForest(formula = y ~ ., data = data, ntree = 100, proximity = TRUE, importance = TRUE)
  800. Type of random forest: regression
  801. Number of trees: 100
  802. No. of variables tried at each split: 7
  803.  
Mean of squared residuals: 0.003226897 (the mean squared error, MSE, not the sum SSE)
  805. % Var explained: 98.7
  806.  
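Total sum of squares (SST): sum of squared (observed value - sample mean)
Regression sum of squares (SSR): sum of squared (predicted value - sample mean)
Residual sum of squares (SSE): sum of squared (observed value - predicted value)

SST = SSR + SSE (this identity holds for least-squares linear regression with an intercept)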
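The decomposition can be checked numerically on a least-squares fit. The sketch below uses the built-in cars data set rather than the forest above, because the identity is exact only for linear regression with an intercept; for a random forest it generally does not hold exactly.

```r
fit   <- lm(dist ~ speed, data = cars)   # built-in data set
y     <- cars$dist
y_hat <- fitted(fit)

sst <- sum((y - mean(y))^2)       # total sum of squares
ssr <- sum((y_hat - mean(y))^2)   # regression (explained) sum of squares
sse <- sum((y - y_hat)^2)         # residual sum of squares

c(SST = sst, SSR_plus_SSE = ssr + sse)
c(R2 = ssr / sst, MSE = sse / length(y))
```

The "% Var explained" line printed by randomForest plays the role of R2 here: it is 1 minus the ratio of the mean squared error to the variance of y.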
  813.  
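Node purity: for a regression forest, type=2 reports the total decrease in residual sum of squares from splits on each variable (for a classification forest it would be the decrease in the Gini index)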
  815.  
  816. > importance(rf,type=2)
  817. IncNodePurity
  818. Acti_Profile 0.000000000
  819. Activity 0.445181480
  820. Diastolic_PTT 0.452221870
  821. Diastolic 0.449372186
  822. Heart_Rate_Curve 0.473113852
  823. Heart_Rate_Variability_HF 0.226815300
  824. Heart_Rate_Variability_LF 0.205457353
  825. MAP 0.536977574
  826. Position 0.307333210
  827. PTT_Raw 0.656726800
  828. RR_Interval 0.452738011
  829. Sleep_Wake 0.014423077
  830. SpO2 1.793361279
  831. Sympatho_Vagal_Balance 0.352759689
  832. Systolic_PTT 0.851951505
  833. Systolic 0.823955781
  834. Autonomic_arousals 0.000000000
  835. Cardio_complex 0.008047619
  836. Cardio_rhythm 0.141907084
  837. Classification_Arousal 0.085739429
  838. PTT_Events 7.468690820
  839. Systolic_Events 39.000163018
  840.  
Predict
> prediction <- predict(rf, data[,],type="response")
Show the predictions
> table(observed =data$y,predicted=prediction)
> plot(prediction)
  847.  
Support vector machine
> library(e1071)
> svmfit<-svm(y~.,data=data,kernel="linear",cost=10,scale=FALSE)
  851. > print(svmfit)
  852.  
  853. Call:
  854. svm(formula = y ~ ., data = data, kernel = "linear", cost = 10, scale = FALSE)
  855.  
  856. Parameters:
  857. SVM-Type: eps-regression
  858. SVM-Kernel: linear
  859. cost: 10
  860. gamma: 0.04545455
  861. epsilon: 0.1
  862.  
  863. Number of Support Vectors: 20
  864. > plot(svmfit,data)
  865.  
Neural network
  867.  
  868. > concrete<-read_excel("Concrete_Data.xls")
  869. > str(concrete)
  870. Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1030 obs. of 9 variables:
  871. $ Cement : num 540 540 332 332 199 ...
  872. $ Slag : num 0 0 142 142 132 ...
  873. $ Ash : num 0 0 0 0 0 0 0 0 0 0 ...
  874. $ water : num 162 162 228 228 192 228 228 228 228 228 ...
  875. $ superplastic: num 2.5 2.5 0 0 0 0 0 0 0 0 ...
  876. $ coarseagg : num 1040 1055 932 932 978 ...
  877. $ fineagg : num 676 676 594 594 826 ...
  878. $ age : num 28 28 270 365 360 90 365 28 28 28 ...
  879. $ strength : num 80 61.9 40.3 41.1 44.3 ...
  880.  
  881. > normalize <- function(x){ return ((x-min(x))/(max(x)-min(x)))}
  882. > concrete_norm <- as.data.frame(lapply(concrete,normalize))
  883.  
  884. > concrete_train <- concrete_norm[1:773,]
  885. > concrete_test <- concrete_norm[774:1030,]
  886.  
  887. > library(neuralnet)
  888. > concrete_model <- neuralnet(strength ~ Cement+Slag+Ash+water+superplastic+coarseagg+fineagg+age,data=concrete_train)
  889. > plot(concrete_model)
  890.  
> model_results <- compute(concrete_model,concrete_test[1:8])
> predicted_strength <- model_results$net.result
  893. > cor(predicted_strength,concrete_test$strength)
  894. [,1]
  895. [1,] 0.7205120076
  896. > concrete_model2 <- neuralnet(strength ~ Cement+Slag+Ash+water+superplastic+coarseagg+fineagg+age,data=concrete_train,hidden=5)
  897. > plot(concrete_model2)
  898.  
  899. 计算误差
  900. > model_results2 <- compute(concrete_model2,concrete_test[1:8])
  901. > predicted_strength2 <- model_results2$net.result
  902. > cor(predicted_strength2,concrete_test$strength)
  903. [,1]
  904. [1,] 0.6727155609
  905.  
  906. >
  907.  
Principal component analysis
The four variables are height (X1), weight (X2), chest circumference (X3) and sitting height (X4)
> test<-data.frame(
+ X1=c(148, 139, 160, 149, 159, 142, 153, 150, 151, 139,
+ 140, 161, 158, 140, 137, 152, 149, 145, 160, 156,
+ 151, 147, 157, 147, 157, 151, 144, 141, 139, 148),
+ X2=c(41, 34, 49, 36, 45, 31, 43, 43, 42, 31,
+ 29, 47, 49, 33, 31, 35, 47, 35, 47, 44,
+ 42, 38, 39, 30, 48, 36, 36, 30, 32, 38),
+ X3=c(72, 71, 77, 67, 80, 66, 76, 77, 77, 68,
+ 64, 78, 78, 67, 66, 73, 82, 70, 74, 78,
+ 73, 73, 68, 65, 80, 74, 68, 67, 68, 70),
+ X4=c(78, 76, 86, 79, 86, 76, 83, 79, 80, 74,
+ 74, 84, 83, 77, 73, 79, 79, 77, 87, 85,
+ 82, 78, 80, 75, 88, 80, 76, 76, 73, 78)
+ )
> test.pr<-princomp(test,cor=TRUE)
> summary(test.pr,loadings=TRUE)
Importance of components:
                             Comp.1        Comp.2        Comp.3        Comp.4
Standard deviation     1.8817805390 0.55980635717 0.28179594325 0.25711843909
Proportion of Variance 0.8852744993 0.07834578938 0.01985223841 0.01652747293
Cumulative Proportion  0.8852744993 0.96362028866 0.98347252707 1.00000000000

Loadings:
   Comp.1 Comp.2 Comp.3 Comp.4
X1  0.497  0.543 -0.450  0.506
X2  0.515 -0.210 -0.462 -0.691
X3  0.481 -0.725  0.175  0.461
X4  0.507  0.368  0.744 -0.232

The first two principal components already account for 96% of the variance, so the other two can be dropped to reduce the dimension.
Reading the coefficients off the loadings table above, where X'i denotes the standardized Xi (the overall sign of a component is arbitrary, but it must be the same for all four coefficients):
Z1 = 0.497X'1 + 0.515X'2 + 0.481X'3 + 0.507X'4
Z2 = 0.543X'1 - 0.210X'2 - 0.725X'3 + 0.368X'4
Draw the scree plot of the principal components and compute the component scores

> screeplot(test.pr,type="lines")
> p<-predict(test.pr)
> p
              Comp.1         Comp.2         Comp.3          Comp.4
 [1,] -0.06990949737 -0.23813701272 -0.35509247634 -0.266120139417
 [2,] -1.59526339772 -0.71847399061  0.32813232022 -0.118056645885
 [3,]  2.84793151061  0.38956678680 -0.09731731272 -0.279482487139
 [4,] -0.75996988424  0.80604334819 -0.04945721875 -0.162949297761
 [5,]  2.73966776853  0.01718087263  0.36012614873  0.358653043787
 [6,] -2.10583167924  0.32284393414  0.18600422367 -0.036456083707
 [7,]  1.42105591247 -0.06053164925  0.21093320662 -0.044223092351
 [8,]  0.82583976981 -0.78102575640 -0.27557797533  0.057288571933
 [9,]  0.93464401954 -0.58469241699 -0.08814135786  0.181037745585
[10,] -2.36463819933 -0.36532199291  0.08840476284  0.045520127461
[11,] -2.83741916086  0.34875841111  0.03310422938 -0.031146930047
[12,]  2.60851223537  0.21278727930 -0.33398036623  0.210157574387
[13,]  2.44253342081 -0.16769495893 -0.46918095412 -0.162987829937
[14,] -1.86630668724  0.05021383642  0.37720280364 -0.358821916178
[15,] -2.81347420580 -0.31790107093 -0.03291329149 -0.222035112399
[16,] -0.06392982655  0.20718447599  0.04334339948  0.703533623798
[17,]  1.55561022242 -1.70439673831 -0.33126406220  0.007551878960
[18,] -1.07392250663 -0.06763418320  0.02283648409  0.048606680158
[19,]  2.52174211878  0.97274300950  0.12164633439 -0.390667990681
[20,]  2.14072377494  0.02217881219  0.37410972458  0.129548959692
[21,]  0.79624421805  0.16307887263  0.12781269571 -0.294140762463
[22,] -0.28708320594 -0.35744666106 -0.03962115883  0.080991988802
[23,]  0.25151075072  1.25555187663 -0.55617324819  0.109068938725
[24,] -2.05706031616  0.78894493512 -0.26552109297  0.388088642937
[25,]  3.08596854773 -0.05775318018  0.62110421208 -0.218939612456
[26,]  0.16367554630  0.04317931667  0.24481850312  0.560248997030
[27,] -1.37265052598  0.02220972121 -0.23378320040 -0.257399715466
[28,] -2.16097778154  0.13733232981  0.35589738735  0.093123683044
[29,] -2.40434826507 -0.48613137190 -0.16154440788 -0.007914021222
[30,] -0.50287467640  0.14734316507 -0.20590831261 -0.122078819188

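The scores returned by predict() are the standardized data multiplied by the loadings. The sketch below checks this by hand. It uses the built-in USArrests data set as a stand-in (an assumption made only to keep the example short); the body measurements above would behave the same way.

```r
# Sketch: principal component scores = standardized data %*% loadings.
# USArrests is a built-in data set used only for illustration.
pr <- princomp(USArrests, cor = TRUE)

# princomp stores the centering and scaling it applied (the scaling uses divisor n).
z <- scale(USArrests, center = pr$center, scale = pr$scale)
manual_scores <- z %*% unclass(pr$loadings)

# Proportion of variance explained by each component.
prop_var <- pr$sdev^2 / sum(pr$sdev^2)

print(round(cumsum(prop_var), 3))
print(all.equal(unname(manual_scores), unname(predict(pr))))
```

The cumulative proportions printed here are the same quantity as the "Cumulative Proportion" row of summary(), which is what the choice to keep two components is based on.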

Load the data

> w<-read.table("test.prn",header = T)
> w
  X.. X...1
1   A     2
2   B     3
3   C     5
4   D     5
> library(readxl)
> dat<-read_excel("test.xlsx")
> dat
# A tibble: 4 x 2
  `商品` `价格`
  <chr>  <dbl>
1 A          2
2 B          3
3 C          5
4 D          5
> bank=read.table("bank-full.csv",header = TRUE,sep=",")

View the structure of the data

> str(bank)
'data.frame':  41188 obs. of  21 variables:
 $ age           : int  56 57 37 40 56 45 59 41 24 25 ...
 $ job           : Factor w/ 12 levels "admin.","blue-collar",..: 4 8 8 1 8 8 1 2 10 8 ...
 $ marital       : Factor w/ 4 levels "divorced","married",..: 2 2 2 2 2 2 2 2 3 3 ...
 $ education     : Factor w/ 8 levels "basic.4y","basic.6y",..: 1 4 4 2 4 3 6 8 6 4 ...
 $ default       : Factor w/ 3 levels "no","unknown",..: 1 2 1 1 1 2 1 2 1 1 ...
 $ housing       : Factor w/ 3 levels "no","unknown",..: 1 1 3 1 1 1 1 1 3 3 ...
 $ loan          : Factor w/ 3 levels "no","unknown",..: 1 1 1 1 3 1 1 1 1 1 ...
 $ contact       : Factor w/ 2 levels "cellular","telephone": 2 2 2 2 2 2 2 2 2 2 ...
 $ month         : Factor w/ 10 levels "apr","aug","dec",..: 7 7 7 7 7 7 7 7 7 7 ...
 $ day_of_week   : Factor w/ 5 levels "fri","mon","thu",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ duration      : int  261 149 226 151 307 198 139 217 380 50 ...
 $ campaign      : int  1 1 1 1 1 1 1 1 1 1 ...
 $ pdays         : int  999 999 999 999 999 999 999 999 999 999 ...
 $ previous      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ poutcome      : Factor w/ 3 levels "failure","nonexistent",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ emp.var.rate  : num  1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
 $ cons.price.idx: num  94 94 94 94 94 ...
 $ cons.conf.idx : num  -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 ...
 $ euribor3m     : num  4.86 4.86 4.86 4.86 4.86 ...
 $ nr.employed   : num  5191 5191 5191 5191 5191 ...
 $ y             : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...

View the minimum, maximum, median, mean and quartiles of each variable

> summary(bank)
      age                 job            marital
 Min.   :17.00   admin.     :10422   divorced: 4612
 1st Qu.:32.00   blue-collar: 9254   married :24928
 Median :38.00   technician : 6743   single  :11568
 Mean   :40.02   services   : 3969   unknown :   80
 3rd Qu.:47.00   management : 2924
 Max.   :98.00   retired    : 1720
                 (Other)    : 6156
               education        default         housing
 university.degree  :12168   no     :32588   no     :18622
 high.school        : 9515   unknown: 8597   unknown:  990
 basic.9y           : 6045   yes    :    3   yes    :21576
 professional.course: 5243
 basic.4y           : 4176
 basic.6y           : 2292
 (Other)            : 1749
      loan            contact          month       day_of_week
 no     :33950   cellular :26144   may    :13769   fri:7827
 unknown:  990   telephone:15044   jul    : 7174   mon:8514
 yes    : 6248                     aug    : 6178   thu:8623
                                   jun    : 5318   tue:8090
                                   nov    : 4101   wed:8134
                                   apr    : 2632
                                   (Other): 2016
    duration         campaign          pdays
 Min.   :   0.0   Min.   : 1.000   Min.   :  0.0
 1st Qu.: 102.0   1st Qu.: 1.000   1st Qu.:999.0
 Median : 180.0   Median : 2.000   Median :999.0
 Mean   : 258.3   Mean   : 2.568   Mean   :962.5
 3rd Qu.: 319.0   3rd Qu.: 3.000   3rd Qu.:999.0
 Max.   :4918.0   Max.   :56.000   Max.   :999.0
    previous            poutcome      emp.var.rate
 Min.   :0.000   failure    : 4252   Min.   :-3.40000
 1st Qu.:0.000   nonexistent:35563   1st Qu.:-1.80000
 Median :0.000   success    : 1373   Median : 1.10000
 Mean   :0.173                       Mean   : 0.08189
 3rd Qu.:0.000                       3rd Qu.: 1.40000
 Max.   :7.000                       Max.   : 1.40000
 cons.price.idx  cons.conf.idx     euribor3m
 Min.   :92.20   Min.   :-50.8   Min.   :0.634
 1st Qu.:93.08   1st Qu.:-42.7   1st Qu.:1.344
 Median :93.75   Median :-41.8   Median :4.857
 Mean   :93.58   Mean   :-40.5   Mean   :3.621
 3rd Qu.:93.99   3rd Qu.:-36.4   3rd Qu.:4.961
 Max.   :94.77   Max.   :-26.9   Max.   :5.045
  nr.employed     y
 Min.   :4964   no :36548
 1st Qu.:5099   yes: 4640
 Median :5191
 Mean   :5167
 3rd Qu.:5228
 Max.   :5228

> psych::describe(bank)

Column meanings: vars = variable index, n = number of valid observations, mean, sd = standard deviation, median, trimmed = mean after trimming the largest and smallest values, mad = median absolute deviation, min, max, range, skew = skewness, kurtosis, se = standard error of the mean. Variables marked with * are factors converted to integer codes, so their statistics are not meaningful.

               vars     n    mean     sd  median trimmed    mad     min     max   range  skew kurtosis
age               1 41188   40.02  10.42   38.00   39.30  10.38   17.00   98.00   81.00  0.78     0.79
job*              2 41188    4.72   3.59    3.00    4.48   2.97    1.00   12.00   11.00  0.45    -1.39
marital*          3 41188    2.17   0.61    2.00    2.21   0.00    1.00    4.00    3.00 -0.06    -0.34
education*        4 41188    4.75   2.14    4.00    4.88   2.97    1.00    8.00    7.00 -0.24    -1.21
default*          5 41188    1.21   0.41    1.00    1.14   0.00    1.00    3.00    2.00  1.44     0.07
housing*          6 41188    2.07   0.99    3.00    2.09   0.00    1.00    3.00    2.00 -0.14    -1.95
loan*             7 41188    1.33   0.72    1.00    1.16   0.00    1.00    3.00    2.00  1.82     1.38
contact*          8 41188    1.37   0.48    1.00    1.33   0.00    1.00    2.00    1.00  0.56    -1.69
month*            9 41188    5.23   2.32    5.00    5.31   2.97    1.00   10.00    9.00 -0.31    -1.03
day_of_week*     10 41188    3.00   1.40    3.00    3.01   1.48    1.00    5.00    4.00  0.01    -1.27
duration         11 41188  258.29 259.28  180.00  210.61 139.36    0.00 4918.00 4918.00  3.26    20.24
campaign         12 41188    2.57   2.77    2.00    1.99   1.48    1.00   56.00   55.00  4.76    36.97
pdays            13 41188  962.48 186.91  999.00  999.00   0.00    0.00  999.00  999.00 -4.92    22.23
previous         14 41188    0.17   0.49    0.00    0.05   0.00    0.00    7.00    7.00  3.83    20.11
poutcome*        15 41188    1.93   0.36    2.00    2.00   0.00    1.00    3.00    2.00 -0.88     3.98
emp.var.rate     16 41188    0.08   1.57    1.10    0.27   0.44   -3.40    1.40    4.80 -0.72    -1.06
cons.price.idx   17 41188   93.58   0.58   93.75   93.58   0.56   92.20   94.77    2.57 -0.23    -0.83
cons.conf.idx    18 41188  -40.50   4.63  -41.80  -40.60   6.52  -50.80  -26.90   23.90  0.30    -0.36
euribor3m        19 41188    3.62   1.73    4.86    3.81   0.16    0.63    5.04    4.41 -0.71    -1.41
nr.employed      20 41188 5167.04  72.25 5191.00 5178.43  55.00 4963.60 5228.10  264.50 -1.04     0.00
y*               21 41188    1.11   0.32    1.00    1.02   0.00    1.00    2.00    1.00  2.45     4.00
                 se
age            0.05
job*           0.02
marital*       0.00
education*     0.01
default*       0.00
housing*       0.00
loan*          0.00
contact*       0.00
month*         0.01
day_of_week*   0.01
duration       1.28
campaign       0.01
pdays          0.92
previous       0.00
poutcome*      0.00
emp.var.rate   0.01
cons.price.idx 0.00
cons.conf.idx  0.02
euribor3m      0.01
nr.employed    0.36
y*             0.00

Check whether any column has missing values

> sapply(bank,anyNA)
           age            job        marital      education
         FALSE          FALSE          FALSE          FALSE
       default        housing           loan        contact
         FALSE          FALSE          FALSE          FALSE
         month    day_of_week       duration       campaign
         FALSE          FALSE          FALSE          FALSE
         pdays       previous       poutcome   emp.var.rate
         FALSE          FALSE          FALSE          FALSE
cons.price.idx  cons.conf.idx      euribor3m    nr.employed
         FALSE          FALSE          FALSE          FALSE
             y
         FALSE

Count the successful and unsuccessful outcomes

> table(bank$y)

   no   yes
36548  4640

Cross-tabulate the outcome against marital status

> table(bank$y,bank$marital)

      divorced married single unknown
  no      4136   22396   9948      68
  yes      476    2532   1620      12
> xtabs(~y+marital,data=bank)
     marital
y     divorced married single unknown
  no      4136   22396   9948      68
  yes      476    2532   1620      12
> tab=table(bank$y,bank$marital)
> tab

      divorced married single unknown
  no      4136   22396   9948      68
  yes      476    2532   1620      12

Marginal totals: counts per marital status (margin 2) and per outcome (margin 1)

> margin.table(tab,2)

divorced  married   single  unknown
    4612    24928    11568       80
> margin.table(tab,1)

   no   yes
36548  4640

Row proportions: the distribution of marital status within each outcome (each row sums to 1)

> prop.table(tab,1)

         divorced     married      single     unknown
  no  0.113166247 0.612783189 0.272189997 0.001860567
  yes 0.102586207 0.545689655 0.349137931 0.002586207

Column proportions: the share of each outcome within each marital status (each column sums to 1)

> prop.table(tab,2)

       divorced   married    single   unknown
  no  0.8967910 0.8984275 0.8599585 0.8500000
  yes 0.1032090 0.1015725 0.1400415 0.1500000

Flat contingency table

The rows are grouped by the first two selected columns (marital and education), like GROUP BY 1, 2, and the counts are tabulated for each value of col.vars (y)

> ftable(bank[,c(3,4,21)],row.vars = 1:2,col.vars = "y")
                             y   no  yes
marital  education
divorced basic.4y               406   83
         basic.6y               169   13
         basic.9y               534   31
         high.school           1086  107
         illiterate               1    1
         professional.course    596   61
         university.degree     1177  160
         unknown                167   20
married  basic.4y              2915  313
         basic.6y              1628  139
         basic.9y              3858  298
         high.school           4683  475
         illiterate              12    3
         professional.course   2799  357
         university.degree     5573  821
         unknown                928  126
single   basic.4y               422   31
         basic.6y               301   36
         basic.9y              1174  142
         high.school           2702  448
         illiterate               1    0
         professional.course   1247  177
         university.degree     3723  683
         unknown                378  103
unknown  basic.4y                 5    1
         basic.6y                 6    0
         basic.9y                 6    2
         high.school             13    1
         illiterate               0    0
         professional.course      6    0
         university.degree       25    6
         unknown                  7    2

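Chi-squared test of independence. The null hypothesis is that the outcome y and marital status are independent. Because the p-value is below 2.2e-16, we reject the null hypothesis and conclude that the outcome is associated with marital status.

> chisq.test(tab)

	Pearson's Chi-squared test

data:  tab
X-squared = 122.66, df = 3, p-value < 2.2e-16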
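
To make the test statistic less of a black box, the following sketch recomputes it by hand from the counts printed above. The expected counts assume independence between the two variables; the matrix is rebuilt here only so the example is self-contained.

```r
# Rebuild the y-by-marital table from the counts printed above.
tab <- matrix(c(4136, 22396, 9948, 68,
                 476,  2532, 1620, 12),
              nrow = 2, byrow = TRUE,
              dimnames = list(y = c("no", "yes"),
                              marital = c("divorced", "married", "single", "unknown")))

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic and its p-value with (rows - 1) * (cols - 1) degrees of freedom.
x2 <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

print(c(statistic = x2, df = df))
print(p_value)
```

Most of the statistic comes from the single column, where the observed number of "yes" responses (1620) is well above the roughly 1303 expected under independence.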

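Draw a histogram

> hist(bank$age)

> library(lattice)

Draw the distribution of a continuous variable as a kernel density curve, which is a smoothed version of the histogram

~age puts age on the horizontal axis, groups=y draws one curve for each value of y, data is bank, plot.points controls whether the individual points are drawn, and auto.key controls whether a legend is shown

> densityplot(~age,groups = y,data=bank,plot.points=FALSE,auto.key = TRUE)

Draw a box plot

> boxplot(age~y,data=bank)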

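Two-sample t-test (Welch); reject the null hypothesis when the p-value is below 0.05

The null hypothesis here is that the mean age is the same in the two groups (y = no and y = yes)

The p-value is 1.805e-06, so the null hypothesis of equal means is rejected

We conclude that the mean age differs between the two groups (39.91 versus 40.91), although the difference is only about one year

> t.test(age~y,data=bank,alternative="two.sided",var.equal=FALSE)

	Welch Two Sample t-test

data:  age by y
t = -4.7795, df = 5258.5, p-value = 1.805e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.4129336 -0.5909889
sample estimates:
 mean in group no mean in group yes
         39.91119          40.91315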

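Data visualization

Draw a pie chart

> tab=table(bank$marital)
> pie(tab)

Draw a bar chart

> tab=table(bank$marital)
> barplot(tab)

Draw a mosaic plot of marital status against the outcome

> tab=table(bank$marital,bank$y)
> plot(tab)

Draw a stacked bar chart

> tab=table(bank$marital,bank$y)
> lattice::barchart(tab,auto.key=TRUE)

Load dplyr and count the rows in each (marital, y) group, then draw a grouped bar chart with ggplot2. ggplot2 must be attached, because aes() and geom_bar() are called without the ggplot2:: prefix

> library(dplyr)
> library(ggplot2)
> data=group_by(bank,marital,y)
> data=tally(data)
> ggplot(data=data,mapping=aes(marital,n))+geom_bar(mapping=aes(fill=y),position="dodge",stat="identity")
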
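Data preprocessing
Bin the ages into groups, then plot

The labels mean young, middle-aged and senior. With right = FALSE the intervals are [0,35), [35,55) and [55,100)

> labels=c('青年','中年','老年')

> bank$age_group=cut(bank$age,breaks = c(0,35,55,100),right = FALSE,labels = labels)

> library(ggplot2)

> ggplot(data=bank,mapping = aes(age_group))+geom_bar(mapping = aes(fill=y),position="dodge",stat="count")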
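
Which group a boundary age falls into depends on the right argument of cut(). A small sketch with made-up ages and English labels (both are assumptions for illustration) shows the left-closed behaviour:

```r
# With right = FALSE the intervals are closed on the left: [0,35), [35,55), [55,100).
ages <- c(17, 34, 35, 54, 55, 98)
groups <- cut(ages, breaks = c(0, 35, 55, 100), right = FALSE,
              labels = c("young", "middle", "senior"))
print(data.frame(ages, groups))
```

An age of exactly 100 would fall outside the last interval and become NA, so the upper break should sit above the largest plausible value.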

Derived variables
Add a new variable to the data frame directly with the $ operator

> bank$log.cons.price.idx=log(bank$cons.price.idx)

Add variables to the data frame with transform()

> bank<-transform(bank,log.cons.price.idx=log(cons.price.idx),log.nr.employed=log(nr.employed))

Add a variable with mutate() from the dplyr package

> bank<-dplyr::mutate(bank,log.cons.price.idx=log(cons.price.idx))

Use transmute() from the dplyr package to keep only the newly created variables

> bank2<-dplyr::transmute(bank,log.cons.price.idx=log(cons.price.idx),log.nr.employed=log(nr.employed))

Centering

> v=1:10

> v1=v-mean(v)

> v2=scale(v,center=TRUE,scale = FALSE)

Scaling without centering (divide by the root mean square)

> v1=v/sqrt(sum(v^2)/(length(v)-1))

> v2=scale(v,center=FALSE,scale=TRUE)

Min-max normalization

> v3=(v-min(v))/(max(v)-min(v))

Standardization (z-scores)

> v1=(v-mean(v))/sd(v)

> v2=scale(v,center = TRUE,scale=TRUE)
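
Each manual formula above is paired with a scale() call that should give the same numbers. The sketch below checks the pairs; the variable names are illustrative, not from the original.

```r
v <- 1:10

centered   <- v - mean(v)
rms_scaled <- v / sqrt(sum(v^2) / (length(v) - 1))  # divisor used by scale(center = FALSE)
min_max    <- (v - min(v)) / (max(v) - min(v))
z_score    <- (v - mean(v)) / sd(v)

# scale() returns a one-column matrix with attributes; as.vector() drops both.
same_center <- all.equal(centered,   as.vector(scale(v, center = TRUE,  scale = FALSE)))
same_rms    <- all.equal(rms_scaled, as.vector(scale(v, center = FALSE, scale = TRUE)))
same_z      <- all.equal(z_score,    as.vector(scale(v, center = TRUE,  scale = TRUE)))

print(c(same_center, same_rms, same_z))
print(range(min_max))
```

Note that scale() has no built-in min-max option, which is why that transformation is written out by hand.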

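Box-Cox transformation

Use boxCox() from the car package

> install.packages("car")

> library(car)

> boxCox(age~.,data=bank)

Use the caret package to do the Box-Cox transformation (note the lowercase c() when building the method vector)

> install.packages("caret")

> library(caret)

> dat<-subset(bank,select="age")

> trans<-preProcess(dat,method=c("BoxCox"))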

Data preprocessing (part 2)

Outliers that violate common sense

Identifying outliers from the distribution of the data

> bank.dirty=read.csv("bank-dirty.csv")
> summary(bank.dirty)
     age                  job            marital                    education    
 Min.   : 17.00   admin.     :10422   divorced: 4612   university.degree  :12165  
 1st Qu.: 32.00   blue-collar: 9254   married :24928   high.school        : 9515  
 Median : 38.00   technician : 6743   single  :11568   basic.9y           : 6043  
 Mean   : 40.03   services   : 3969   NA's    :   80   professional.course: 5242  
 3rd Qu.: 47.00   management : 2924                    basic.4y           : 4175  
 Max.   :123.00   (Other)    : 7546                    (Other)            : 2310  
 NA's   :2        NA's       :  330                    NA's               : 1738  
 default      housing        loan            contact          month      
 no  :32588   no  :18622   no  :33950   cellular :26144   may    :13769  
 yes :    3   yes :21576   yes : 6248   telephone:15044   jul    : 7174  
 NA's: 8597   NA's:  990   NA's:  990                     aug    : 6178  
                                                          jun    : 5318  
                                                          nov    : 4101  
                                                          apr    : 2632  
                                                          (Other): 2016  
 day_of_week    duration         campaign          pdays          previous    
 fri:7827    Min.   :   0.0   Min.   : 1.000   Min.   :  0.0   Min.   :0.000  
 mon:8514    1st Qu.: 102.0   1st Qu.: 1.000   1st Qu.:999.0   1st Qu.:0.000  
 thu:8623    Median : 180.0   Median : 2.000   Median :999.0   Median :0.000  
 tue:8090    Mean   : 258.3   Mean   : 2.568   Mean   :962.5   Mean   :0.173  
 wed:8134    3rd Qu.: 319.0   3rd Qu.: 3.000   3rd Qu.:999.0   3rd Qu.:0.000  
             Max.   :4918.0   Max.   :56.000   Max.   :999.0   Max.   :7.000  
                                                                              
        poutcome      emp.var.rate      cons.price.idx  cons.conf.idx  
 failure    : 4252   Min.   :-3.40000   Min.   :92.20   Min.   :-50.8  
 nonexistent:35563   1st Qu.:-1.80000   1st Qu.:93.08   1st Qu.:-42.7  
 success    : 1373   Median : 1.10000   Median :93.75   Median :-41.8  
                     Mean   : 0.08189   Mean   :93.58   Mean   :-40.5  
                     3rd Qu.: 1.40000   3rd Qu.:93.99   3rd Qu.:-36.4  
                     Max.   : 1.40000   Max.   :94.77   Max.   :-26.9  
                                                                       
   euribor3m      nr.employed     y        
 Min.   :0.634   Min.   :4964   no :36548  
 1st Qu.:1.344   1st Qu.:5099   yes: 4640  
 Median :4.857   Median :5191              
 Mean   :3.621   Mean   :5167              
 3rd Qu.:4.961   3rd Qu.:5228              
 Max.   :5.045   Max.   :5228              
 
 

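Common sense tells us that although a 123-year-old person could exist, the probability is extremely low, and such a person is even less likely to be a bank customer

Find the largest and smallest values in the age column (order() places the NA values last)

> head(bank.dirty[order(bank.dirty$age,decreasing = TRUE),'age',drop=FALSE],n=5)
      age
39494 123
38453  98
38456  98
27827  95
38922  94

> tail(bank.dirty[order(bank.dirty$age,decreasing = TRUE),'age',drop=FALSE],n=5)
      age
37559  17
37580  17
38275  17
120    NA
156    NA
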

Handling outliers

Treat them as missing values
> bank.dirty$age[which(bank.dirty$age>98)]<-NA

Then either delete the rows or impute the values
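
A common way to flag outliers from the distribution alone is the 1.5 × IQR rule that box plots use. The sketch below applies it to a small made-up age vector (the data and variable names are assumptions for illustration; this is one possible rule, not necessarily the one intended above).

```r
# Hypothetical ages, with 123 included as an implausible value.
age <- c(17, 24, 31, 38, 40, 45, 47, 52, 59, 123, NA)

q <- quantile(age, c(0.25, 0.75), na.rm = TRUE)
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Flag values outside the fences; existing NAs are not flagged.
is_outlier <- !is.na(age) & (age < lower | age > upper)
flagged <- age[is_outlier]
print(flagged)

# Treat the flagged values as missing, mirroring the step above.
age[is_outlier] <- NA
```

On real data such as the bank ages the rule can also flag legitimate elderly customers, so the flagged values are worth inspecting before they are overwritten.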

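Recoding

job has 12 levels, which is unwieldy for later analysis; the levels other than unknown are recoded into 4 broader groups

month has 10 levels in this data set; convert them to quarters

education has 7 levels besides unknown; collapse them into primary, secondary and tertiary

Apply the recoding. Each new label replaces the existing level in the same position, and the existing levels are in alphabetical order

> levels(bank.dirty$job) <- c( "management","services","entrepreneur","entrepreneur",
                       "management","unemployed",  "entrepreneur","services",
                       "unemployed","services","unemployed","unknown" )
> levels(bank.dirty$month) <- c("Q2","Q3","Q4","Q3","Q2",
                        "Q1","Q2","Q4","Q4","Q3")
> levels(bank.dirty$education) <- c( "primary","primary","primary","secondary",
                             "primary","tertiary","tertiary","unknown")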
 
 

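Missing values

Many of the categorical variables have an unknown level, which gives us no information, so it is treated as missing

Some models cannot handle missing values, for example logistic regression

Methods for imputing missing values

1. Impute with the median (numeric variables) or the mode (categorical variables)

> library(imputeMissings)
> bank.clean<-impute(bank.dirty,object = compute(bank.dirty,method = "median/mode"))

2. Nearest-neighbour (kNN) imputation

> library(DMwR)
> bank.clean=knnImputation(bank.dirty,k=5)

3. Random forest imputation

> library(missForest)
> Imp = missForest(bank.dirty)
> bank.clean = Imp$ximp

R packages for missing-value imputation used above

1. imputeMissings

2. DMwR

3. missForest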
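
To show what median/mode imputation amounts to, here is a minimal base-R sketch. The helper impute_simple and the toy data frame are hypothetical and written for illustration; they are not how the imputeMissings package is implemented.

```r
# Hypothetical helper: fill NAs with the median (numeric) or the mode (other types).
impute_simple <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- median(x, na.rm = TRUE)
  } else {
    counts <- table(x)  # table() ignores NA by default
    x[is.na(x)] <- names(counts)[which.max(counts)]
  }
  x
}

toy <- data.frame(
  age = c(25, NA, 40, 31),
  marital = factor(c("married", "single", NA, "married"))
)
toy_clean <- as.data.frame(lapply(toy, impute_simple))
print(toy_clean)
```

This approach is quick but ignores relationships between variables, which is the gap that the kNN and random forest methods try to fill.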

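Building a customer response model with logistic regression

1. Generalized linear models

Generalized linear models are well suited to problems where the dependent variable is not continuous

1)  Y is a categorical variable

2)  Y is an ordinal variable

3)  Y takes discrete values

2. When Y is a binary 0-1 variable, the model is logistic regression

Logistic regression in R

Recode the response

bank$y=ifelse(bank$y=='yes',1,0)

Make Q1 the reference level of month

bank$month<-relevel(bank$month,ref="Q1")

Build the logistic regression model

> model<-glm(y~.,data=bank,family = 'binomial')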
> summary(model)
 
Call:
glm(formula = y ~ ., family = "binomial", data = bank)
 
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-5.9958  -0.3082  -0.1887  -0.1333   3.4283  
 
Coefficients: (1 not defined because of singularities)
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                  -1.957e+02  1.935e+01 -10.116  < 2e-16 ***
age                           1.851e-03  2.415e-03   0.767 0.443289    
jobblue-collar               -2.659e-01  7.942e-02  -3.348 0.000814 ***
jobentrepreneur              -2.029e-01  1.248e-01  -1.626 0.103924    
jobhousemaid                 -3.628e-02  1.475e-01  -0.246 0.805705    
jobmanagement                -8.054e-02  8.501e-02  -0.947 0.343423    
jobretired                    2.928e-01  1.067e-01   2.743 0.006092 ** 
jobself-employed             -1.680e-01  1.176e-01  -1.428 0.153332    
jobservices                  -1.497e-01  8.552e-02  -1.751 0.079969 .  
jobstudent                    2.674e-01  1.106e-01   2.416 0.015680 *  
jobtechnician                 3.462e-03  7.096e-02   0.049 0.961086    
jobunemployed                 8.514e-03  1.273e-01   0.067 0.946686    
jobunknown                   -8.046e-02  2.390e-01  -0.337 0.736420    
maritalmarried                1.567e-02  6.824e-02   0.230 0.818420    
maritalsingle                 6.620e-02  7.791e-02   0.850 0.395473    
maritalunknown                6.303e-02  4.113e-01   0.153 0.878211    
educationbasic.6y             9.647e-02  1.202e-01   0.803 0.422195    
educationbasic.9y            -2.154e-02  9.494e-02  -0.227 0.820557    
educationhigh.school          3.381e-02  9.188e-02   0.368 0.712895    
educationilliterate           1.132e+00  7.395e-01   1.531 0.125887    
educationprofessional.course  1.136e-01  1.013e-01   1.121 0.262175    
educationuniversity.degree    2.134e-01  9.188e-02   2.322 0.020211 *  
educationunknown              1.361e-01  1.196e-01   1.138 0.255314    
defaultunknown               -3.055e-01  6.712e-02  -4.552 5.32e-06 ***
defaultyes                   -7.150e+00  1.135e+02  -0.063 0.949784    
housingunknown               -7.385e-02  1.390e-01  -0.531 0.595260    
housingyes                   -3.740e-03  4.121e-02  -0.091 0.927695    
loanunknown                          NA         NA      NA       NA    
loanyes                      -6.362e-02  5.725e-02  -1.111 0.266454    
contacttelephone             -6.068e-01  7.124e-02  -8.518  < 2e-16 ***
monthQ2                      -2.192e+00  1.125e-01 -19.479  < 2e-16 ***
monthQ3                      -1.463e+00  1.148e-01 -12.747  < 2e-16 ***
monthQ4                      -1.995e+00  1.240e-01 -16.088  < 2e-16 ***
day_of_weekmon               -1.216e-01  6.588e-02  -1.846 0.064887 .  
day_of_weekthu                6.375e-02  6.382e-02   0.999 0.317842    
day_of_weektue                6.867e-02  6.545e-02   1.049 0.294118    
day_of_weekwed                1.436e-01  6.530e-02   2.199 0.027911 *  
duration                      4.667e-03  7.397e-05  63.092  < 2e-16 ***
campaign                     -4.543e-02  1.158e-02  -3.922 8.77e-05 ***
pdays                        -9.627e-04  2.162e-04  -4.452 8.50e-06 ***
previous                     -5.806e-02  5.879e-02  -0.988 0.323369    
poutcomenonexistent           4.507e-01  9.372e-02   4.809 1.51e-06 ***
poutcomesuccess               9.371e-01  2.106e-01   4.451 8.56e-06 ***
emp.var.rate                 -1.389e+00  7.693e-02 -18.057  < 2e-16 ***
cons.price.idx                1.815e+00  1.193e-01  15.218  < 2e-16 ***
cons.conf.idx                 3.353e-02  6.664e-03   5.033 4.84e-07 ***
euribor3m                     6.054e-02  1.126e-01   0.537 0.590987    
nr.employed                   4.937e-03  1.873e-03   2.635 0.008413 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
(Dispersion parameter for binomial family taken to be 1)
 
    Null deviance: 28999  on 41187  degrees of freedom
Residual deviance: 17199  on 41141  degrees of freedom
AIC: 17293
 
Number of Fisher Scoring iterations: 10
> exp(coef(model))
                 (Intercept)                          age               jobblue-collar 
                9.856544e-86                 1.001853e+00                 7.665077e-01 
             jobentrepreneur                 jobhousemaid                jobmanagement 
                8.163314e-01                 9.643733e-01                 9.226187e-01 
                  jobretired             jobself-employed                  jobservices 
                1.340142e+00                 8.453874e-01                 8.609387e-01 
                  jobstudent                jobtechnician                jobunemployed 
                1.306514e+00                 1.003468e+00                 1.008550e+00 
                  jobunknown               maritalmarried                maritalsingle 
                9.226922e-01                 1.015789e+00                 1.068445e+00 
              maritalunknown            educationbasic.6y            educationbasic.9y 
                1.065061e+00                 1.101276e+00                 9.786948e-01 
        educationhigh.school          educationilliterate educationprofessional.course 
                1.034388e+00                 3.101297e+00                 1.120248e+00 
  educationuniversity.degree             educationunknown               defaultunknown 
                1.237856e+00                 1.145744e+00                 7.367445e-01 
                  defaultyes               housingunknown                   housingyes 
                7.851906e-04                 9.288126e-01                 9.962671e-01 
                 loanunknown                      loanyes             contacttelephone 
                          NA                 9.383587e-01                 5.450980e-01 
                     monthQ2                      monthQ3                      monthQ4 
                1.116739e-01                 2.314802e-01                 1.360620e-01 
              day_of_weekmon               day_of_weekthu               day_of_weektue 
                8.854888e-01                 1.065828e+00                 1.071082e+00 
              day_of_weekwed                     duration                     campaign 
                1.154380e+00                 1.004678e+00                 9.555850e-01 
                       pdays                     previous          poutcomenonexistent 
                9.990378e-01                 9.435960e-01                 1.569466e+00 
             poutcomesuccess                 emp.var.rate               cons.price.idx 
                2.552531e+00                 2.493091e-01                 6.140533e+00 
               cons.conf.idx                    euribor3m                  nr.employed 
                1.034103e+00                 1.062408e+00                 1.004949e+00 

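In the output above the reference level of job is admin., the first factor level, which is why it has no row of its own. Reading from exp(coef(model)), the odds that a blue-collar customer buys the bank product are about 0.77 times those of an admin. customer, while the odds for a retired customer are about 1.34 times those of an admin. customer, with the other variables held fixed.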
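
The link between coefficients, odds and probabilities can be checked on a small model. The sketch below fits a logistic regression on the built-in mtcars data (a stand-in chosen only so the example runs anywhere; it has nothing to do with the bank data) and confirms that exp(coefficient) is the odds ratio for a one-unit increase.

```r
# mtcars is a built-in data set, used here only as a stand-in for the bank data.
fit <- glm(am ~ wt, data = mtcars, family = binomial)
b <- unname(coef(fit))

# Linear predictor -> probability through the logistic function.
eta <- b[1] + b[2] * mtcars$wt
p_manual <- 1 / (1 + exp(-eta))

# Odds at wt = 3 and wt = 4; their ratio equals exp(coefficient of wt).
odds <- function(wt) exp(b[1] + b[2] * wt)
odds_ratio <- odds(4) / odds(3)

print(exp(coef(fit)))
print(odds_ratio)
```

For a factor such as job, the same reading applies with the "one-unit increase" replaced by a switch from the reference level to the level named in the coefficient.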

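Backward stepwise selection
> model.step=step(model,direction = "backward")
Forward stepwise selection (when starting from the full model, as here, there are no terms left to add unless a scope argument is supplied)
> model.step = step(model, direction = "forward")
Stepwise selection in both directions
> model.step = step(model, direction = "both")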
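
The three directions differ mainly in where the search starts and which terms it may add or drop. The following sketch illustrates this with a linear model on the built-in mtcars data, both of which are assumptions made to keep the example fast and self-contained; step() is called the same way on a glm fit.

```r
# Built-in mtcars and a linear model are stand-ins for the bank logistic model.
full <- lm(mpg ~ ., data = mtcars)
null <- lm(mpg ~ 1, data = mtcars)

# Backward: start from the full model and drop terms while AIC improves.
back <- step(full, direction = "backward", trace = 0)

# Forward: start from the intercept-only model; scope lists the terms that may be added.
fwd <- step(null, direction = "forward", scope = formula(full), trace = 0)

# Both: at each step a term may be added or dropped.
both <- step(null, direction = "both", scope = formula(full), trace = 0)

print(formula(back))
print(formula(fwd))
print(formula(both))
```

The selected formulas need not agree across directions, because each search is greedy and follows a different path.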

> summary(model.step)

Call:

glm(formula = y ~ job + education + default + contact + month +

day_of_week + duration + campaign + pdays + poutcome + emp.var.rate +

cons.price.idx + cons.conf.idx + nr.employed, family = "binomial",

data = bank)

Deviance Residuals:

Min       1Q   Median       3Q      Max

-5.9884  -0.3088  -0.1887  -0.1332   3.4026

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept)                  -2.031e+02  1.426e+01 -14.246  < 2e-16 ***

jobblue-collar               -2.700e-01  7.917e-02  -3.411 0.000648 ***

jobentrepreneur              -2.043e-01  1.242e-01  -1.645 0.100003

jobhousemaid                 -2.832e-02  1.464e-01  -0.193 0.846590

jobmanagement                -8.368e-02  8.409e-02  -0.995 0.319670

jobretired                    3.234e-01  9.130e-02   3.542 0.000397 ***

jobself-employed             -1.670e-01  1.176e-01  -1.421 0.155435

jobservices                  -1.528e-01  8.545e-02  -1.789 0.073666 .

jobstudent                    2.682e-01  1.046e-01   2.565 0.010316 *

jobtechnician                 4.389e-03  7.093e-02   0.062 0.950665

jobunemployed                 8.975e-03  1.271e-01   0.071 0.943715

jobunknown                   -6.363e-02  2.378e-01  -0.268 0.789057

educationbasic.6y             8.993e-02  1.196e-01   0.752 0.452024

educationbasic.9y            -2.716e-02  9.416e-02  -0.288 0.772992

educationhigh.school          2.890e-02  9.053e-02   0.319 0.749573

educationilliterate           1.118e+00  7.398e-01   1.511 0.130744

educationprofessional.course  1.084e-01  1.004e-01   1.079 0.280686

educationuniversity.degree    2.103e-01  9.017e-02   2.332 0.019678 *

educationunknown              1.363e-01  1.195e-01   1.140 0.254110

defaultunknown               -3.017e-01  6.666e-02  -4.526 6.02e-06 ***

defaultyes                   -7.141e+00  1.135e+02  -0.063 0.949831

contacttelephone             -6.011e-01  7.069e-02  -8.504  < 2e-16 ***

monthQ2                      -2.210e+00  1.108e-01 -19.939  < 2e-16 ***

monthQ3                      -1.475e+00  1.146e-01 -12.869  < 2e-16 ***

monthQ4                      -1.982e+00  1.183e-01 -16.755  < 2e-16 ***

day_of_weekmon               -1.210e-01  6.584e-02  -1.837 0.066174 .

day_of_weekthu                6.208e-02  6.374e-02   0.974 0.330066

day_of_weektue                6.851e-02  6.538e-02   1.048 0.294651

day_of_weekwed                1.420e-01  6.525e-02   2.176 0.029592 *

duration                      4.667e-03  7.396e-05  63.099  < 2e-16 ***

campaign                     -4.587e-02  1.158e-02  -3.960 7.49e-05 ***

pdays                        -8.822e-04  2.024e-04  -4.358 1.31e-05 ***

poutcomenonexistent           5.219e-01  6.356e-02   8.211  < 2e-16 ***

poutcomesuccess               9.996e-01  2.028e-01   4.928 8.31e-07 ***

emp.var.rate                 -1.376e+00  6.885e-02 -19.980  < 2e-16 ***

cons.price.idx                1.845e+00  1.041e-01  17.725  < 2e-16 ***

cons.conf.idx                 3.622e-02  4.853e-03   7.464 8.42e-14 ***

nr.employed                   5.883e-03  9.765e-04   6.024 1.70e-09 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 28999  on 41187  degrees of freedom

Residual deviance: 17203  on 41150  degrees of freedom

AIC: 17279

Number of Fisher Scoring iterations: 10

 
 

模型预测

用predict函数,参数type=’response’

Newdata参数是要预测的数据集

> prob<-predict(model.step,type = 'response')
> head(prob)
          1           2           3           4           5           6 
0.015029328 0.006044212 0.011640349 0.010173952 0.016897254 0.007174804 

假设以0.5为临界值

> pre<-ifelse(prob>0.5,1,0)

> table(pre,bank$y)

pre     0     1

0 35596  2667

1   952  1973

 

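Prediction accuracy (the counts used in the two calculations below come from a slightly different run than the table above; with the table's counts the accuracy is (35596+1973)/41188 ≈ 0.9121)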

> (35592+1964)/(35592+2676+956+1964)

[1] 0.911819

 
 

Recall: the share of customers who actually responded that the model identified

> 1964/(1964+2676)
[1] 0.4232759
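As a cross-check, the same metrics can be derived directly from the four cells of the table(pre, bank$y) output. This is a minimal sketch that hard-codes those counts; the variable names (cm, tp, fp and so on) are illustrative.

```r
# Counts copied from the table(pre, bank$y) output:
# rows are predicted classes, columns are actual classes.
cm <- matrix(c(35596, 952, 2667, 1973), nrow = 2,
             dimnames = list(pre = c("0", "1"), actual = c("0", "1")))

tn <- cm["0", "0"]  # predicted 0, actually 0
fp <- cm["1", "0"]  # predicted 1, actually 0
fn <- cm["0", "1"]  # predicted 0, actually 1
tp <- cm["1", "1"]  # predicted 1, actually 1

accuracy  <- (tp + tn) / sum(cm)  # share of all customers classified correctly
recall    <- tp / (tp + fn)       # share of actual responders that were found
precision <- tp / (tp + fp)       # share of predicted responders that were right

round(c(accuracy = accuracy, recall = recall, precision = precision), 4)
```

Accuracy looks high mainly because non-responders dominate the data; recall and precision say more about how well the responders are captured.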

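Model evaluation

Note: confusionMatrix from the caret package expects the predicted classes as its first argument and the true classes as its second. The call below passes them the other way round, so the rows labelled Prediction hold the actual values and the columns labelled Reference hold the predictions. As a result, the reported Sensitivity (0.675) is really the model's precision, and Pos Pred Value (0.425) is its recall.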

> confusionMatrix(bank$y,pre,pos='1')
Confusion Matrix and Statistics
 
          Reference
Prediction     0     1
         0 35596   952
         1  2667  1973
                                          
               Accuracy : 0.9121          
                 95% CI : (0.9094, 0.9149)
    No Information Rate : 0.929           
    P-Value [Acc > NIR] : 1               
                                          
                  Kappa : 0.476           
 Mcnemar's Test P-Value : <2e-16          
                                          
            Sensitivity : 0.67453         
            Specificity : 0.93030         
         Pos Pred Value : 0.42522         
         Neg Pred Value : 0.97395         
             Prevalence : 0.07102         
         Detection Rate : 0.04790         
   Detection Prevalence : 0.11265         
      Balanced Accuracy : 0.80241         
                                          
       'Positive' Class : 1               
                                    

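Kappa statistic

It measures how far the classifier's results differ from what random classification would produce.

Interpreting the Kappa statistic:

Poor: below 0.20

Fair: 0.20 to 0.40

Moderate: 0.40 to 0.60

Good: 0.60 to 0.80

Very good: 0.80 to 1.00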
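The Kappa value can be reproduced by hand from the confusion matrix as (observed agreement - expected agreement) / (1 - expected agreement). The sketch below hard-codes the four counts from the confusion matrix shown earlier; the English band labels passed to cut() are an assumed rendering of the scale above.

```r
# Confusion matrix counts (one dimension = predicted class, the other = actual class).
cm <- matrix(c(35596, 952, 2667, 1973), nrow = 2)

n <- sum(cm)
p_observed <- sum(diag(cm)) / n                      # agreement actually observed
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
kappa_stat <- (p_observed - p_expected) / (1 - p_expected)

# Map the value onto the interpretation scale.
band <- cut(kappa_stat,
            breaks = c(-Inf, 0.2, 0.4, 0.6, 0.8, 1),
            labels = c("poor", "fair", "moderate", "good", "very good"))

round(kappa_stat, 3)
as.character(band)
```

Kappa is symmetric in rows and columns, so it does not matter which dimension holds the predictions.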

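ROC curve

library(ROCR)
# Pair the predicted probabilities with the true labels
pred<-prediction(prob,bank$y)
# True positive rate against false positive rate
perf<-performance(pred,measure = "tpr",x.measure = "fpr")
plot(perf)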
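To show what the ROC curve is made of, the following base R sketch builds the curve's points and the area under it (AUC) by hand. The scores and labels are made-up toy values, the helper names roc_points and auc_trapezoid are hypothetical, and the sketch assumes there are no tied scores.

```r
# Toy data: predicted scores and true labels (1 = positive), invented for illustration.
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2)
label <- c(1,   1,   0,   1,   0,    1,   0,   0)

# Lower the cutoff one observation at a time and record TPR and FPR.
roc_points <- function(score, label) {
  label <- label[order(score, decreasing = TRUE)]
  data.frame(
    fpr = c(0, cumsum(label == 0) / sum(label == 0)),
    tpr = c(0, cumsum(label == 1) / sum(label == 1))
  )
}

# Area under the curve by the trapezoid rule.
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

roc <- roc_points(score, label)
auc <- auc_trapezoid(roc)

# Cross-check with the rank-based (Mann-Whitney) formula for AUC.
n_pos <- sum(label == 1)
n_neg <- sum(label == 0)
auc_rank <- (sum(rank(score)[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

c(auc = auc, auc_rank = auc_rank)
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation, which makes it a useful single-number companion to the plotted curve.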


 
 
 
 
 
 
 
 
 
 
 
 
RandomForest
Load the data
 

> data=read.table("input.txt",header = TRUE)

> str(data)

'data.frame':  222 obs. of  23 variables:

$ Acti_Profile             : num  0 0 0 0 0 0 0 0 0 0 ...

$ Activity                 : num  1.25 0 0.938 6.562 0 ...

$ Diastolic_PTT            : num  256 240 253 0 241 ...

$ Diastolic                : num  73.2 78.6 74 0 78.4 ...

$ Heart_Rate_Curve         : num  81.2 69.7 77.6 95 83.6 ...

$ Heart_Rate_Variability_HF: num  131 250 135 144 141 ...

$ Heart_Rate_Variability_LF: num  311 218 203 301 244 ...

$ MAP                      : num  86 93.5 86.9 0 91.7 ...

$ Position                 : num  0 0 0 1 0 0 0 0 0 0 ...

$ PTT_Raw                  : num  308 288 308 0 295 ...

$ RR_Interval              : num  734 878 773 632 714 ...

$ Sleep_Wake               : num  1 1 1 1 1 0 1 1 0 0 ...

$ SpO2                     : num  0 0 99 0 98.4 ...

$ Sympatho_Vagal_Balance   : num  23 8.17 14.5 20.4 16.88 ...

$ Systolic_PTT             : num  308 288 307 0 295 ...

$ Systolic                 : num  113 124 113 0 119 ...

$ Autonomic_arousals       : num  0 0 0 0 0 0 0 0 0 0 ...

$ Cardio_complex           : num  0 0 0 1 0 0 0 0 0 0 ...

$ Cardio_rhythm            : num  0 0 2 0 0 0 0 0 0 0 ...

$ Classification_Arousal   : num  0 0 0 0 0 0 0 0 0 0 ...

$ PTT_Events               : num  1 0 2 0 0 0 0 0 0 0 ...

$ Systolic_Events          : num  1 0 1 0 0 0 0 0 0 0 ...

$ y                        : num  1 0 1 0 0 0 0 0 0 0 ...

Load the randomForest package

> library(randomForest)

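Train the model with y as the response and all other columns as predictors. Because y is stored as a numeric column, randomForest fits a regression forest here; convert it with as.factor(y) to get a classification forest.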

> rf <- randomForest(y ~ ., data=data, ntree=100, proximity=TRUE,importance=TRUE)

> plot(rf)

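Variable importance

With type=1, importance reports how much prediction accuracy drops when the values of one variable are randomly permuted; for a regression forest this is shown as %IncMSE, the increase in mean squared error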

> importance(rf,type=1)

%IncMSE

Acti_Profile               0.00000000

Activity                   0.99353251

Diastolic_PTT              0.32193611

Diastolic                  1.99891809

Heart_Rate_Curve           0.92001352

Heart_Rate_Variability_HF  2.07870722

Heart_Rate_Variability_LF -0.24957163

MAP                        0.48142975

Position                   1.86876751

PTT_Raw                    1.94648914

RR_Interval                0.60557964

Sleep_Wake                 1.00503782

SpO2                       0.25396165

Sympatho_Vagal_Balance     1.42906765

Systolic_PTT               1.27965813

Systolic                   0.77382673

Autonomic_arousals         0.00000000

Cardio_complex             1.00503782

Cardio_rhythm              1.14283152

Classification_Arousal    -0.04383997

PTT_Events                 4.63980680

Systolic_Events           33.29461169
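The permutation idea behind this importance measure can be sketched in base R. This is a simplified illustration and not what randomForest does internally: it uses a linear model on the built-in mtcars data and measures the error on the training rows, whereas randomForest permutes each variable in the out-of-bag samples of every tree. The helper names mse and permutation_importance are hypothetical.

```r
set.seed(42)  # make the random permutations reproducible

# Stand-in model: any fitted model with a predict() method would do.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Mean squared error of the model on a data frame.
mse <- function(model, df) mean((df$mpg - predict(model, newdata = df))^2)
baseline <- mse(fit, mtcars)

# Shuffle one column, re-score, and average the increase in MSE over several repeats.
permutation_importance <- function(var, n_repeats = 20) {
  increases <- replicate(n_repeats, {
    shuffled <- mtcars
    shuffled[[var]] <- sample(shuffled[[var]])  # break the link between var and mpg
    mse(fit, shuffled) - baseline
  })
  mean(increases)
}

imp <- sapply(c("wt", "hp"), permutation_importance)
imp
```

A variable the model relies on heavily produces a large increase in error when shuffled; a value near zero (or slightly negative, as for some variables above) suggests the model barely uses it.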

Print the random forest model

> print(rf)

Call:

randomForest(formula = y ~ ., data = data, ntree = 100, proximity = TRUE,      importance = TRUE)

Type of random forest: regression

Number of trees: 100

No. of variables tried at each split: 7

Mean of squared residuals: 0.003226897     (mean squared error, MSE = SSE / n)

% Var explained: 98.7

 

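Total sum of squares (SST): the sum of squared differences between the observed values and the sample mean

Regression sum of squares (SSR): the sum of squared differences between the predicted values and the sample mean

Residual sum of squares (SSE): the sum of squared differences between the observed values and the predicted values

SST = SSR + SSE (this identity holds exactly for least-squares linear regression with an intercept; it is not guaranteed for other models such as a random forest). The % Var explained figure printed above corresponds to 1 - SSE/SST, computed from out-of-bag predictions.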
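The decomposition can be checked numerically on a least-squares fit. The sketch below uses R's built-in cars data and lm as a stand-in, since the identity is exact only for linear regression with an intercept; the variable names are illustrative.

```r
# Simple linear regression on the built-in cars data.
fit <- lm(dist ~ speed, data = cars)

y     <- cars$dist
y_hat <- fitted(fit)

sst <- sum((y - mean(y))^2)      # total sum of squares
ssr <- sum((y_hat - mean(y))^2)  # regression (explained) sum of squares
sse <- sum((y - y_hat)^2)        # residual sum of squares

# The identity SST = SSR + SSE, up to floating-point error.
all.equal(sst, ssr + sse)

# R-squared is the explained share of the total variation.
r_squared <- 1 - sse / sst
all.equal(r_squared, summary(fit)$r.squared)
```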

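Node purity (type=2): for a regression forest such as this one, IncNodePurity is the total decrease in residual sum of squares from splits on the variable; the Gini index is used only for classification forests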

> importance(rf,type=2)

IncNodePurity

Acti_Profile                0.000000000

Activity                    0.445181480

Diastolic_PTT               0.452221870

Diastolic                   0.449372186

Heart_Rate_Curve            0.473113852

Heart_Rate_Variability_HF   0.226815300

Heart_Rate_Variability_LF   0.205457353

MAP                         0.536977574

Position                    0.307333210

PTT_Raw                     0.656726800

RR_Interval                 0.452738011

Sleep_Wake                  0.014423077

SpO2                        1.793361279

Sympatho_Vagal_Balance      0.352759689

Systolic_PTT                0.851951505

Systolic                    0.823955781

Autonomic_arousals          0.000000000

Cardio_complex              0.008047619

Cardio_rhythm               0.141907084

Classification_Arousal      0.085739429

PTT_Events                  7.468690820

Systolic_Events            39.000163018

 

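Make predictions (here on the training data itself)

prediction <- predict(rf, newdata = data, type = "response")

Tabulate the predictions against the observed values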

table(observed =data$y,predicted=prediction)

plot(prediction)

Support vector machine

library(e1071)

svmfit<-svm(y~.,data=data,kernel="linear",cost=10,scale=FALSE)

> print(svmfit)

Call:

svm(formula = y ~ ., data = data, kernel = "linear", cost = 10, scale = FALSE)

Parameters:

SVM-Type:  eps-regression

SVM-Kernel:  linear

cost:  10

gamma:  0.04545455

epsilon:  0.1

Number of Support Vectors:  20

> plot(svmfit,data)

 

Neural network

> concrete<-read_excel("Concrete_Data.xls")

> str(concrete)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    1030 obs. of  9 variables:

$ Cement      : num  540 540 332 332 199 ...

$ Slag        : num  0 0 142 142 132 ...

$ Ash         : num  0 0 0 0 0 0 0 0 0 0 ...

$ water       : num  162 162 228 228 192 228 228 228 228 228 ...

$ superplastic: num  2.5 2.5 0 0 0 0 0 0 0 0 ...

$ coarseagg   : num  1040 1055 932 932 978 ...

$ fineagg     : num  676 676 594 594 826 ...

$ age         : num  28 28 270 365 360 90 365 28 28 28 ...

$ strength    : num  80 61.9 40.3 41.1 44.3 ...

> normalize <- function(x){ return ((x-min(x))/(max(x)-min(x)))}

> concrete_norm <- as.data.frame(lapply(concrete,normalize))

> concrete_train <- concrete_norm[1:773,]

> concrete_test <- concrete_norm[774:1030,]

> library(neuralnet)

> concrete_model <- neuralnet(strength ~ Cement+Slag+Ash+water+superplastic+coarseagg+fineagg+age,data=concrete_train)

> plot(concrete_model)

model_results <- compute(concrete_model,concrete_test[1:8])

predicted_strength <- model_results$net.result

> cor(predicted_strength,concrete_test$strength)

[,1]

[1,] 0.7205120076

> concrete_model2 <- neuralnet(strength ~ Cement+Slag+Ash+water+superplastic+coarseagg+fineagg+age,data=concrete_train,hidden=5)

> plot(concrete_model2)

Evaluate the second model: correlation between predicted and actual strength

> model_results2 <- compute(concrete_model2,concrete_test[1:8])

> predicted_strength2 <- model_results2$net.result

> cor(predicted_strength2,concrete_test$strength)

[,1]

[1,] 0.6727155609
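The min-max normalization used above rescales every column to the range 0 to 1, so the network's predictions are also on that scale. A small sketch, with made-up strength values and a hypothetical denormalize helper, shows the transformation and how to map values back to the original units.

```r
# Min-max scaling, as defined in the transcript above.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Inverse transformation: needs the original column to recover min and max.
denormalize <- function(x_norm, original) {
  x_norm * (max(original) - min(original)) + min(original)
}

strength <- c(20, 35, 50, 80)  # hypothetical values in original units
strength_norm <- normalize(strength)
strength_back <- denormalize(strength_norm, strength)

strength_norm  # the smallest value maps to 0 and the largest to 1
strength_back  # round trip returns the original values
```

Note that a constant column would make max(x) - min(x) zero and yield NaN, so such columns should be dropped before normalizing.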

 

Principal component analysis

The four variables X1 to X4 are height, weight, chest circumference, and sitting height

> test<-data.frame(

+     X1=c(148, 139, 160, 149, 159, 142, 153, 150, 151, 139,

+          140, 161, 158, 140, 137, 152, 149, 145, 160, 156,

+          151, 147, 157, 147, 157, 151, 144, 141, 139, 148),

+     X2=c(41, 34, 49, 36, 45, 31, 43, 43, 42, 31,

+          29, 47, 49, 33, 31, 35, 47, 35, 47, 44,

+          42, 38, 39, 30, 48, 36, 36, 30, 32, 38),

+     X3=c(72, 71, 77, 67, 80, 66, 76, 77, 77, 68,

+          64, 78, 78, 67, 66, 73, 82, 70, 74, 78,

+          73, 73, 68, 65, 80, 74, 68, 67, 68, 70),

+     X4=c(78, 76, 86, 79, 86, 76, 83, 79, 80, 74,

+          74, 84, 83, 77, 73, 79, 79, 77, 87, 85,

+          82, 78, 80, 75, 88, 80, 76, 76, 73, 78)

+ )

> test.pr<-princomp(test,cor=TRUE)

> summary(test.pr,loadings=TRUE)

Importance of components:

Comp.1        Comp.2        Comp.3        Comp.4

Standard deviation     1.8817805390 0.55980635717 0.28179594325 0.25711843909

Proportion of Variance 0.8852744993 0.07834578938 0.01985223841 0.01652747293

Cumulative Proportion  0.8852744993 0.96362028866 0.98347252707 1.00000000000

Loadings:

Comp.1 Comp.2 Comp.3 Comp.4

X1  0.497  0.543 -0.450  0.506

X2  0.515 -0.210 -0.462 -0.691

X3  0.481 -0.725  0.175  0.461

X4  0.507  0.368  0.744 -0.232

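The first two principal components already account for 96% of the total variance, so the other two can be dropped to reduce the dimensionality.

Reading the coefficients from the first two columns of the loadings table, with X'1 to X'4 denoting the standardized variables:

Z1 = 0.497X'1 + 0.515X'2 + 0.481X'3 + 0.507X'4

Z2 = 0.543X'1 - 0.210X'2 - 0.725X'3 + 0.368X'4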
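The component expressions can be verified by computing the scores by hand: standardize the data, then multiply by the loadings matrix. The sketch below repeats the data frame from the transcript so that it runs on its own; the intermediate names (var_share, sd_n, Z, scores_manual) are illustrative. One detail to note is that princomp standardizes with the divisor n rather than n - 1.

```r
test <- data.frame(
  X1 = c(148, 139, 160, 149, 159, 142, 153, 150, 151, 139,
         140, 161, 158, 140, 137, 152, 149, 145, 160, 156,
         151, 147, 157, 147, 157, 151, 144, 141, 139, 148),
  X2 = c(41, 34, 49, 36, 45, 31, 43, 43, 42, 31,
         29, 47, 49, 33, 31, 35, 47, 35, 47, 44,
         42, 38, 39, 30, 48, 36, 36, 30, 32, 38),
  X3 = c(72, 71, 77, 67, 80, 66, 76, 77, 77, 68,
         64, 78, 78, 67, 66, 73, 82, 70, 74, 78,
         73, 73, 68, 65, 80, 74, 68, 67, 68, 70),
  X4 = c(78, 76, 86, 79, 86, 76, 83, 79, 80, 74,
         74, 84, 83, 77, 73, 79, 79, 77, 87, 85,
         82, 78, 80, 75, 88, 80, 76, 76, 73, 78)
)
test.pr <- princomp(test, cor = TRUE)

# Share of variance per component and its running total.
var_share <- test.pr$sdev^2 / sum(test.pr$sdev^2)
cumsum(var_share)

# Standardize each column with the divisor-n standard deviation used by princomp.
n    <- nrow(test)
X    <- as.matrix(test)
sd_n <- apply(X, 2, sd) * sqrt((n - 1) / n)
Z    <- scale(X, center = colMeans(X), scale = sd_n)

# Scores = standardized data times the loadings matrix.
scores_manual <- Z %*% unclass(loadings(test.pr))
all.equal(unname(scores_manual), unname(predict(test.pr)))
```

The sign of a whole component is arbitrary, so another implementation may report every loading in a column with the opposite sign; what matters is that the expressions use the same signs as the loadings actually printed.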

Draw the scree plot of the principal components and compute the component scores

> screeplot(test.pr,type="lines")

> p<-predict(test.pr)

> p

Comp.1         Comp.2         Comp.3          Comp.4

[1,] -0.06990949737 -0.23813701272 -0.35509247634 -0.266120139417

[2,] -1.59526339772 -0.71847399061  0.32813232022 -0.118056645885

[3,]  2.84793151061  0.38956678680 -0.09731731272 -0.279482487139

[4,] -0.75996988424  0.80604334819 -0.04945721875 -0.162949297761

[5,]  2.73966776853  0.01718087263  0.36012614873  0.358653043787

[6,] -2.10583167924  0.32284393414  0.18600422367 -0.036456083707

[7,]  1.42105591247 -0.06053164925  0.21093320662 -0.044223092351

[8,]  0.82583976981 -0.78102575640 -0.27557797533  0.057288571933

[9,]  0.93464401954 -0.58469241699 -0.08814135786  0.181037745585

[10,] -2.36463819933 -0.36532199291  0.08840476284  0.045520127461

[11,] -2.83741916086  0.34875841111  0.03310422938 -0.031146930047

[12,]  2.60851223537  0.21278727930 -0.33398036623  0.210157574387

[13,]  2.44253342081 -0.16769495893 -0.46918095412 -0.162987829937

[14,] -1.86630668724  0.05021383642  0.37720280364 -0.358821916178

[15,] -2.81347420580 -0.31790107093 -0.03291329149 -0.222035112399

[16,] -0.06392982655  0.20718447599  0.04334339948  0.703533623798

[17,]  1.55561022242 -1.70439673831 -0.33126406220  0.007551878960

[18,] -1.07392250663 -0.06763418320  0.02283648409  0.048606680158

[19,]  2.52174211878  0.97274300950  0.12164633439 -0.390667990681

[20,]  2.14072377494  0.02217881219  0.37410972458  0.129548959692

[21,]  0.79624421805  0.16307887263  0.12781269571 -0.294140762463

[22,] -0.28708320594 -0.35744666106 -0.03962115883  0.080991988802

[23,]  0.25151075072  1.25555187663 -0.55617324819  0.109068938725

[24,] -2.05706031616  0.78894493512 -0.26552109297  0.388088642937

[25,]  3.08596854773 -0.05775318018  0.62110421208 -0.218939612456

[26,]  0.16367554630  0.04317931667  0.24481850312  0.560248997030

[27,] -1.37265052598  0.02220972121 -0.23378320040 -0.257399715466

[28,] -2.16097778154  0.13733232981  0.35589738735  0.093123683044

[29,] -2.40434826507 -0.48613137190 -0.16154440788 -0.007914021222

[30,] -0.50287467640  0.14734316507 -0.20590831261 -0.122078819188

 
