93. R Language Tutorial in Detail
- Loading data
- > w<-read.table("test.prn",header = T)
- > w
- X.. X...1
- 1 A 2
- 2 B 3
- 3 C 5
- 4 D 5
- > library(readxl)
- > dat<-read_excel("test.xlsx")
- > dat
- # A tibble: 4 x 2
- `商品` `价格`
- <chr> <dbl>
- 1 A 2
- 2 B 3
- 3 C 5
- 4 D 5
- > bank=read.table("bank-full.csv",header = TRUE,sep=",")
- Inspecting the data structure
- > str(bank)
- 'data.frame': 41188 obs. of 21 variables:
- $ age : int 56 57 37 40 56 45 59 41 24 25 ...
- $ job : Factor w/ 12 levels "admin.","blue-collar",..: 4 8 8 1 8 8 1 2 10 8 ...
- $ marital : Factor w/ 4 levels "divorced","married",..: 2 2 2 2 2 2 2 2 3 3 ...
- $ education : Factor w/ 8 levels "basic.4y","basic.6y",..: 1 4 4 2 4 3 6 8 6 4 ...
- $ default : Factor w/ 3 levels "no","unknown",..: 1 2 1 1 1 2 1 2 1 1 ...
- $ housing : Factor w/ 3 levels "no","unknown",..: 1 1 3 1 1 1 1 1 3 3 ...
- $ loan : Factor w/ 3 levels "no","unknown",..: 1 1 1 1 3 1 1 1 1 1 ...
- $ contact : Factor w/ 2 levels "cellular","telephone": 2 2 2 2 2 2 2 2 2 2 ...
- $ month : Factor w/ 10 levels "apr","aug","dec",..: 7 7 7 7 7 7 7 7 7 7 ...
- $ day_of_week : Factor w/ 5 levels "fri","mon","thu",..: 2 2 2 2 2 2 2 2 2 2 ...
- $ duration : int 261 149 226 151 307 198 139 217 380 50 ...
- $ campaign : int 1 1 1 1 1 1 1 1 1 1 ...
- $ pdays : int 999 999 999 999 999 999 999 999 999 999 ...
- $ previous : int 0 0 0 0 0 0 0 0 0 0 ...
- $ poutcome : Factor w/ 3 levels "failure","nonexistent",..: 2 2 2 2 2 2 2 2 2 2 ...
- $ emp.var.rate : num 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
- $ cons.price.idx: num 94 94 94 94 94 ...
- $ cons.conf.idx : num -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 ...
- $ euribor3m : num 4.86 4.86 4.86 4.86 4.86 ...
- $ nr.employed : num 5191 5191 5191 5191 5191 ...
- $ y : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
- Viewing the minimum, maximum, median, mean, and quartiles
- > summary(bank)
- age job marital
- Min. :17.00 admin. :10422 divorced: 4612
- 1st Qu.:32.00 blue-collar: 9254 married :24928
- Median :38.00 technician : 6743 single :11568
- Mean :40.02 services : 3969 unknown : 80
- 3rd Qu.:47.00 management : 2924
- Max. :98.00 retired : 1720
- (Other) : 6156
- education default housing
- university.degree :12168 no :32588 no :18622
- high.school : 9515 unknown: 8597 unknown: 990
- basic.9y : 6045 yes : 3 yes :21576
- professional.course: 5243
- basic.4y : 4176
- basic.6y : 2292
- (Other) : 1749
- loan contact month day_of_week
- no :33950 cellular :26144 may :13769 fri:7827
- unknown: 990 telephone:15044 jul : 7174 mon:8514
- yes : 6248 aug : 6178 thu:8623
- jun : 5318 tue:8090
- nov : 4101 wed:8134
- apr : 2632
- (Other): 2016
- duration campaign pdays
- Min. : 0.0 Min. : 1.000 Min. : 0.0
- 1st Qu.: 102.0 1st Qu.: 1.000 1st Qu.:999.0
- Median : 180.0 Median : 2.000 Median :999.0
- Mean : 258.3 Mean : 2.568 Mean :962.5
- 3rd Qu.: 319.0 3rd Qu.: 3.000 3rd Qu.:999.0
- Max. :4918.0 Max. :56.000 Max. :999.0
- previous poutcome emp.var.rate
- Min. :0.000 failure : 4252 Min. :-3.40000
- 1st Qu.:0.000 nonexistent:35563 1st Qu.:-1.80000
- Median :0.000 success : 1373 Median : 1.10000
- Mean :0.173 Mean : 0.08189
- 3rd Qu.:0.000 3rd Qu.: 1.40000
- Max. :7.000 Max. : 1.40000
- cons.price.idx cons.conf.idx euribor3m
- Min. :92.20 Min. :-50.8 Min. :0.634
- 1st Qu.:93.08 1st Qu.:-42.7 1st Qu.:1.344
- Median :93.75 Median :-41.8 Median :4.857
- Mean :93.58 Mean :-40.5 Mean :3.621
- 3rd Qu.:93.99 3rd Qu.:-36.4 3rd Qu.:4.961
- Max. :94.77 Max. :-26.9 Max. :5.045
- nr.employed y
- Min. :4964 no :36548
- 1st Qu.:5099 yes: 4640
- Median :5191
- Mean :5167
- 3rd Qu.:5228
- Max. :5228
- > psych::describe(bank)
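- Column legend for the output below: vars = column index of the variable (not a variance), n = number of valid observations, mean = mean, sd = standard deviation, median = median, trimmed = trimmed mean (the mean after dropping the most extreme values at each end, 10% by default), mad = median absolute deviation, min = minimum, max = maximum, range = max minus min, skew = skewness, kurtosis = kurtosis, se = standard error of the mean. Variables marked with * are factors that describe() has converted to numeric codes, so their statistics are not meaningful.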
- vars n mean sd median trimmed mad min max range skew kurtosis
- age 1 41188 40.02 10.42 38.00 39.30 10.38 17.00 98.00 81.00 0.78 0.79
- job* 2 41188 4.72 3.59 3.00 4.48 2.97 1.00 12.00 11.00 0.45 -1.39
- marital* 3 41188 2.17 0.61 2.00 2.21 0.00 1.00 4.00 3.00 -0.06 -0.34
- education* 4 41188 4.75 2.14 4.00 4.88 2.97 1.00 8.00 7.00 -0.24 -1.21
- default* 5 41188 1.21 0.41 1.00 1.14 0.00 1.00 3.00 2.00 1.44 0.07
- housing* 6 41188 2.07 0.99 3.00 2.09 0.00 1.00 3.00 2.00 -0.14 -1.95
- loan* 7 41188 1.33 0.72 1.00 1.16 0.00 1.00 3.00 2.00 1.82 1.38
- contact* 8 41188 1.37 0.48 1.00 1.33 0.00 1.00 2.00 1.00 0.56 -1.69
- month* 9 41188 5.23 2.32 5.00 5.31 2.97 1.00 10.00 9.00 -0.31 -1.03
- day_of_week* 10 41188 3.00 1.40 3.00 3.01 1.48 1.00 5.00 4.00 0.01 -1.27
- duration 11 41188 258.29 259.28 180.00 210.61 139.36 0.00 4918.00 4918.00 3.26 20.24
- campaign 12 41188 2.57 2.77 2.00 1.99 1.48 1.00 56.00 55.00 4.76 36.97
- pdays 13 41188 962.48 186.91 999.00 999.00 0.00 0.00 999.00 999.00 -4.92 22.23
- previous 14 41188 0.17 0.49 0.00 0.05 0.00 0.00 7.00 7.00 3.83 20.11
- poutcome* 15 41188 1.93 0.36 2.00 2.00 0.00 1.00 3.00 2.00 -0.88 3.98
- emp.var.rate 16 41188 0.08 1.57 1.10 0.27 0.44 -3.40 1.40 4.80 -0.72 -1.06
- cons.price.idx 17 41188 93.58 0.58 93.75 93.58 0.56 92.20 94.77 2.57 -0.23 -0.83
- cons.conf.idx 18 41188 -40.50 4.63 -41.80 -40.60 6.52 -50.80 -26.90 23.90 0.30 -0.36
- euribor3m 19 41188 3.62 1.73 4.86 3.81 0.16 0.63 5.04 4.41 -0.71 -1.41
- nr.employed 20 41188 5167.04 72.25 5191.00 5178.43 55.00 4963.60 5228.10 264.50 -1.04 0.00
- y* 21 41188 1.11 0.32 1.00 1.02 0.00 1.00 2.00 1.00 2.45 4.00
- se
- age 0.05
- job* 0.02
- marital* 0.00
- education* 0.01
- default* 0.00
- housing* 0.00
- loan* 0.00
- contact* 0.00
- month* 0.01
- day_of_week* 0.01
- duration 1.28
- campaign 0.01
- pdays 0.92
- previous 0.00
- poutcome* 0.00
- emp.var.rate 0.01
- cons.price.idx 0.00
- cons.conf.idx 0.02
- euribor3m 0.01
- nr.employed 0.36
- y* 0.00
- Checking each column for missing values
- > sapply(bank,anyNA)
- age job marital education
- FALSE FALSE FALSE FALSE
- default housing loan contact
- FALSE FALSE FALSE FALSE
- month day_of_week duration campaign
- FALSE FALSE FALSE FALSE
- pdays previous poutcome emp.var.rate
- FALSE FALSE FALSE FALSE
- cons.price.idx cons.conf.idx euribor3m nr.employed
- FALSE FALSE FALSE FALSE
- y
- FALSE
- Counts of unsuccessful (no) and successful (yes) outcomes
- > table(bank$y)
- no yes
- 36548 4640
- Cross-tabulating the outcome y against marital status
- > table(bank$y,bank$marital)
- divorced married single unknown
- no 4136 22396 9948 68
- yes 476 2532 1620 12
- > xtabs(~y+marital,data=bank)
- marital
- y divorced married single unknown
- no 4136 22396 9948 68
- yes 476 2532 1620 12
- > tab=table(bank$y,bank$marital)
- > tab
- divorced married single unknown
- no 4136 22396 9948 68
- yes 476 2532 1620 12
- Marginal totals: by marital status (margin 2) and by outcome (margin 1)
- > margin.table(tab,2)
- divorced married single unknown
- 4612 24928 11568 80
- > margin.table(tab,1)
- no yes
- 36548 4640
- Row proportions: the distribution of marital status within each outcome
- > prop.table(tab,1)
- divorced married single unknown
- no 0.113166247 0.612783189 0.272189997 0.001860567
- yes 0.102586207 0.545689655 0.349137931 0.002586207
- Column proportions: the share of each outcome within each marital status
- > prop.table(tab,2)
- divorced married single unknown
- no 0.8967910 0.8984275 0.8599585 0.8500000
- yes 0.1032090 0.1015725 0.1400415 0.1500000
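- As a cross-check on the margins and proportions above, the following sketch rebuilds the y-by-marital table from the counts printed earlier. The counts are typed in by hand here rather than read from bank-full.csv, which is an assumption of this example. It shows that prop.table() is the same as dividing by the corresponding margin.

```r
# Counts copied from the table(bank$y, bank$marital) output above
tab <- matrix(c(4136, 22396, 9948, 68,
                 476,  2532, 1620, 12),
              nrow = 2, byrow = TRUE,
              dimnames = list(y = c("no", "yes"),
                              marital = c("divorced", "married", "single", "unknown")))
tab <- as.table(tab)

row_prop <- prop.table(tab, 1)  # each row sums to 1
col_prop <- prop.table(tab, 2)  # each column sums to 1

# The same proportions computed by hand from the margins
manual_row <- sweep(tab, 1, margin.table(tab, 1), "/")
manual_col <- sweep(tab, 2, margin.table(tab, 2), "/")

round(col_prop, 4)
```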
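- Flat contingency table
- The first and second columns (marital and education) define the rows, like a group by on columns 1 and 2
- Counts are tabulated for each value of the col.vars variable (here y)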
- > ftable(bank[,c(3,4,21)],row.vars = 1:2,col.vars = "y")
- y no yes
- marital education
- divorced basic.4y 406 83
- basic.6y 169 13
- basic.9y 534 31
- high.school 1086 107
- illiterate 1 1
- professional.course 596 61
- university.degree 1177 160
- unknown 167 20
- married basic.4y 2915 313
- basic.6y 1628 139
- basic.9y 3858 298
- high.school 4683 475
- illiterate 12 3
- professional.course 2799 357
- university.degree 5573 821
- unknown 928 126
- single basic.4y 422 31
- basic.6y 301 36
- basic.9y 1174 142
- high.school 2702 448
- illiterate 1 0
- professional.course 1247 177
- university.degree 3723 683
- unknown 378 103
- unknown basic.4y 5 1
- basic.6y 6 0
- basic.9y 6 2
- high.school 13 1
- illiterate 0 0
- professional.course 6 0
- university.degree 25 6
- unknown 7 2
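- Chi-squared test of independence. The p-value is below 2.2e-16, so we reject the null hypothesis that y and marital status are independent; the outcome is associated with marital status.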
- > chisq.test(tab)
- Pearson's Chi-squared test
- data: tab
- X-squared = 122.66, df = 3, p-value < 2.2e-16
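- To make the test statistic less of a black box, here is a minimal sketch that recomputes it by hand. The counts are copied from the table above (an assumption of this example is that they are typed in rather than loaded from the file). Expected counts come from the row and column totals under independence.

```r
# Observed counts: rows are y = no / yes, columns are the four marital levels
tab <- matrix(c(4136, 22396, 9948, 68,
                 476,  2532, 1620, 12), nrow = 2, byrow = TRUE)

# Expected counts if y and marital were independent
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic, degrees of freedom, and upper-tail p-value
x2 <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

c(x2 = x2, df = df)
```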
- Drawing a histogram
- > hist(bank$age)
- > library(lattice)
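- Plotting the distribution of a continuous variable as a kernel density curve, which is roughly a smoothed histogram
- age is on the horizontal axis, one curve is drawn per level of y (groups), and the data come from bank; auto.key controls whether a legend is drawn
- > densityplot(~age,groups = y,data=bank,plot.points=FALSE,auto.key = TRUE)
- Drawing a box plot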
- > boxplot(age~y,data=bank)
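- Two-sample t-test; the null hypothesis is rejected when the p-value is below 0.05
- The null hypothesis here is that the mean age is the same in the two groups
- The p-value is 1.805e-06, so the hypothesis of equal means is rejected
- We conclude that the mean age differs between customers with y = no and y = yes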
- > t.test(age~y,data=bank,alternative="two.sided",var.equal=FALSE)
- Welch Two Sample t-test
- data: age by y
- t = -4.7795, df = 5258.5, p-value = 1.805e-06
- alternative hypothesis: true difference in means is not equal to 0
- 95 percent confidence interval:
- -1.4129336 -0.5909889
- sample estimates:
- mean in group no mean in group yes
- 39.91119 40.91315
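- The Welch statistic reported above can be reproduced from group means and variances. The sketch below uses synthetic data as a stand-in for the age column (the vectors g_no and g_yes are made up for illustration, not taken from bank-full.csv) and compares the hand calculation with t.test().

```r
set.seed(1)
g_no  <- rnorm(200, mean = 40, sd = 10)  # synthetic stand-in for age where y == "no"
g_yes <- rnorm(50,  mean = 41, sd = 14)  # synthetic stand-in for age where y == "yes"

# Squared standard errors of the two group means
se2_no  <- var(g_no)  / length(g_no)
se2_yes <- var(g_yes) / length(g_yes)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean(g_no) - mean(g_yes)) / sqrt(se2_no + se2_yes)
df <- (se2_no + se2_yes)^2 /
  (se2_no^2 / (length(g_no) - 1) + se2_yes^2 / (length(g_yes) - 1))
p_value <- 2 * pt(-abs(t_stat), df)

fit <- t.test(g_no, g_yes, var.equal = FALSE)
```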
- Data visualization
- Pie chart
- > tab=table(bank$marital)
- > pie(tab)
- Bar chart
- > tab=table(bank$marital)
- > barplot(tab)
- Mosaic plot of marital status against y (calling plot() on a two-way table draws a mosaic plot)
- > tab=table(bank$marital,bank$y)
- > plot(tab)
- Stacked bar chart
- > tab=table(bank$marital,bank$y)
- > lattice::barchart(tab,auto.key=TRUE)
- Load dplyr to prepare the counts for plotting
- > library(dplyr)
- > data=group_by(bank,marital,y)
- > data=tally(data)
- > library(ggplot2)
- > ggplot(data=data,mapping=aes(marital,n))+geom_bar(mapping=aes(fill=y),position="dodge",stat="identity")
- Data preprocessing
- Binning age into groups, then plotting
- > labels=c('young','middle-aged','senior')
- > bank$age_group=cut(bank$age,breaks = c(0,35,55,100),right = FALSE,labels = labels)
- > library(ggplot2)
- > ggplot(data=bank,mapping = aes(age_group))+geom_bar(mapping = aes(fill=y),position="dodge",stat="count")
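- With right = FALSE the intervals are closed on the left, so a customer aged exactly 35 falls into the second group rather than the first. A small sketch on a hand-picked vector (the ages and the English labels are illustrative choices):

```r
ages <- c(17, 34, 35, 54, 55, 98)

# Intervals are [0,35), [35,55), [55,100) because right = FALSE
groups <- cut(ages, breaks = c(0, 35, 55, 100), right = FALSE,
              labels = c("young", "middle-aged", "senior"))

data.frame(ages, groups)
```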
- Derived variables
- Add a new variable to the data frame directly with the $ operator
- > bank$log.cons.price.idx=log(bank$cons.price.idx)
- Add variables to the data frame with the transform function
- > bank<-transform(bank,log.cons.price.idx=log(cons.price.idx),log.nr.employed=log(nr.employed))
- Add variables with the mutate function from the dplyr package
- > bank<-dplyr::mutate(bank,log.cons.price.idx=log(cons.price.idx))
- Use the transmute function from the dplyr package to keep only the newly created variables
- > bank2<-dplyr::transmute(bank,log.cons.price.idx=log(cons.price.idx),log.nr.employed=log(nr.employed))
- Centering
- > v=1:10
- > v1=v-mean(v)
- > v2=scale(v,center=TRUE,scale = FALSE)
- Scaling without centering (dividing by the root mean square to remove the units)
- > v1=v/sqrt(sum(v^2)/(length(v)-1))
- > v2=scale(v,center=FALSE,scale=TRUE)
- Min-max normalization
- > v3=(v-min(v))/(max(v)-min(v))
- Standardization (z-scores)
- > v1=(v-mean(v))/sd(v)
- > v2=scale(v,center = TRUE,scale=TRUE)
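- The manual formulas and the scale() calls above should agree pairwise. A short sketch that checks this on the same vector v (the variable names are chosen for this example):

```r
v <- 1:10

centered   <- as.numeric(scale(v, center = TRUE,  scale = FALSE))
rms_scaled <- as.numeric(scale(v, center = FALSE, scale = TRUE))
z_scores   <- as.numeric(scale(v, center = TRUE,  scale = TRUE))
min_max    <- (v - min(v)) / (max(v) - min(v))

# Each comparison returns TRUE when the manual formula matches scale()
all.equal(centered,   v - mean(v))
all.equal(rms_scaled, v / sqrt(sum(v^2) / (length(v) - 1)))
all.equal(z_scores,   (v - mean(v)) / sd(v))
```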
- Box-Cox transformation
- Using the boxCox function from the car package
- > install.packages("car")
- > library(car)
- > boxCox(age~.,data=bank)
- Using the caret package for the Box-Cox transformation
- > install.packages("caret")
- > library(caret)
- > dat<-subset(bank,select="age")
- > trans<-preProcess(dat,method=c("BoxCox"))
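- For reference, the transformation itself is simple to write down. The sketch below is a hand-written helper (box_cox is a hypothetical name, not a function from car or caret) that applies the Box-Cox formula for a given lambda; the packages above additionally estimate lambda from the data.

```r
# Box-Cox transform: (x^lambda - 1) / lambda, with log(x) as the lambda = 0 limit
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))  # the transform is defined for positive values only
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

ages <- c(17, 32, 38, 47, 98)  # illustrative values in the range of bank$age
box_cox(ages, 0)
box_cox(ages, 0.5)
```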
- Data preprocessing, part 2
- Outliers that contradict common sense
- Identifying outliers from the data distribution
- > bank.dirty=read.csv("bank-dirty.csv")
- > summary(bank.dirty)
- age job marital education
- Min. : 17.00 admin. :10422 divorced: 4612 university.degree :12165
- 1st Qu.: 32.00 blue-collar: 9254 married :24928 high.school : 9515
- Median : 38.00 technician : 6743 single :11568 basic.9y : 6043
- Mean : 40.03 services : 3969 NA's : 80 professional.course: 5242
- 3rd Qu.: 47.00 management : 2924 basic.4y : 4175
- Max. :123.00 (Other) : 7546 (Other) : 2310
- NA's :2 NA's : 330 NA's : 1738
- default housing loan contact month
- no :32588 no :18622 no :33950 cellular :26144 may :13769
- yes : 3 yes :21576 yes : 6248 telephone:15044 jul : 7174
- NA's: 8597 NA's: 990 NA's: 990 aug : 6178
- jun : 5318
- nov : 4101
- apr : 2632
- (Other): 2016
- day_of_week duration campaign pdays previous
- fri:7827 Min. : 0.0 Min. : 1.000 Min. : 0.0 Min. :0.000
- mon:8514 1st Qu.: 102.0 1st Qu.: 1.000 1st Qu.:999.0 1st Qu.:0.000
- thu:8623 Median : 180.0 Median : 2.000 Median :999.0 Median :0.000
- tue:8090 Mean : 258.3 Mean : 2.568 Mean :962.5 Mean :0.173
- wed:8134 3rd Qu.: 319.0 3rd Qu.: 3.000 3rd Qu.:999.0 3rd Qu.:0.000
- Max. :4918.0 Max. :56.000 Max. :999.0 Max. :7.000
- poutcome emp.var.rate cons.price.idx cons.conf.idx
- failure : 4252 Min. :-3.40000 Min. :92.20 Min. :-50.8
- nonexistent:35563 1st Qu.:-1.80000 1st Qu.:93.08 1st Qu.:-42.7
- success : 1373 Median : 1.10000 Median :93.75 Median :-41.8
- Mean : 0.08189 Mean :93.58 Mean :-40.5
- 3rd Qu.: 1.40000 3rd Qu.:93.99 3rd Qu.:-36.4
- Max. : 1.40000 Max. :94.77 Max. :-26.9
- euribor3m nr.employed y
- Min. :0.634 Min. :4964 no :36548
- 1st Qu.:1.344 1st Qu.:5099 yes: 4640
- Median :4.857 Median :5191
- Mean :3.621 Mean :5167
- 3rd Qu.:4.961 3rd Qu.:5228
- Max. :5.045 Max. :5228
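- Common sense says that although a 123-year-old person is not impossible, it is extremely unlikely, and such a person is unlikely to be a bank customer
- Finding the largest and smallest values in the age column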
- > head(bank.dirty[order(bank.dirty$age,decreasing = TRUE),'age',drop=FALSE],n=5)
- age
- 39494 123
- 38453 98
- 38456 98
- 27827 95
- 38922 94
- > tail(bank.dirty[order(bank.dirty$age,decreasing = TRUE),'age',drop=FALSE],n=5)
- age
- 37559 17
- 37580 17
- 38275 17
- 120 NA
- 156 NA
- Handling outliers
- Treat them as missing values
- > bank.dirty$age[which(bank.dirty$age>98)]<-NA
- Then delete or impute them
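- One common distribution-based rule, which the notes do not spell out and which is offered here only as an assumption, is the box plot rule: flag values more than 1.5 times the interquartile range beyond the quartiles. A sketch on a small made-up age vector:

```r
ages <- c(17, 25, 32, 38, 41, 47, 56, 98, 123, NA)  # illustrative values

q <- quantile(ages, c(0.25, 0.75), na.rm = TRUE)
fence <- 1.5 * (q[2] - q[1])

# TRUE for values outside [Q1 - fence, Q3 + fence]; NA entries are never flagged
is_outlier <- !is.na(ages) & (ages < q[1] - fence | ages > q[2] + fence)

ages[is_outlier]                             # candidates for review
ages_clean <- replace(ages, is_outlier, NA)  # treat flagged values as missing
```

- The rule only nominates candidates: on this toy vector it flags 98 as well as 123, so domain knowledge still decides which values are genuinely wrong.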
- Recoding
- job has 12 categories, which is awkward for later analysis; all categories except unknown are recoded into 4 classes
- month has 10 observed categories in this data set; they are converted to quarters
- education has 7 categories besides unknown
- Performing the recoding
- > levels(bank.dirty$job) <- c("management","services","entrepreneur","entrepreneur",
- + "management","unemployed","entrepreneur","services",
- + "unemployed","services","unemployed","unknown")
- > levels(bank.dirty$month) <- c("Q2","Q3","Q4","Q3","Q2",
- + "Q1","Q2","Q4","Q4","Q3")
- > levels(bank.dirty$education) <- c("primary","primary","primary","secondary",
- + "primary","tertiary","tertiary","unknown")
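- Missing values
- Several categorical variables have an unknown category, which carries no information and is therefore treated as missing
- Some models cannot handle missing values, for example logistic regression
- Methods for imputing missing values
- 1. Imputation with the median or the mode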
- > library(imputeMissings)
- > bank.clean<-impute(bank.dirty,object = compute(bank.dirty,method = "median/mode"))
- 2. k-nearest-neighbour (knn) imputation
- > library(DMwR)
- > bank.clean=knnImputation(bank.dirty,k=5)
- 3. Random forest imputation
- > library(missForest)
- > Imp = missForest(bank.dirty)
- > bank.clean = Imp$ximp
- R packages for imputing missing values
- 1. The imputeMissings package
- 2. The DMwR package
- 3. The missForest package
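- The median/mode idea can also be sketched in base R without any of the packages above. The helper below (impute_simple is a hypothetical name written for this example, not part of imputeMissings) fills numeric columns with the median and other columns with the most frequent value.

```r
impute_simple <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (!anyNA(x)) next
    if (is.numeric(x)) {
      fill <- median(x, na.rm = TRUE)
    } else {
      fill <- names(which.max(table(x)))  # most frequent value (first one on ties)
    }
    x[is.na(x)] <- fill
    df[[col]] <- x
  }
  df
}

# Toy data frame standing in for bank.dirty
toy <- data.frame(age = c(30, NA, 50, 40),
                  marital = factor(c("married", "single", NA, "married")))
toy_clean <- impute_simple(toy)
toy_clean
```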
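- Building a customer response model with logistic regression
- 1. Generalized linear models
- Generalized linear models are well suited to problems where the response variable is not continuous:
- 1) Y is a categorical variable
- 2) Y is an ordinal variable
- 3) Y takes discrete values
- 2. When Y is a binary 0/1 variable, the model is logistic regression
- Logistic regression in R
- Recoding the response
- > bank$y=ifelse(bank$y=='yes',1,0)
- Make Q1 the reference level of month
- > bank$month<-relevel(bank$month,ref="Q1")
- Fitting the logistic regression model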
- > model<-glm(y~.,data=bank,family = 'binomial')
- > summary(model)
- Call:
- glm(formula = y ~ ., family = "binomial", data = bank)
- Deviance Residuals:
- Min 1Q Median 3Q Max
- -5.9958 -0.3082 -0.1887 -0.1333 3.4283
- Coefficients: (1 not defined because of singularities)
- Estimate Std. Error z value Pr(>|z|)
- (Intercept) -1.957e+02 1.935e+01 -10.116 < 2e-16 ***
- age 1.851e-03 2.415e-03 0.767 0.443289
- jobblue-collar -2.659e-01 7.942e-02 -3.348 0.000814 ***
- jobentrepreneur -2.029e-01 1.248e-01 -1.626 0.103924
- jobhousemaid -3.628e-02 1.475e-01 -0.246 0.805705
- jobmanagement -8.054e-02 8.501e-02 -0.947 0.343423
- jobretired 2.928e-01 1.067e-01 2.743 0.006092 **
- jobself-employed -1.680e-01 1.176e-01 -1.428 0.153332
- jobservices -1.497e-01 8.552e-02 -1.751 0.079969 .
- jobstudent 2.674e-01 1.106e-01 2.416 0.015680 *
- jobtechnician 3.462e-03 7.096e-02 0.049 0.961086
- jobunemployed 8.514e-03 1.273e-01 0.067 0.946686
- jobunknown -8.046e-02 2.390e-01 -0.337 0.736420
- maritalmarried 1.567e-02 6.824e-02 0.230 0.818420
- maritalsingle 6.620e-02 7.791e-02 0.850 0.395473
- maritalunknown 6.303e-02 4.113e-01 0.153 0.878211
- educationbasic.6y 9.647e-02 1.202e-01 0.803 0.422195
- educationbasic.9y -2.154e-02 9.494e-02 -0.227 0.820557
- educationhigh.school 3.381e-02 9.188e-02 0.368 0.712895
- educationilliterate 1.132e+00 7.395e-01 1.531 0.125887
- educationprofessional.course 1.136e-01 1.013e-01 1.121 0.262175
- educationuniversity.degree 2.134e-01 9.188e-02 2.322 0.020211 *
- educationunknown 1.361e-01 1.196e-01 1.138 0.255314
- defaultunknown -3.055e-01 6.712e-02 -4.552 5.32e-06 ***
- defaultyes -7.150e+00 1.135e+02 -0.063 0.949784
- housingunknown -7.385e-02 1.390e-01 -0.531 0.595260
- housingyes -3.740e-03 4.121e-02 -0.091 0.927695
- loanunknown NA NA NA NA
- loanyes -6.362e-02 5.725e-02 -1.111 0.266454
- contacttelephone -6.068e-01 7.124e-02 -8.518 < 2e-16 ***
- monthQ2 -2.192e+00 1.125e-01 -19.479 < 2e-16 ***
- monthQ3 -1.463e+00 1.148e-01 -12.747 < 2e-16 ***
- monthQ4 -1.995e+00 1.240e-01 -16.088 < 2e-16 ***
- day_of_weekmon -1.216e-01 6.588e-02 -1.846 0.064887 .
- day_of_weekthu 6.375e-02 6.382e-02 0.999 0.317842
- day_of_weektue 6.867e-02 6.545e-02 1.049 0.294118
- day_of_weekwed 1.436e-01 6.530e-02 2.199 0.027911 *
- duration 4.667e-03 7.397e-05 63.092 < 2e-16 ***
- campaign -4.543e-02 1.158e-02 -3.922 8.77e-05 ***
- pdays -9.627e-04 2.162e-04 -4.452 8.50e-06 ***
- previous -5.806e-02 5.879e-02 -0.988 0.323369
- poutcomenonexistent 4.507e-01 9.372e-02 4.809 1.51e-06 ***
- poutcomesuccess 9.371e-01 2.106e-01 4.451 8.56e-06 ***
- emp.var.rate -1.389e+00 7.693e-02 -18.057 < 2e-16 ***
- cons.price.idx 1.815e+00 1.193e-01 15.218 < 2e-16 ***
- cons.conf.idx 3.353e-02 6.664e-03 5.033 4.84e-07 ***
- euribor3m 6.054e-02 1.126e-01 0.537 0.590987
- nr.employed 4.937e-03 1.873e-03 2.635 0.008413 **
- ---
- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- (Dispersion parameter for binomial family taken to be 1)
- Null deviance: 28999 on 41187 degrees of freedom
- Residual deviance: 17199 on 41141 degrees of freedom
- AIC: 17293
- Number of Fisher Scoring iterations: 10
- > exp(coef(model))
- (Intercept) age jobblue-collar
- 9.856544e-86 1.001853e+00 7.665077e-01
- jobentrepreneur jobhousemaid jobmanagement
- 8.163314e-01 9.643733e-01 9.226187e-01
- jobretired jobself-employed jobservices
- 1.340142e+00 8.453874e-01 8.609387e-01
- jobstudent jobtechnician jobunemployed
- 1.306514e+00 1.003468e+00 1.008550e+00
- jobunknown maritalmarried maritalsingle
- 9.226922e-01 1.015789e+00 1.068445e+00
- maritalunknown educationbasic.6y educationbasic.9y
- 1.065061e+00 1.101276e+00 9.786948e-01
- educationhigh.school educationilliterate educationprofessional.course
- 1.034388e+00 3.101297e+00 1.120248e+00
- educationuniversity.degree educationunknown defaultunknown
- 1.237856e+00 1.145744e+00 7.367445e-01
- defaultyes housingunknown housingyes
- 7.851906e-04 9.288126e-01 9.962671e-01
- loanunknown loanyes contacttelephone
- NA 9.383587e-01 5.450980e-01
- monthQ2 monthQ3 monthQ4
- 1.116739e-01 2.314802e-01 1.360620e-01
- day_of_weekmon day_of_weekthu day_of_weektue
- 8.854888e-01 1.065828e+00 1.071082e+00
- day_of_weekwed duration campaign
- 1.154380e+00 1.004678e+00 9.555850e-01
- pdays previous poutcomenonexistent
- 9.990378e-01 9.435960e-01 1.569466e+00
- poutcomesuccess emp.var.rate cons.price.idx
- 2.552531e+00 2.493091e-01 6.140533e+00
- cons.conf.idx euribor3m nr.employed
- 1.034103e+00 1.062408e+00 1.004949e+00
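- Each exponentiated coefficient is an odds ratio relative to the baseline level of its factor. In the output shown above job was not recoded, so its baseline is admin. (the first level, which does not appear in the coefficient list): for example, the odds of buying the bank's product for services workers are about 0.86 times those of admin. staff, and for retired customers about 1.34 times. With job recoded into four classes as in the preprocessing step, the baseline becomes management; for that version of the model the notes report odds about 0.88 times those of management staff for the services and entrepreneur classes and about 1.25 times for the unemployed class, but that output is not shown here.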
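- A minimal sketch of how a coefficient on the log-odds scale turns into an odds ratio, and how a linear predictor turns into a probability. The two coefficients are copied (rounded) from the summary(model) output above; the log_odds values are made up for illustration.

```r
# Coefficients copied from the summary(model) output above
beta <- c("jobblue-collar" = -0.2659, "jobretired" = 0.2928)

# Odds ratio relative to the baseline level of job
odds_ratio <- exp(beta)
round(odds_ratio, 3)

# Turning a linear predictor (log-odds) into a probability
log_odds <- c(-3, 0, 2)  # illustrative values
prob <- plogis(log_odds) # same as 1 / (1 + exp(-log_odds))
```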
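- Backward stepwise elimination
- > model.step=step(model,direction = "backward")
- Forward stepwise selection (starting from the full model there is nothing left to add unless a larger scope is supplied)
- > model.step = step(model, direction = "forward")
- Stepwise selection in both directions
- > model.step = step(model, direction = "both")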
- > summary(model.step)
- Call:
- glm(formula = y ~ job + education + default + contact + month +
- day_of_week + duration + campaign + pdays + poutcome + emp.var.rate +
- cons.price.idx + cons.conf.idx + nr.employed, family = "binomial",
- data = bank)
- Deviance Residuals:
- Min 1Q Median 3Q Max
- -5.9884 -0.3088 -0.1887 -0.1332 3.4026
- Coefficients:
- Estimate Std. Error z value Pr(>|z|)
- (Intercept) -2.031e+02 1.426e+01 -14.246 < 2e-16 ***
- jobblue-collar -2.700e-01 7.917e-02 -3.411 0.000648 ***
- jobentrepreneur -2.043e-01 1.242e-01 -1.645 0.100003
- jobhousemaid -2.832e-02 1.464e-01 -0.193 0.846590
- jobmanagement -8.368e-02 8.409e-02 -0.995 0.319670
- jobretired 3.234e-01 9.130e-02 3.542 0.000397 ***
- jobself-employed -1.670e-01 1.176e-01 -1.421 0.155435
- jobservices -1.528e-01 8.545e-02 -1.789 0.073666 .
- jobstudent 2.682e-01 1.046e-01 2.565 0.010316 *
- jobtechnician 4.389e-03 7.093e-02 0.062 0.950665
- jobunemployed 8.975e-03 1.271e-01 0.071 0.943715
- jobunknown -6.363e-02 2.378e-01 -0.268 0.789057
- educationbasic.6y 8.993e-02 1.196e-01 0.752 0.452024
- educationbasic.9y -2.716e-02 9.416e-02 -0.288 0.772992
- educationhigh.school 2.890e-02 9.053e-02 0.319 0.749573
- educationilliterate 1.118e+00 7.398e-01 1.511 0.130744
- educationprofessional.course 1.084e-01 1.004e-01 1.079 0.280686
- educationuniversity.degree 2.103e-01 9.017e-02 2.332 0.019678 *
- educationunknown 1.363e-01 1.195e-01 1.140 0.254110
- defaultunknown -3.017e-01 6.666e-02 -4.526 6.02e-06 ***
- defaultyes -7.141e+00 1.135e+02 -0.063 0.949831
- contacttelephone -6.011e-01 7.069e-02 -8.504 < 2e-16 ***
- monthQ2 -2.210e+00 1.108e-01 -19.939 < 2e-16 ***
- monthQ3 -1.475e+00 1.146e-01 -12.869 < 2e-16 ***
- monthQ4 -1.982e+00 1.183e-01 -16.755 < 2e-16 ***
- day_of_weekmon -1.210e-01 6.584e-02 -1.837 0.066174 .
- day_of_weekthu 6.208e-02 6.374e-02 0.974 0.330066
- day_of_weektue 6.851e-02 6.538e-02 1.048 0.294651
- day_of_weekwed 1.420e-01 6.525e-02 2.176 0.029592 *
- duration 4.667e-03 7.396e-05 63.099 < 2e-16 ***
- campaign -4.587e-02 1.158e-02 -3.960 7.49e-05 ***
- pdays -8.822e-04 2.024e-04 -4.358 1.31e-05 ***
- poutcomenonexistent 5.219e-01 6.356e-02 8.211 < 2e-16 ***
- poutcomesuccess 9.996e-01 2.028e-01 4.928 8.31e-07 ***
- emp.var.rate -1.376e+00 6.885e-02 -19.980 < 2e-16 ***
- cons.price.idx 1.845e+00 1.041e-01 17.725 < 2e-16 ***
- cons.conf.idx 3.622e-02 4.853e-03 7.464 8.42e-14 ***
- nr.employed 5.883e-03 9.765e-04 6.024 1.70e-09 ***
- ---
- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- (Dispersion parameter for binomial family taken to be 1)
- Null deviance: 28999 on 41187 degrees of freedom
- Residual deviance: 17203 on 41150 degrees of freedom
- AIC: 17279
- Number of Fisher Scoring iterations: 10
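- step() compares candidate models by AIC and keeps a change only when it lowers the criterion. The sketch below illustrates this on the built-in mtcars data with a linear model, chosen here only so the example runs without bank-full.csv; it is not the bank model from these notes.

```r
# Full model with every predictor, then backward elimination by AIC
full <- lm(mpg ~ ., data = mtcars)
selected <- step(full, direction = "backward", trace = 0)

formula(selected)

# extractAIC() returns the criterion that step() minimises (second element)
c(full = extractAIC(full)[2], selected = extractAIC(selected)[2])
```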
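- Model prediction
- Use the predict function with type='response' to obtain predicted probabilities
- The newdata argument is the data set to predict on; when it is omitted, as here, predictions are made for the training data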
- > prob<-predict(model.step,type = 'response')
- > head(prob)
- 1 2 3 4 5 6
- 0.015029328 0.006044212 0.011640349 0.010173952 0.016897254 0.007174804
- Taking 0.5 as the classification threshold
- > pre<-ifelse(prob>0.5,1,0)
- > table(pre,bank$y)
- pre 0 1
- 0 35596 2667
- 1 952 1973
- Prediction accuracy
- > (35596+1973)/(35596+2667+952+1973)
- [1] 0.9121346
- Share of the customers who actually responded that the model identified (recall)
- > 1973/(1973+2667)
- [1] 0.4252155
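- The usual metrics can be read off the confusion table printed by table(pre, bank$y) above. In this sketch the four counts are typed in from that table (an assumption of the example), with rows as predictions and columns as actual outcomes.

```r
# Rows: predicted class, columns: actual class (counts from table(pre, bank$y))
cm <- matrix(c(35596, 2667,
                 952, 1973), nrow = 2, byrow = TRUE,
             dimnames = list(pre = c("0", "1"), actual = c("0", "1")))

tp <- cm["1", "1"]; tn <- cm["0", "0"]
fp <- cm["1", "0"]; fn <- cm["0", "1"]

accuracy    <- (tp + tn) / sum(cm)  # share of all customers classified correctly
recall      <- tp / (tp + fn)       # share of actual responders identified
precision   <- tp / (tp + fp)       # share of predicted responders who responded
specificity <- tn / (tn + fp)       # share of non-responders classified correctly

round(c(accuracy = accuracy, recall = recall,
        precision = precision, specificity = specificity), 4)
```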
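- Model evaluation
- Note: caret's confusionMatrix() expects the predictions first and the reference (true) labels second. The call below passes them the other way round, so in its output "Prediction" is the true label and "Reference" is the model's prediction. The reported Pos Pred Value (0.42522) is therefore the share of actual responders that the model identified, and the reported Sensitivity (0.67453) is the share of predicted responders who actually responded.
- > confusionMatrix(bank$y,pre,positive='1')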
- Confusion Matrix and Statistics
- Reference
- Prediction 0 1
- 0 35596 952
- 1 2667 1973
- Accuracy : 0.9121
- 95% CI : (0.9094, 0.9149)
- No Information Rate : 0.929
- P-Value [Acc > NIR] : 1
- Kappa : 0.476
- Mcnemar's Test P-Value : <2e-16
- Sensitivity : 0.67453
- Specificity : 0.93030
- Pos Pred Value : 0.42522
- Neg Pred Value : 0.97395
- Prevalence : 0.07102
- Detection Rate : 0.04790
- Detection Prevalence : 0.11265
- Balanced Accuracy : 0.80241
- 'Positive' Class : 1
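- Kappa statistic
- Measures how much the classifier's agreement with the true labels exceeds the agreement expected by chance
- A common rule of thumb for interpreting Kappa:
- Poor: below 0.20
- Fair: 0.20 to 0.40
- Moderate: 0.40 to 0.60
- Good: 0.60 to 0.80
- Very good: 0.80 to 1.00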
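- Kappa compares the observed agreement with the agreement expected if predictions and labels were independent. A sketch that recomputes the value reported by confusionMatrix() from the four counts of the confusion table (typed in by hand for this example):

```r
cm <- matrix(c(35596, 2667,
                 952, 1973), nrow = 2, byrow = TRUE)

n <- sum(cm)
p_observed <- sum(diag(cm)) / n                      # observed agreement (accuracy)
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance
kappa <- (p_observed - p_expected) / (1 - p_expected)

round(kappa, 3)
```

- On the scale above this value falls in the 0.40 to 0.60 band.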
- ROC curve
- > library(ROCR)
- > pred<-prediction(prob,bank$y)
- > perf<-performance(pred,measure = "tpr",x.measure = "fpr")
- > plot(perf)
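- A ROC curve is often summarised by the area under it (AUC). The sketch below computes AUC in base R from ranks, using the equivalence with the Mann-Whitney statistic; auc_rank is a helper written for this example and is not part of ROCR, and the scores and labels are a toy data set.

```r
# AUC = probability that a random positive scores higher than a random negative
auc_rank <- function(score, label) {
  stopifnot(all(label %in% c(0, 1)))
  r <- rank(score)  # average ranks, so ties count as one half
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

score <- c(0.1, 0.4, 0.35, 0.8)  # toy predicted probabilities
label <- c(0, 0, 1, 1)           # toy true classes
auc_rank(score, label)
```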
- Random forest
- Loading the data
- > data=read.table("input.txt",header = TRUE)
- > str(data)
- 'data.frame': 222 obs. of 23 variables:
- $ Acti_Profile : num 0 0 0 0 0 0 0 0 0 0 ...
- $ Activity : num 1.25 0 0.938 6.562 0 ...
- $ Diastolic_PTT : num 256 240 253 0 241 ...
- $ Diastolic : num 73.2 78.6 74 0 78.4 ...
- $ Heart_Rate_Curve : num 81.2 69.7 77.6 95 83.6 ...
- $ Heart_Rate_Variability_HF: num 131 250 135 144 141 ...
- $ Heart_Rate_Variability_LF: num 311 218 203 301 244 ...
- $ MAP : num 86 93.5 86.9 0 91.7 ...
- $ Position : num 0 0 0 1 0 0 0 0 0 0 ...
- $ PTT_Raw : num 308 288 308 0 295 ...
- $ RR_Interval : num 734 878 773 632 714 ...
- $ Sleep_Wake : num 1 1 1 1 1 0 1 1 0 0 ...
- $ SpO2 : num 0 0 99 0 98.4 ...
- $ Sympatho_Vagal_Balance : num 23 8.17 14.5 20.4 16.88 ...
- $ Systolic_PTT : num 308 288 307 0 295 ...
- $ Systolic : num 113 124 113 0 119 ...
- $ Autonomic_arousals : num 0 0 0 0 0 0 0 0 0 0 ...
- $ Cardio_complex : num 0 0 0 1 0 0 0 0 0 0 ...
- $ Cardio_rhythm : num 0 0 2 0 0 0 0 0 0 0 ...
- $ Classification_Arousal : num 0 0 0 0 0 0 0 0 0 0 ...
- $ PTT_Events : num 1 0 2 0 0 0 0 0 0 0 ...
- $ Systolic_Events : num 1 0 1 0 0 0 0 0 0 0 ...
- $ y : num 1 0 1 0 0 0 0 0 0 0 ...
- Loading the randomForest package
- > library(randomForest)
- Training, with y as the response variable and all other columns as predictors
- > rf <- randomForest(y ~ ., data=data, ntree=100, proximity=TRUE,importance=TRUE)
- > plot(rf)
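- Variable importance
- type=1 measures how much the forest's prediction accuracy drops when the values of one variable are randomly permuted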
- > importance(rf,type=1)
- %IncMSE
- Acti_Profile 0.00000000
- Activity 0.99353251
- Diastolic_PTT 0.32193611
- Diastolic 1.99891809
- Heart_Rate_Curve 0.92001352
- Heart_Rate_Variability_HF 2.07870722
- Heart_Rate_Variability_LF -0.24957163
- MAP 0.48142975
- Position 1.86876751
- PTT_Raw 1.94648914
- RR_Interval 0.60557964
- Sleep_Wake 1.00503782
- SpO2 0.25396165
- Sympatho_Vagal_Balance 1.42906765
- Systolic_PTT 1.27965813
- Systolic 0.77382673
- Autonomic_arousals 0.00000000
- Cardio_complex 1.00503782
- Cardio_rhythm 1.14283152
- Classification_Arousal -0.04383997
- PTT_Events 4.63980680
- Systolic_Events 33.29461169
- Printing the random forest model
- > print(rf)
- Call:
- randomForest(formula = y ~ ., data = data, ntree = 100, proximity = TRUE, importance = TRUE)
- Type of random forest: regression
- Number of trees: 100
- No. of variables tried at each split: 7
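- Mean of squared residuals: 0.003226897 (annotation: this is the mean squared error of the out-of-bag predictions, not the residual sum of squares SSE)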
- % Var explained: 98.7
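- Total sum of squares (SST): sum of squared (observed value - sample mean)
- Regression sum of squares (SSR): sum of squared (predicted value - sample mean)
- Residual sum of squares (SSE): sum of squared (observed value - predicted value)
- SST = SSR + SSE (exact for least-squares linear regression with an intercept)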
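- The decomposition can be checked numerically on a least-squares fit. This sketch uses the built-in cars data and lm() rather than the random forest above, because the identity is exact for linear regression with an intercept and need not hold for other models.

```r
fit <- lm(dist ~ speed, data = cars)
y <- cars$dist
y_hat <- fitted(fit)

sst <- sum((y - mean(y))^2)      # total sum of squares
ssr <- sum((y_hat - mean(y))^2)  # regression (explained) sum of squares
sse <- sum((y - y_hat)^2)        # residual sum of squares

all.equal(sst, ssr + sse)
r_squared <- ssr / sst           # share of variance explained
```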
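- Node purity importance (type=2): for a regression forest such as this one it is the total decrease in residual sum of squares from splits on the variable (IncNodePurity); the Gini index is used only for classification forests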
- > importance(rf,type=2)
- IncNodePurity
- Acti_Profile 0.000000000
- Activity 0.445181480
- Diastolic_PTT 0.452221870
- Diastolic 0.449372186
- Heart_Rate_Curve 0.473113852
- Heart_Rate_Variability_HF 0.226815300
- Heart_Rate_Variability_LF 0.205457353
- MAP 0.536977574
- Position 0.307333210
- PTT_Raw 0.656726800
- RR_Interval 0.452738011
- Sleep_Wake 0.014423077
- SpO2 1.793361279
- Sympatho_Vagal_Balance 0.352759689
- Systolic_PTT 0.851951505
- Systolic 0.823955781
- Autonomic_arousals 0.000000000
- Cardio_complex 0.008047619
- Cardio_rhythm 0.141907084
- Classification_Arousal 0.085739429
- PTT_Events 7.468690820
- Systolic_Events 39.000163018
- Making predictions
- > prediction <- predict(rf, data[,],type="response")
- Displaying the predictions
- > table(observed =data$y,predicted=prediction)
- > plot(prediction)
- Support vector machine
- > library(e1071)
- > svmfit<-svm(y~.,data=data,kernel="linear",cost=10,scale=FALSE)
- > print(svmfit)
- Call:
- svm(formula = y ~ ., data = data, kernel = "linear", cost = 10, scale = FALSE)
- Parameters:
- SVM-Type: eps-regression
- SVM-Kernel: linear
- cost: 10
- gamma: 0.04545455
- epsilon: 0.1
- Number of Support Vectors: 20
- > plot(svmfit,data)
- Neural network
- > concrete<-read_excel("Concrete_Data.xls")
- > str(concrete)
- Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1030 obs. of 9 variables:
- $ Cement : num 540 540 332 332 199 ...
- $ Slag : num 0 0 142 142 132 ...
- $ Ash : num 0 0 0 0 0 0 0 0 0 0 ...
- $ water : num 162 162 228 228 192 228 228 228 228 228 ...
- $ superplastic: num 2.5 2.5 0 0 0 0 0 0 0 0 ...
- $ coarseagg : num 1040 1055 932 932 978 ...
- $ fineagg : num 676 676 594 594 826 ...
- $ age : num 28 28 270 365 360 90 365 28 28 28 ...
- $ strength : num 80 61.9 40.3 41.1 44.3 ...
- > normalize <- function(x){ return ((x-min(x))/(max(x)-min(x)))}
- > concrete_norm <- as.data.frame(lapply(concrete,normalize))
- > concrete_train <- concrete_norm[1:773,]
- > concrete_test <- concrete_norm[774:1030,]
- > library(neuralnet)
- > concrete_model <- neuralnet(strength ~ Cement+Slag+Ash+water+superplastic+coarseagg+fineagg+age,data=concrete_train)
- > plot(concrete_model)
- model_results <- compute(concrete_model,concrete_test[1:8])
- predicted_strength <- model_results$net.result
- > cor(predicted_strength,concrete_test$strength)
- [,1]
- [1,] 0.7205120076
- > concrete_model2 <- neuralnet(strength ~ Cement+Slag+Ash+water+superplastic+coarseagg+fineagg+age,data=concrete_train,hidden=5)
- > plot(concrete_model2)
- Evaluating the second model on the test set
- > model_results2 <- compute(concrete_model2,concrete_test[1:8])
- > predicted_strength2 <- model_results2$net.result
- > cor(predicted_strength2,concrete_test$strength)
- [,1]
- [1,] 0.6727155609
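- Because the network is trained on min-max normalized data, its predictions are on the 0 to 1 scale. The sketch below pairs the normalize function from the notes with an inverse (denormalize is a hypothetical helper added for this example) to map values back to the original units; the strength values are the first five shown in the str(concrete) output.

```r
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Inverse of normalize(), given the original (unscaled) vector
denormalize <- function(x_norm, original) {
  x_norm * (max(original) - min(original)) + min(original)
}

strength <- c(80, 61.9, 40.3, 41.1, 44.3)
strength_norm <- normalize(strength)
strength_back <- denormalize(strength_norm, strength)
```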
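- Principal component analysis
- The four variables are height (X1), weight (X2), chest circumference (X3), and sitting height (X4)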
- > test<-data.frame(
- + X1=c(148, 139, 160, 149, 159, 142, 153, 150, 151, 139,
- + 140, 161, 158, 140, 137, 152, 149, 145, 160, 156,
- + 151, 147, 157, 147, 157, 151, 144, 141, 139, 148),
- + X2=c(41, 34, 49, 36, 45, 31, 43, 43, 42, 31,
- + 29, 47, 49, 33, 31, 35, 47, 35, 47, 44,
- + 42, 38, 39, 30, 48, 36, 36, 30, 32, 38),
- + X3=c(72, 71, 77, 67, 80, 66, 76, 77, 77, 68,
- + 64, 78, 78, 67, 66, 73, 82, 70, 74, 78,
- + 73, 73, 68, 65, 80, 74, 68, 67, 68, 70),
- + X4=c(78, 76, 86, 79, 86, 76, 83, 79, 80, 74,
- + 74, 84, 83, 77, 73, 79, 79, 77, 87, 85,
- + 82, 78, 80, 75, 88, 80, 76, 76, 73, 78)
- + )
- > test.pr<-princomp(test,cor=TRUE)
- > summary(test.pr,loadings=TRUE)
- Importance of components:
- Comp.1 Comp.2 Comp.3 Comp.4
- Standard deviation 1.8817805390 0.55980635717 0.28179594325 0.25711843909
- Proportion of Variance 0.8852744993 0.07834578938 0.01985223841 0.01652747293
- Cumulative Proportion 0.8852744993 0.96362028866 0.98347252707 1.00000000000
- Loadings:
- Comp.1 Comp.2 Comp.3 Comp.4
- X1 0.497 0.543 -0.450 0.506
- X2 0.515 -0.210 -0.462 -0.691
- X3 0.481 -0.725 0.175 0.461
- X4 0.507 0.368 0.744 -0.232
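- The first two principal components have a cumulative contribution of about 96%, so the other two can be dropped to reduce the dimensionality
- Reading the coefficients off the loadings above, with X'1 to X'4 denoting the standardized variables: Z1 = 0.497X'1 + 0.515X'2 + 0.481X'3 + 0.507X'4
- Z2 = 0.543X'1 - 0.210X'2 - 0.725X'3 + 0.368X'4
- Drawing the scree plot of the principal components and computing the component scores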
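- The component expressions say that each score is a weighted sum of the standardized variables, with the loadings as weights. The sketch below checks this on the built-in USArrests data, used here only as a compact stand-in for the 30-row test data frame, and shows where the proportion of variance comes from.

```r
pr <- princomp(USArrests, cor = TRUE)

# With cor = TRUE the component variances are the eigenvalues of the correlation matrix
eig <- eigen(cor(USArrests))
prop_var <- pr$sdev^2 / sum(pr$sdev^2)

# Scores by hand: standardize with princomp's own centre and scale, then apply the loadings
z <- scale(USArrests, center = pr$center, scale = pr$scale)
scores_manual <- z %*% unclass(pr$loadings)

round(cumsum(prop_var), 3)
```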
- > screeplot(test.pr,type="lines")
- > p<-predict(test.pr)
- > p
- Comp.1 Comp.2 Comp.3 Comp.4
- [1,] -0.06990949737 -0.23813701272 -0.35509247634 -0.266120139417
- [2,] -1.59526339772 -0.71847399061 0.32813232022 -0.118056645885
- [3,] 2.84793151061 0.38956678680 -0.09731731272 -0.279482487139
- [4,] -0.75996988424 0.80604334819 -0.04945721875 -0.162949297761
- [5,] 2.73966776853 0.01718087263 0.36012614873 0.358653043787
- [6,] -2.10583167924 0.32284393414 0.18600422367 -0.036456083707
- [7,] 1.42105591247 -0.06053164925 0.21093320662 -0.044223092351
- [8,] 0.82583976981 -0.78102575640 -0.27557797533 0.057288571933
- [9,] 0.93464401954 -0.58469241699 -0.08814135786 0.181037745585
- [10,] -2.36463819933 -0.36532199291 0.08840476284 0.045520127461
- [11,] -2.83741916086 0.34875841111 0.03310422938 -0.031146930047
- [12,] 2.60851223537 0.21278727930 -0.33398036623 0.210157574387
- [13,] 2.44253342081 -0.16769495893 -0.46918095412 -0.162987829937
- [14,] -1.86630668724 0.05021383642 0.37720280364 -0.358821916178
- [15,] -2.81347420580 -0.31790107093 -0.03291329149 -0.222035112399
- [16,] -0.06392982655 0.20718447599 0.04334339948 0.703533623798
- [17,] 1.55561022242 -1.70439673831 -0.33126406220 0.007551878960
- [18,] -1.07392250663 -0.06763418320 0.02283648409 0.048606680158
- [19,] 2.52174211878 0.97274300950 0.12164633439 -0.390667990681
- [20,] 2.14072377494 0.02217881219 0.37410972458 0.129548959692
- [21,] 0.79624421805 0.16307887263 0.12781269571 -0.294140762463
- [22,] -0.28708320594 -0.35744666106 -0.03962115883 0.080991988802
- [23,] 0.25151075072 1.25555187663 -0.55617324819 0.109068938725
- [24,] -2.05706031616 0.78894493512 -0.26552109297 0.388088642937
- [25,] 3.08596854773 -0.05775318018 0.62110421208 -0.218939612456
- [26,] 0.16367554630 0.04317931667 0.24481850312 0.560248997030
- [27,] -1.37265052598 0.02220972121 -0.23378320040 -0.257399715466
- [28,] -2.16097778154 0.13733232981 0.35589738735 0.093123683044
- [29,] -2.40434826507 -0.48613137190 -0.16154440788 -0.007914021222
- [30,] -0.50287467640 0.14734316507 -0.20590831261 -0.122078819188
- >
加载数据
>
w<-read.table("test.prn",header = T)
> w
X.. X...1
1 A
2
2 B
3
3 C
5
4 D
5
> library(readxl)
>
dat<-read_excel("test.xlsx")
> dat
# A tibble: 4 x 2
`商品` `价格`
<chr>
<dbl>
1 A
2
2 B
3
3 C
5
4 D
5
>
bank=read.table("bank-full.csv",header = TRUE,sep=",")
Inspecting the data structure
> str(bank)
'data.frame': 41188 obs. of 21 variables:
 $ age           : int 56 57 37 40 56 45 59 41 24 25 ...
 $ job           : Factor w/ 12 levels "admin.","blue-collar",..: 4 8 8 1 8 8 1 2 10 8 ...
 $ marital       : Factor w/ 4 levels "divorced","married",..: 2 2 2 2 2 2 2 2 3 3 ...
 $ education     : Factor w/ 8 levels "basic.4y","basic.6y",..: 1 4 4 2 4 3 6 8 6 4 ...
 $ default       : Factor w/ 3 levels "no","unknown",..: 1 2 1 1 1 2 1 2 1 1 ...
 $ housing       : Factor w/ 3 levels "no","unknown",..: 1 1 3 1 1 1 1 1 3 3 ...
 $ loan          : Factor w/ 3 levels "no","unknown",..: 1 1 1 1 3 1 1 1 1 1 ...
 $ contact       : Factor w/ 2 levels "cellular","telephone": 2 2 2 2 2 2 2 2 2 2 ...
 $ month         : Factor w/ 10 levels "apr","aug","dec",..: 7 7 7 7 7 7 7 7 7 7 ...
 $ day_of_week   : Factor w/ 5 levels "fri","mon","thu",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ duration      : int 261 149 226 151 307 198 139 217 380 50 ...
 $ campaign      : int 1 1 1 1 1 1 1 1 1 1 ...
 $ pdays         : int 999 999 999 999 999 999 999 999 999 999 ...
 $ previous      : int 0 0 0 0 0 0 0 0 0 0 ...
 $ poutcome      : Factor w/ 3 levels "failure","nonexistent",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ emp.var.rate  : num 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
 $ cons.price.idx: num 94 94 94 94 94 ...
 $ cons.conf.idx : num -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 -36.4 ...
 $ euribor3m     : num 4.86 4.86 4.86 4.86 4.86 ...
 $ nr.employed   : num 5191 5191 5191 5191 5191 ...
 $ y             : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
Summary statistics: minimum, maximum, median, mean, and quartiles
> summary(bank)
      age                 job            marital
 Min.   :17.00   admin.     :10422   divorced: 4612
 1st Qu.:32.00   blue-collar: 9254   married :24928
 Median :38.00   technician : 6743   single  :11568
 Mean   :40.02   services   : 3969   unknown :   80
 3rd Qu.:47.00   management : 2924
 Max.   :98.00   retired    : 1720
                 (Other)    : 6156
               education        default         housing
 university.degree  :12168   no     :32588   no     :18622
 high.school        : 9515   unknown: 8597   unknown:  990
 basic.9y           : 6045   yes    :    3   yes    :21576
 professional.course: 5243
 basic.4y           : 4176
 basic.6y           : 2292
 (Other)            : 1749
      loan            contact          month       day_of_week
 no     :33950   cellular :26144   may    :13769   fri:7827
 unknown:  990   telephone:15044   jul    : 7174   mon:8514
 yes    : 6248                     aug    : 6178   thu:8623
                                   jun    : 5318   tue:8090
                                   nov    : 4101   wed:8134
                                   apr    : 2632
                                   (Other): 2016
    duration         campaign          pdays
 Min.   :   0.0   Min.   : 1.000   Min.   :  0.0
 1st Qu.: 102.0   1st Qu.: 1.000   1st Qu.:999.0
 Median : 180.0   Median : 2.000   Median :999.0
 Mean   : 258.3   Mean   : 2.568   Mean   :962.5
 3rd Qu.: 319.0   3rd Qu.: 3.000   3rd Qu.:999.0
 Max.   :4918.0   Max.   :56.000   Max.   :999.0
    previous            poutcome      emp.var.rate
 Min.   :0.000   failure    : 4252   Min.   :-3.40000
 1st Qu.:0.000   nonexistent:35563   1st Qu.:-1.80000
 Median :0.000   success    : 1373   Median : 1.10000
 Mean   :0.173                       Mean   : 0.08189
 3rd Qu.:0.000                       3rd Qu.: 1.40000
 Max.   :7.000                       Max.   : 1.40000
 cons.price.idx  cons.conf.idx     euribor3m
 Min.   :92.20   Min.   :-50.8   Min.   :0.634
 1st Qu.:93.08   1st Qu.:-42.7   1st Qu.:1.344
 Median :93.75   Median :-41.8   Median :4.857
 Mean   :93.58   Mean   :-40.5   Mean   :3.621
 3rd Qu.:93.99   3rd Qu.:-36.4   3rd Qu.:4.961
 Max.   :94.77   Max.   :-26.9   Max.   :5.045
  nr.employed     y
 Min.   :4964   no :36548
 1st Qu.:5099   yes: 4640
 Median :5191
 Mean   :5167
 3rd Qu.:5228
 Max.   :5228
Descriptive statistics with psych::describe. The columns are: vars (variable index), n (number of valid observations), mean, sd (standard deviation), median, trimmed (mean after dropping the most extreme values), mad (median absolute deviation), min, max, range, skew (skewness), kurtosis, and se (standard error of the mean). Variables marked with * are factors converted to integer codes, so their statistics describe the codes rather than the categories.
> psych::describe(bank)
               vars     n    mean     sd  median trimmed    mad     min     max   range  skew kurtosis
age               1 41188   40.02  10.42   38.00   39.30  10.38   17.00   98.00   81.00  0.78     0.79
job*              2 41188    4.72   3.59    3.00    4.48   2.97    1.00   12.00   11.00  0.45    -1.39
marital*          3 41188    2.17   0.61    2.00    2.21   0.00    1.00    4.00    3.00 -0.06    -0.34
education*        4 41188    4.75   2.14    4.00    4.88   2.97    1.00    8.00    7.00 -0.24    -1.21
default*          5 41188    1.21   0.41    1.00    1.14   0.00    1.00    3.00    2.00  1.44     0.07
housing*          6 41188    2.07   0.99    3.00    2.09   0.00    1.00    3.00    2.00 -0.14    -1.95
loan*             7 41188    1.33   0.72    1.00    1.16   0.00    1.00    3.00    2.00  1.82     1.38
contact*          8 41188    1.37   0.48    1.00    1.33   0.00    1.00    2.00    1.00  0.56    -1.69
month*            9 41188    5.23   2.32    5.00    5.31   2.97    1.00   10.00    9.00 -0.31    -1.03
day_of_week*     10 41188    3.00   1.40    3.00    3.01   1.48    1.00    5.00    4.00  0.01    -1.27
duration         11 41188  258.29 259.28  180.00  210.61 139.36    0.00 4918.00 4918.00  3.26    20.24
campaign         12 41188    2.57   2.77    2.00    1.99   1.48    1.00   56.00   55.00  4.76    36.97
pdays            13 41188  962.48 186.91  999.00  999.00   0.00    0.00  999.00  999.00 -4.92    22.23
previous         14 41188    0.17   0.49    0.00    0.05   0.00    0.00    7.00    7.00  3.83    20.11
poutcome*        15 41188    1.93   0.36    2.00    2.00   0.00    1.00    3.00    2.00 -0.88     3.98
emp.var.rate     16 41188    0.08   1.57    1.10    0.27   0.44   -3.40    1.40    4.80 -0.72    -1.06
cons.price.idx   17 41188   93.58   0.58   93.75   93.58   0.56   92.20   94.77    2.57 -0.23    -0.83
cons.conf.idx    18 41188  -40.50   4.63  -41.80  -40.60   6.52  -50.80  -26.90   23.90  0.30    -0.36
euribor3m        19 41188    3.62   1.73    4.86    3.81   0.16    0.63    5.04    4.41 -0.71    -1.41
nr.employed      20 41188 5167.04  72.25 5191.00 5178.43  55.00 4963.60 5228.10  264.50 -1.04     0.00
y*               21 41188    1.11   0.32    1.00    1.02   0.00    1.00    2.00    1.00  2.45     4.00
se
age 0.05
job* 0.02
marital* 0.00
education* 0.01
default* 0.00
housing* 0.00
loan* 0.00
contact* 0.00
month* 0.01
day_of_week* 0.01
duration 1.28
campaign 0.01
pdays 0.92
previous 0.00
poutcome* 0.00
emp.var.rate 0.01
cons.price.idx 0.00
cons.conf.idx 0.02
euribor3m 0.01
nr.employed 0.36
y* 0.00
Checking for missing values
> sapply(bank,anyNA)
           age            job        marital      education
         FALSE          FALSE          FALSE          FALSE
       default        housing           loan        contact
         FALSE          FALSE          FALSE          FALSE
         month    day_of_week       duration       campaign
         FALSE          FALSE          FALSE          FALSE
         pdays       previous       poutcome   emp.var.rate
         FALSE          FALSE          FALSE          FALSE
cons.price.idx  cons.conf.idx      euribor3m    nr.employed
         FALSE          FALSE          FALSE          FALSE
             y
         FALSE
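anyNA only reports whether a column contains any missing value. A minimal sketch of counting them per column follows; the data frame `toy` and its contents are made up for illustration and are not part of the bank data.

```r
# Hypothetical toy data frame with a few missing values
toy <- data.frame(
  age = c(25, NA, 41, 37),
  job = c("admin.", "services", NA, NA),
  y   = c("no", "yes", "no", "no")
)

# Does each column contain at least one NA?
has_na <- sapply(toy, anyNA)

# How many NAs does each column contain?
na_count <- colSums(is.na(toy))

# Rows that have no missing value in any column
complete_rows <- toy[complete.cases(toy), ]

print(has_na)
print(na_count)
print(nrow(complete_rows))
```

The same three calls can be applied to bank or bank.dirty unchanged.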
Counts of successful and unsuccessful outcomes
> table(bank$y)
   no   yes
36548  4640
Outcome counts broken down by marital status
> table(bank$y,bank$marital)
      divorced married single unknown
  no      4136   22396   9948      68
  yes      476    2532   1620      12
> xtabs(~y+marital,data=bank)
     marital
y     divorced married single unknown
  no      4136   22396   9948      68
  yes      476    2532   1620      12
> tab=table(bank$y,bank$marital)
> tab
      divorced married single unknown
  no      4136   22396   9948      68
  yes      476    2532   1620      12
Marginal totals: first by marital status (columns), then by outcome (rows)
> margin.table(tab,2)
divorced  married   single  unknown
    4612    24928    11568       80
> margin.table(tab,1)
   no   yes
36548  4640
Row proportions: the distribution of marital status within each outcome
> prop.table(tab,1)
         divorced     married      single     unknown
  no  0.113166247 0.612783189 0.272189997 0.001860567
  yes 0.102586207 0.545689655 0.349137931 0.002586207
Column proportions: the distribution of the outcome within each marital status
> prop.table(tab,2)
       divorced   married    single   unknown
  no  0.8967910 0.8984275 0.8599585 0.8500000
  yes 0.1032090 0.1015725 0.1400415 0.1500000
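The proportions above are just cell counts divided by a marginal total. The sketch below rebuilds the table from the printed counts (so it runs without bank-full.csv) and recomputes one cell by hand; the variable names are chosen for illustration.

```r
# Counts copied from the table(bank$y, bank$marital) output
tab <- as.table(matrix(
  c(4136, 22396, 9948, 68,
     476,  2532, 1620, 12),
  nrow = 2, byrow = TRUE,
  dimnames = list(y = c("no", "yes"),
                  marital = c("divorced", "married", "single", "unknown"))
))

row_totals <- margin.table(tab, 1)  # totals per outcome
col_totals <- margin.table(tab, 2)  # totals per marital status

# Share of each outcome within a marital status (columns sum to 1)
col_prop <- prop.table(tab, 2)

# The same value for the "single" column, computed by hand
single_yes <- tab["yes", "single"] / col_totals[["single"]]

print(round(col_prop, 4))
print(single_yes)
```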
Flat contingency table
The first two selected columns (marital and education) form the row groups, like GROUP BY marital, education
The counts are split across columns by the values of col.vars (here y)
> ftable(bank[,c(3,4,21)],row.vars = 1:2,col.vars = "y")
                             y   no  yes
marital  education
divorced basic.4y               406   83
         basic.6y               169   13
         basic.9y               534   31
         high.school           1086  107
         illiterate               1    1
         professional.course    596   61
         university.degree     1177  160
         unknown                167   20
married  basic.4y              2915  313
         basic.6y              1628  139
         basic.9y              3858  298
         high.school           4683  475
         illiterate              12    3
         professional.course   2799  357
         university.degree     5573  821
         unknown                928  126
single   basic.4y               422   31
         basic.6y               301   36
         basic.9y              1174  142
         high.school           2702  448
         illiterate               1    0
         professional.course   1247  177
         university.degree     3723  683
         unknown                378  103
unknown  basic.4y                 5    1
         basic.6y                 6    0
         basic.9y                 6    2
         high.school             13    1
         illiterate               0    0
         professional.course      6    0
         university.degree       25    6
         unknown                  7    2
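Chi-squared test of independence. The null hypothesis is that the outcome y and marital status are independent. Because the p-value is below 2.2e-16, the null hypothesis is rejected: the outcome is associated with marital status.
> chisq.test(tab)
Pearson's Chi-squared test
data: tab
X-squared = 122.66, df = 3, p-value < 2.2e-16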
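The statistic reported by chisq.test can be reproduced by hand. The sketch below rebuilds the 2 x 4 table from the printed counts and compares observed counts with the counts expected under independence; the variable names are illustrative.

```r
# Counts copied from the table(bank$y, bank$marital) output
tab <- matrix(c(4136, 22396, 9948, 68,
                 476,  2532, 1620, 12),
              nrow = 2, byrow = TRUE,
              dimnames = list(y = c("no", "yes"),
                              marital = c("divorced", "married", "single", "unknown")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic, degrees of freedom, and upper-tail p-value
x2 <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p  <- pchisq(x2, df = df, lower.tail = FALSE)

print(c(x2 = x2, df = df))
print(p)
```

The largest contributions come from the single column, where the observed number of yes outcomes (1620) is well above the expected count.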
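Histogram
> hist(bank$age)
> library(lattice)
Density plot of a continuous variable: a smoothed version of the histogram (a kernel density estimate)
Age is on the horizontal axis, one curve is drawn per value of y (groups = y), the data come from bank, and auto.key controls whether a legend is shown
> densityplot(~age,groups = y,data=bank,plot.points=FALSE,auto.key = TRUE)
Box plot
> boxplot(age~y,data=bank)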
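Two-sample t-test (Welch); the null hypothesis is rejected when the p-value is below 0.05
The null hypothesis here is that the mean age is the same in the two groups (y = no and y = yes)
The resulting p-value is 1.805e-06, so the hypothesis of equal means is rejected
The mean ages of the two groups differ significantly (39.91 versus 40.91), although the difference is only about one year
> t.test(age~y,data=bank,alternative="two.sided",var.equal=FALSE)
Welch Two Sample t-test
data: age by y
t = -4.7795, df = 5258.5, p-value = 1.805e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.4129336 -0.5909889
sample estimates:
 mean in group no mean in group yes
         39.91119          40.91315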
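As a rough sketch of what the Welch test computes, the statistic, degrees of freedom, and p-value can be derived by hand and compared with t.test. The two samples below are simulated stand-ins, not the bank ages, and all names are illustrative.

```r
set.seed(1)
# Simulated stand-ins for the ages in the two outcome groups
age_no  <- rnorm(200, mean = 40, sd = 10)
age_yes <- rnorm(80,  mean = 41, sd = 14)

# Squared standard error contributed by each group (variances are not pooled)
v_no  <- var(age_no)  / length(age_no)
v_yes <- var(age_yes) / length(age_yes)

# Welch statistic: difference in means over the combined standard error
t_stat <- (mean(age_no) - mean(age_yes)) / sqrt(v_no + v_yes)

# Welch-Satterthwaite approximation of the degrees of freedom
df <- (v_no + v_yes)^2 /
  (v_no^2 / (length(age_no) - 1) + v_yes^2 / (length(age_yes) - 1))

# Two-sided p-value
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

res <- t.test(age_no, age_yes, var.equal = FALSE)
print(c(t = t_stat, df = df, p = p_value))
print(res)
```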
Data visualization
Pie chart
> tab=table(bank$marital)
> pie(tab)
Bar chart
> tab=table(bank$marital)
> barplot(tab)
Mosaic plot of marital status against the outcome
> tab=table(bank$marital,bank$y)
> plot(tab)
Stacked bar chart
> tab=table(bank$marital,bank$y)
> lattice::barchart(tab,auto.key=TRUE)
Load dplyr to count the rows in each marital/outcome group, and load ggplot2 so that aes() and geom_bar() are available
> library(dplyr)
> library(ggplot2)
> data=group_by(bank,marital,y)
> data=tally(data)
> ggplot2::ggplot(data=data,mapping=aes(marital,n))+geom_bar(mapping=aes(fill=y),position="dodge",stat="identity")
Data preprocessing
Bin the ages into groups, then plot
> labels=c('young','middle-aged','senior')
> bank$age_group=cut(bank$age,breaks = c(0,35,55,100),right = FALSE,labels = labels)
> library(ggplot2)
> ggplot(data=bank,mapping = aes(age_group))+geom_bar(mapping = aes(fill=y),position="dodge",stat="count")
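With right = FALSE each interval is closed on the left and open on the right, which decides where the boundary ages 35 and 55 fall. A small sketch with made-up ages and English labels chosen for illustration:

```r
ages <- c(17, 34, 35, 54, 55, 98)

# Intervals are [0,35), [35,55), [55,100)
groups <- cut(ages, breaks = c(0, 35, 55, 100), right = FALSE,
              labels = c("young", "middle-aged", "senior"))

print(data.frame(ages, groups))
print(table(groups))

# A value equal to the last break falls outside [55,100) and becomes NA
edge <- cut(100, breaks = c(0, 35, 55, 100), right = FALSE)
print(is.na(edge))
```

Passing include.lowest = TRUE to cut would close the last interval and keep an age of exactly 100.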
Derived variables
Add a new variable to the data frame directly with the $ operator
> bank$log.cons.price.idx=log(bank$cons.price.idx)
Add variables to the data frame with the transform function
> bank<-transform(bank,log.cons.price.idx=log(cons.price.idx),log.nr.employed=log(nr.employed))
Add a variable with the mutate function from the dplyr package
> bank<-dplyr::mutate(bank,log.cons.price.idx=log(cons.price.idx))
Keep only the newly created variables with the transmute function from the dplyr package
> bank2<-dplyr::transmute(bank,log.cons.price.idx=log(cons.price.idx),log.nr.employed=log(nr.employed))
Centering
> v=1:10
> v1=v-mean(v)
> v2=scale(v,center=TRUE,scale = FALSE)
Scaling without centering (dividing by the root mean square)
> v1=v/sqrt(sum(v^2)/(length(v)-1))
> v2=scale(v,center=FALSE,scale=TRUE)
Min-max normalization
> v3=(v-min(v))/(max(v)-min(v))
Standardization (z-scores)
> v1=(v-mean(v))/sd(v)
> v2=scale(v,center = TRUE,scale=TRUE)
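The hand-written formulas and the scale() calls above should agree. A short check, assuming only base R; the variable names are illustrative, and as.numeric() is used because scale() returns a one-column matrix with attributes.

```r
v <- 1:10

centered   <- v - mean(v)
rms_scaled <- v / sqrt(sum(v^2) / (length(v) - 1))
minmax     <- (v - min(v)) / (max(v) - min(v))
z          <- (v - mean(v)) / sd(v)

# Compare each manual result with the corresponding scale() call
same_center <- all.equal(centered,   as.numeric(scale(v, center = TRUE,  scale = FALSE)))
same_rms    <- all.equal(rms_scaled, as.numeric(scale(v, center = FALSE, scale = TRUE)))
same_z      <- all.equal(z,          as.numeric(scale(v, center = TRUE,  scale = TRUE)))

print(c(center = same_center, rms = same_rms, z = same_z))
print(range(minmax))
```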
Box-Cox transformation
Using the boxCox function from the car package
> install.packages("car")
> library(car)
> boxCox(age~.,data=bank)
Using the caret package for the Box-Cox transformation
> install.packages("caret")
> library(caret)
> dat<-subset(bank,select="age")
> trans<-preProcess(dat,method=c("BoxCox"))
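For reference, the transformation itself is simple once lambda is known. The helper below is a minimal sketch of the standard one-parameter Box-Cox formula; `box_cox` is a hypothetical name and is not a function of car or caret, which additionally estimate lambda from the data.

```r
# Box-Cox transform for strictly positive x:
#   (x^lambda - 1) / lambda   when lambda != 0
#   log(x)                    when lambda == 0
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

x <- c(1, 2, 4, 8)
print(box_cox(x, 0))    # same as log(x)
print(box_cox(x, 1))    # same as x - 1
print(box_cox(x, 0.5))  # a square-root-like compression
```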
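Data preprocessing (part 2)
Outliers that contradict common sense
Identifying outliers from the data distribution
> bank.dirty=read.csv("bank-dirty.csv",stringsAsFactors = TRUE)
> summary(bank.dirty)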
      age                  job            marital                   education
 Min.   : 17.00   admin.     :10422   divorced: 4612   university.degree  :12165
 1st Qu.: 32.00   blue-collar: 9254   married :24928   high.school        : 9515
 Median : 38.00   technician : 6743   single  :11568   basic.9y           : 6043
 Mean   : 40.03   services   : 3969   NA's    :   80   professional.course: 5242
 3rd Qu.: 47.00   management : 2924                    basic.4y           : 4175
 Max.   :123.00   (Other)    : 7546                    (Other)            : 2310
 NA's   :2        NA's       :  330                    NA's               : 1738
 default        housing        loan            contact          month
 no  :32588   no  :18622   no  :33950   cellular :26144   may    :13769
 yes :    3   yes :21576   yes : 6248   telephone:15044   jul    : 7174
 NA's: 8597   NA's:  990   NA's:  990                     aug    : 6178
                                                          jun    : 5318
                                                          nov    : 4101
                                                          apr    : 2632
                                                          (Other): 2016
 day_of_week    duration         campaign          pdays          previous
 fri:7827     Min.   :   0.0   Min.   : 1.000   Min.   :  0.0   Min.   :0.000
 mon:8514     1st Qu.: 102.0   1st Qu.: 1.000   1st Qu.:999.0   1st Qu.:0.000
 thu:8623     Median : 180.0   Median : 2.000   Median :999.0   Median :0.000
 tue:8090     Mean   : 258.3   Mean   : 2.568   Mean   :962.5   Mean   :0.173
 wed:8134     3rd Qu.: 319.0   3rd Qu.: 3.000   3rd Qu.:999.0   3rd Qu.:0.000
              Max.   :4918.0   Max.   :56.000   Max.   :999.0   Max.   :7.000
        poutcome      emp.var.rate      cons.price.idx  cons.conf.idx
 failure    : 4252   Min.   :-3.40000   Min.   :92.20   Min.   :-50.8
 nonexistent:35563   1st Qu.:-1.80000   1st Qu.:93.08   1st Qu.:-42.7
 success    : 1373   Median : 1.10000   Median :93.75   Median :-41.8
                     Mean   : 0.08189   Mean   :93.58   Mean   :-40.5
                     3rd Qu.: 1.40000   3rd Qu.:93.99   3rd Qu.:-36.4
                     Max.   : 1.40000   Max.   :94.77   Max.   :-26.9
   euribor3m      nr.employed     y
 Min.   :0.634   Min.   :4964   no :36548
 1st Qu.:1.344   1st Qu.:5099   yes: 4640
 Median :4.857   Median :5191
 Mean   :3.621   Mean   :5167
 3rd Qu.:4.961   3rd Qu.:5228
 Max.   :5.045   Max.   :5228
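Common sense tells us that although a 123-year-old person may exist, the probability is extremely low, and such a person is unlikely to be a bank customer
Find the highest and lowest values in the age column (order() places NA values last, so the two missing ages appear at the end)
> head(bank.dirty[order(bank.dirty$age,decreasing = TRUE),'age',drop=FALSE],n=5)
      age
39494 123
38453  98
38456  98
27827  95
38922  94
> tail(bank.dirty[order(bank.dirty$age,decreasing = TRUE),'age',drop=FALSE],n=5)
      age
37559  17
37580  17
38275  17
120    NA
156    NA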
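Sorting shows the extreme values, but a distribution-based rule is needed to flag them automatically. One common choice is the 1.5 x IQR rule used by box plots; the sketch below applies it to a made-up age vector, so the data and names are assumptions for illustration.

```r
# Hypothetical ages, including one implausible value and one NA
age <- c(17, 25, 31, 38, 40, 47, 52, 60, 123, NA)

# Quartiles and interquartile range, ignoring missing values
q   <- unname(quantile(age, probs = c(0.25, 0.75), na.rm = TRUE))
iqr <- q[2] - q[1]

# Fences: values outside [lower, upper] are flagged as outliers
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

outliers <- age[!is.na(age) & (age < lower | age > upper)]

print(c(lower = lower, upper = upper))
print(outliers)
```

Whether a flagged value is corrected, set to NA, or dropped is a separate decision that depends on the data source.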
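Model evaluation
Note that caret::confusionMatrix expects the predictions first and the true labels second. The call below passes bank$y first, so the rows labelled Prediction hold the true classes and the columns labelled Reference hold the predictions pre; Sensitivity, Specificity and the predictive values are therefore computed with those roles swapped.
> confusionMatrix(bank$y,pre,positive='1')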
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 35596 952
1 2667 1973
Accuracy : 0.9121
95% CI : (0.9094, 0.9149)
No Information Rate : 0.929
P-Value [Acc > NIR] : 1
Kappa : 0.476
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.67453
Specificity : 0.93030
Pos Pred Value : 0.42522
Neg Pred Value : 0.97395
Prevalence : 0.07102
Detection Rate : 0.04790
Detection Prevalence : 0.11265
Balanced Accuracy : 0.80241
'Positive' Class : 1
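Kappa statistic
Measures how much the classifier's results differ from what random classification would produce
Interpreting the Kappa statistic:
Poor: below 0.20
Fair: 0.20 to 0.40
Moderate: 0.40 to 0.60
Good: 0.60 to 0.80
Very good: 0.80 to 1.00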
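Kappa compares the observed agreement with the agreement expected by chance. The sketch below recomputes it from the four counts printed in the confusion matrix above; the variable names are illustrative.

```r
# Counts copied from the confusion matrix output
cm <- matrix(c(35596,  952,
                2667, 1973),
             nrow = 2, byrow = TRUE,
             dimnames = list(rows = c("0", "1"), cols = c("0", "1")))

n  <- sum(cm)
po <- sum(diag(cm)) / n                      # observed agreement (accuracy)
pe <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance

kappa_hat <- (po - pe) / (1 - pe)

print(round(c(accuracy = po, expected = pe, kappa = kappa_hat), 4))
```

The result of about 0.476 falls in the 0.40 to 0.60 band of the scale above.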
ROC curve (prob holds the model's predicted scores for each row of bank)
> library(ROCR)
> pred<-prediction(prob,bank$y)
> perf<-performance(pred,measure = "tpr",x.measure="fpr")
> plot(perf)
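To show what the ROC curve contains, here is a base-R sketch that sweeps a threshold over the scores and records the true and false positive rates. The labels and scores are simulated because the fitted model is not shown here; `truth` and `score` are hypothetical stand-ins for bank$y and prob.

```r
set.seed(42)
# Simulated labels and scores (stand-ins for the real labels and predictions)
truth <- rbinom(200, 1, 0.3)
score <- ifelse(truth == 1, rnorm(200, 0.6, 0.2), rnorm(200, 0.4, 0.2))

# Lower the threshold step by step and record TPR and FPR at each step
thr <- sort(unique(score), decreasing = TRUE)
tpr <- c(0, sapply(thr, function(t) mean(score[truth == 1] >= t)))
fpr <- c(0, sapply(thr, function(t) mean(score[truth == 0] >= t)))

# Area under the curve by the trapezoidal rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Cross-check with the rank-based (Mann-Whitney) formula
r  <- rank(score)
n1 <- sum(truth == 1)
n0 <- sum(truth == 0)
auc_rank <- (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

print(c(auc = auc, auc_rank = auc_rank))
```

Calling plot(fpr, tpr, type = "l") on these vectors draws the same kind of curve that plot(perf) produces.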
RandomForest
Load the data
> data=read.table("input.txt",header = TRUE)
> str(data)
'data.frame': 222 obs. of 23 variables:
$ Acti_Profile : num 0 0 0 0 0 0 0 0 0 0 ...
$ Activity : num 1.25 0 0.938 6.562 0 ...
$ Diastolic_PTT : num 256 240 253 0 241 ...
$ Diastolic : num 73.2 78.6 74 0 78.4 ...
$ Heart_Rate_Curve : num 81.2 69.7 77.6 95 83.6 ...
$ Heart_Rate_Variability_HF: num 131 250 135 144 141 ...
$ Heart_Rate_Variability_LF: num 311 218 203 301 244 ...
$ MAP : num 86 93.5 86.9 0 91.7 ...
$ Position : num 0 0 0 1 0 0 0 0 0 0 ...
$ PTT_Raw : num 308 288 308 0 295 ...
$ RR_Interval : num 734 878 773 632 714 ...
$ Sleep_Wake : num 1 1 1 1 1 0 1 1 0 0 ...
$ SpO2 : num 0 0 99 0 98.4 ...
$ Sympatho_Vagal_Balance : num 23 8.17 14.5 20.4 16.88 ...
$ Systolic_PTT : num 308 288 307 0 295 ...
$ Systolic : num 113 124 113 0 119 ...
$ Autonomic_arousals : num 0 0 0 0 0 0 0 0 0 0 ...
$ Cardio_complex : num 0 0 0 1 0 0 0 0 0 0 ...
$ Cardio_rhythm : num 0 0 2 0 0 0 0 0 0 0 ...
$ Classification_Arousal : num 0 0 0 0 0 0 0 0 0 0 ...
$ PTT_Events : num 1 0 2 0 0 0 0 0 0 0 ...
$ Systolic_Events : num 1 0 1 0 0 0 0 0 0 0 ...
$ y : num 1 0 1 0 0 0 0 0 0 0 ...
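Load the randomForest package
> library(randomForest)
Train the model with y as the response and all other columns as predictors
> rf <- randomForest(y ~ ., data=data, ntree=100, proximity=TRUE,importance=TRUE)
> plot(rf)
Variable importance
Measures how much the forest's prediction accuracy drops when the values of one variable are randomly permuted
> importance(rf,type=1)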
%IncMSE
Acti_Profile 0.00000000
Activity 0.99353251
Diastolic_PTT 0.32193611
Diastolic 1.99891809
Heart_Rate_Curve 0.92001352
Heart_Rate_Variability_HF 2.07870722
Heart_Rate_Variability_LF -0.24957163
MAP 0.48142975
Position 1.86876751
PTT_Raw 1.94648914
RR_Interval 0.60557964
Sleep_Wake 1.00503782
SpO2 0.25396165
Sympatho_Vagal_Balance 1.42906765
Systolic_PTT 1.27965813
Systolic 0.77382673
Autonomic_arousals 0.00000000
Cardio_complex 1.00503782
Cardio_rhythm 1.14283152
Classification_Arousal -0.04383997
PTT_Events 4.63980680
Systolic_Events 33.29461169
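Print the random forest model
> print(rf)
Call:
 randomForest(formula = y ~ ., data = data, ntree = 100, proximity = TRUE, importance = TRUE)
               Type of random forest: regression
                     Number of trees: 100
No. of variables tried at each split: 7
          Mean of squared residuals: 0.003226897
                    % Var explained: 98.7
Because y is numeric, randomForest fits a regression forest here. Mean of squared residuals is the mean squared error (MSE) of the out-of-bag predictions, not the residual sum of squares (SSE).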
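The idea behind permutation importance can be sketched without randomForest. The example below is an illustration only: it uses simulated data, a linear model as a stand-in for the forest, and a held-out test set instead of out-of-bag samples, so the numbers are not comparable with %IncMSE.

```r
set.seed(7)
n <- 300
# Simulated data: x1 matters a lot, x2 a little, noise not at all
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
d$y <- 3 * d$x1 + 0.5 * d$x2 + rnorm(n, sd = 0.5)

train <- d[1:200, ]
test  <- d[201:300, ]
fit   <- lm(y ~ ., data = train)   # stand-in model, not a random forest

# Mean squared error of the fitted model on a data set
mse <- function(data) mean((data$y - predict(fit, newdata = data))^2)
base_mse <- mse(test)

# Shuffle one predictor at a time and record the increase in error
perm_importance <- sapply(c("x1", "x2", "noise"), function(v) {
  shuffled <- test
  shuffled[[v]] <- sample(shuffled[[v]])  # break the link between v and y
  mse(shuffled) - base_mse
})

print(round(perm_importance, 3))
```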
Support vector machine
> library(e1071)
> svmfit<-svm(y~.,data=data,kernel="linear",cost=10,scale=FALSE)
> print(svmfit)
Call:
svm(formula = y ~ ., data = data, kernel = "linear", cost = 10, scale = FALSE)
Parameters:
SVM-Type: eps-regression
SVM-Kernel: linear
cost: 10
gamma: 0.04545455
epsilon: 0.1
Number of Support Vectors: 20
> plot(svmfit,data)
Neural network
> concrete<-read_excel("Concrete_Data.xls")
> str(concrete)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1030 obs. of 9 variables:
$ Cement : num 540 540 332 332 199 ...
$ Slag : num 0 0 142 142 132 ...
$ Ash : num 0 0 0 0 0 0 0 0 0 0 ...
$ water : num 162 162 228 228 192 228 228 228 228 228 ...
$ superplastic: num 2.5 2.5 0 0 0 0 0 0 0 0 ...
$ coarseagg : num 1040 1055 932 932 978 ...
$ fineagg : num 676 676 594 594 826 ...
$ age : num 28 28 270 365 360 90 365 28 28 28 ...
$ strength : num 80 61.9 40.3 41.1 44.3 ...
> normalize <- function(x){ return ((x-min(x))/(max(x)-min(x)))}
> concrete_norm <- as.data.frame(lapply(concrete,normalize))
> concrete_train <- concrete_norm[1:773,]
> concrete_test <- concrete_norm[774:1030,]
> library(neuralnet)
> concrete_model <- neuralnet(strength ~ Cement+Slag+Ash+water+superplastic+coarseagg+fineagg+age,data=concrete_train)
> plot(concrete_model)
> model_results <- compute(concrete_model,concrete_test[1:8])
> predicted_strength <- model_results$net.result
> cor(predicted_strength,concrete_test$strength)
[,1]
[1,] 0.7205120076
> concrete_model2 <- neuralnet(strength ~ Cement+Slag+Ash+water+superplastic+coarseagg+fineagg+age,data=concrete_train,hidden=5)
> plot(concrete_model2)
Evaluate the second model: correlation between predicted and actual strength
> model_results2 <- compute(concrete_model2,concrete_test[1:8])
> predicted_strength2 <- model_results2$net.result
> cor(predicted_strength2,concrete_test$strength)
[,1]
[1,] 0.6727155609
Principal component analysis
The four variables X1 to X4 are height, weight, chest circumference, and sitting height
> test<-data.frame(
+ X1=c(148, 139, 160, 149, 159, 142, 153, 150, 151, 139,
+ 140, 161, 158, 140, 137, 152, 149, 145, 160, 156,
+ 151, 147, 157, 147, 157, 151, 144, 141, 139, 148),
+ X2=c(41, 34, 49, 36, 45, 31, 43, 43, 42, 31,
+ 29, 47, 49, 33, 31, 35, 47, 35, 47, 44,
+ 42, 38, 39, 30, 48, 36, 36, 30, 32, 38),
+ X3=c(72, 71, 77, 67, 80, 66, 76, 77, 77, 68,
+ 64, 78, 78, 67, 66, 73, 82, 70, 74, 78,
+ 73, 73, 68, 65, 80, 74, 68, 67, 68, 70),
+ X4=c(78, 76, 86, 79, 86, 76, 83, 79, 80, 74,
+ 74, 84, 83, 77, 73, 79, 79, 77, 87, 85,
+ 82, 78, 80, 75, 88, 80, 76, 76, 73, 78)
+ )
> test.pr<-princomp(test,cor=TRUE)
> summary(test.pr,loadings=TRUE)
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4
Standard deviation 1.8817805390 0.55980635717 0.28179594325 0.25711843909
Proportion of Variance 0.8852744993 0.07834578938 0.01985223841 0.01652747293
Cumulative Proportion 0.8852744993 0.96362028866 0.98347252707 1.00000000000
Loadings:
Comp.1 Comp.2 Comp.3 Comp.4
X1 0.497 0.543 -0.450 0.506
X2 0.515 -0.210 -0.462 -0.691
X3 0.481 -0.725 0.175 0.461
X4 0.507 0.368 0.744 -0.232
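The first two principal components already account for a cumulative 96% of the variance, so the other two can be dropped to reduce the dimensionality
Reading the coefficients from the first two columns of the loadings table, with X'1 to X'4 denoting the standardized variables:
Z1= 0.497X'1+0.515X'2+0.481X'3+0.507X'4
Z2= 0.543X'1-0.210X'2-0.725X'3+0.368X'4
Draw the scree plot of the principal components and compute the component scores
> screeplot(test.pr,type="lines")
> p<-predict(test.pr)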
> p
              Comp.1         Comp.2         Comp.3          Comp.4
 [1,] -0.06990949737 -0.23813701272 -0.35509247634 -0.266120139417
 [2,] -1.59526339772 -0.71847399061  0.32813232022 -0.118056645885
 [3,]  2.84793151061  0.38956678680 -0.09731731272 -0.279482487139
 [4,] -0.75996988424  0.80604334819 -0.04945721875 -0.162949297761
 [5,]  2.73966776853  0.01718087263  0.36012614873  0.358653043787
 [6,] -2.10583167924  0.32284393414  0.18600422367 -0.036456083707
 [7,]  1.42105591247 -0.06053164925  0.21093320662 -0.044223092351
 [8,]  0.82583976981 -0.78102575640 -0.27557797533  0.057288571933
 [9,]  0.93464401954 -0.58469241699 -0.08814135786  0.181037745585
[10,] -2.36463819933 -0.36532199291  0.08840476284  0.045520127461
[11,] -2.83741916086  0.34875841111  0.03310422938 -0.031146930047
[12,]  2.60851223537  0.21278727930 -0.33398036623  0.210157574387
[13,]  2.44253342081 -0.16769495893 -0.46918095412 -0.162987829937
[14,] -1.86630668724  0.05021383642  0.37720280364 -0.358821916178
[15,] -2.81347420580 -0.31790107093 -0.03291329149 -0.222035112399
[16,] -0.06392982655  0.20718447599  0.04334339948  0.703533623798
[17,]  1.55561022242 -1.70439673831 -0.33126406220  0.007551878960
[18,] -1.07392250663 -0.06763418320  0.02283648409  0.048606680158
[19,]  2.52174211878  0.97274300950  0.12164633439 -0.390667990681
[20,]  2.14072377494  0.02217881219  0.37410972458  0.129548959692
[21,]  0.79624421805  0.16307887263  0.12781269571 -0.294140762463
[22,] -0.28708320594 -0.35744666106 -0.03962115883  0.080991988802
[23,]  0.25151075072  1.25555187663 -0.55617324819  0.109068938725
[24,] -2.05706031616  0.78894493512 -0.26552109297  0.388088642937
[25,]  3.08596854773 -0.05775318018  0.62110421208 -0.218939612456
[26,]  0.16367554630  0.04317931667  0.24481850312  0.560248997030
[27,] -1.37265052598  0.02220972121 -0.23378320040 -0.257399715466
[28,] -2.16097778154  0.13733232981  0.35589738735  0.093123683044
[29,] -2.40434826507 -0.48613137190 -0.16154440788 -0.007914021222
[30,] -0.50287467640  0.14734316507 -0.20590831261 -0.122078819188
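The Proportion of Variance row follows directly from the standard deviations, and the components themselves come from an eigen-decomposition of the correlation matrix. The sketch below checks both. The first part uses the standard deviations printed by summary(test.pr); the second part uses the built-in USArrests data as a stand-in, which is an assumption made so the example runs on its own.

```r
# Standard deviations copied from the summary(test.pr) output
sdev <- c(1.8817805390, 0.55980635717, 0.28179594325, 0.25711843909)

# Each component's share of the total variance, and the running total
prop_var <- sdev^2 / sum(sdev^2)
cum_var  <- cumsum(prop_var)
print(round(rbind(prop_var, cum_var), 4))

# Stand-in data set shipped with R (not the body measurements)
X  <- USArrests
pc <- princomp(X, cor = TRUE)
e  <- eigen(cor(X))

# With cor = TRUE the component variances are the eigenvalues of the
# correlation matrix, and the loadings are its eigenvectors (up to sign)
print(rbind(princomp = unname(pc$sdev^2), eigen = e$values))
```

Because the variables are standardized, the variances sum to the number of variables (4 in both cases).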