xgboost学习

1、原理

https://www.cnblogs.com/zhouxiaohui888/p/6008368.html

2、实战

xgboost中比较重要的参数介绍：

（1）学习率：learning rate ：一般设置比较低，0.1以下

（2）tree：

max_depth

min_child_weight

subsample

colsample_bytree

gamma

（3）正则化参数

lambda

alpha

（1）objective [ default=reg:linear ] 定义学习任务及相应的学习目标，可选的目标函数如下：

“reg:linear” –线性回归。
“reg:logistic” –逻辑回归。
“binary:logistic” –二分类的逻辑回归问题，输出为概率。
“binary:logitraw” –二分类的逻辑回归问题，输出的结果为wTx。
“count:poisson” –计数问题的poisson回归，输出结果为poisson分布。在poisson回归中，max_delta_step的缺省值为0.7。(used to safeguard optimization)
“multi:softmax” –让XGBoost采用softmax目标函数处理多分类问题，同时需要设置参数num_class（类别个数）
“multi:softprob” –和softmax一样，但是输出的是ndata * nclass的向量，可以将该向量reshape成ndata行nclass列的矩阵。没行数据表示样本所属于每个类别的概率。
“rank:pairwise” –set XGBoost to do ranking task by minimizing the pairwise loss

（2）’eval_metric’ The choices are listed below，评估指标:

“rmse”: root mean square error
“logloss”: negative log-likelihood
“error”: Binary classification error rate. It is calculated as #(wrong cases)/#(all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
“merror”: Multiclass classification error rate. It is calculated as #(wrong cases)/#(all cases).
“mlogloss”: Multiclass logloss
“auc”: Area under the curve for ranking evaluation.
“ndcg”:Normalized Discounted Cumulative Gain
“map”:Mean average precision
“ndcg@n”,”map@n”: n can be assigned as an integer to cut off the top positions in the lists for evaluation.
“ndcg-“,”map-“,”ndcg@n-“,”map@n-“: In XGBoost, NDCG and MAP will evaluate the score of a list without any positive samples as 1. By adding “-” in the evaluation metric XGBoost will evaluate these score as 0 to be consistent under some conditions.

（3）lambda [default=0] L2 正则的惩罚系数

（4）alpha [default=0] L1 正则的惩罚系数

（5）lambda_bias 在偏置上的L2正则。缺省值为0（在L1上没有偏置项的正则，因为L1时偏置不重要）

（6）eta [default=0.3]
为了防止过拟合，更新过程中用到的收缩步长。在每次提升计算之后，算法会直接获得新特征的权重。 eta通过缩减特征的权重使提升计算过程更加保守。缺省值为0.3
取值范围为：[0,1]

（7）max_depth [default=6] 数的最大深度。缺省值为6 ，取值范围为：[1,∞]

（8）min_child_weight [default=1]
孩子节点中最小的样本权重和。如果一个叶子节点的样本权重和小于min_child_weight则拆分过程结束。在现行回归模型中，这个参数是指建立每个模型所需要的最小样本数。该成熟越大算法越conservative
取值范围为: [0,∞]

xgb1=XGBClassifier(

learning_rate=0.1,

n_estimators=1000,

max_depth=5,

min_child_weight=1,

gamma=0,

subsample=0.8

colsample_bytree=0.8,

objective='binary:logistic',

nthread=4,

scale_pos_weight=1,

seed=27)

3、xgboost重要模块：plot_importance【显示特征的重要性】

from xgboost import XGBClassifier

from xgboost import plot_importance

from matplotlib import pyplot

model=XGBClassifier()

model.fit(X,Y)

plot_importance(model)

pyplot.show()

#图中就可以显示出各种特征的重要性

xgboost学习的更多相关文章

【新人赛】阿里云恶意程序检测 -- 实践记录11.10 - XGBoost学习 / 代码阅读、调参经验总结
XGBoost学习: 集成学习将多个弱学习器结合起来,优势互补,可以达到强学习器的效果.要想得到最好的集成效果,这些弱学习器应当"好而不同". 根据个体学习器的生成方法,集成学习方 ...
XGboost学习总结
XGboost,全称Extrem Gradient boost,极度梯度提升,是陈天奇大牛在GBDT等传统Boosting算法的基础上重新优化形成的,是Kaggle竞赛的必杀神器. XGboost属于 ...
xgboost学习与总结
最近在研究xgboost,把一些xgboost的知识总结一下.这里只是把相关资源作总结,原创的东西不多. 原理 xgboost的原理首先看xgboost的作者陈天奇的ppt 英文不太好的同学可以看看这 ...
数据竞赛利器 —— xgboost 学习清单
1. 入门大全 xgboost 作者给出的一份完备的使用 xgboost 进行数据分析的完整示例代码:A walk through python example for UCI Mushroom da ...
XGBoost学习笔记2
XGBoost API 参数分类问题使用逻辑回归 # Import xgboost import xgboost as xgb # Create arrays for the features a ...
XGBoost学习笔记1
XGBoost XGBoost这个网红大杀器,似乎很好用,完事儿还是自己推导一遍吧,datacamp上面有辅助的课程,但是不太涉及原理它究竟有多好用呢?我还没用过,先搞清楚原理,hahaha~ 参考 ...
【Python机器学习实战】决策树与集成学习（七）——集成学习（5）XGBoost实例及调参
上一节对XGBoost算法的原理和过程进行了描述,XGBoost在算法优化方面主要在原损失函数中加入了正则项,同时将损失函数的二阶泰勒展开近似展开代替残差(事实上在GBDT中叶子结点的最优值求解也是使 ...
xgboost原理及应用
1.背景关于xgboost的原理网络上的资源很少,大多数还停留在应用层面,本文通过学习陈天奇博士的PPT 地址和xgboost导读和实战地址,希望对xgboost原理进行深入理解. 2.xgboo ...
xgboost原理及应用--转
1.背景关于xgboost的原理网络上的资源很少,大多数还停留在应用层面,本文通过学习陈天奇博士的PPT地址和xgboost导读和实战地址,希望对xgboost原理进行深入理解. 2.xgboos ...

随机推荐

WEBGL学习【四】模型视图矩阵
<html lang="zh-CN"> <!--服务器运行地址:http://127.0.0.1:8080/webgl/LearnNeHeWebGL/NeHeWe ...
【SPOJ 104】HIGH - Highways (高斯消元)
题目描述 In some countries building highways takes a lot of time- Maybe that's because there are many po ...
docker数据卷的使用 -v --volumes--from
总结一下docker数据管理的三种方法: 1.普通的挂在数据: -v docker run -v /father/path:/child/path-v 参数会把当前系统的文件目录/father/pa ...
一次 Laravel 性能分析全程笔记
大家都知道 laravel 项目写起来是挺爽,但是在生产环境性能不高,我们来抽丝剥茧分析我自己项目的运行时间消耗: Bootstrap 耗时步骤耗时 Illuminate\Foundation\B ...
android生成sdk.jar 小工具
net.sf.fjep.fatjar_0.0.31.jar 生成jar的工具把这个net.sf.fjep.fatjar_0.0.31.jar到Eclipse的plugins中,从启Eclipse.点 ...
Zepto.js实现fadeIn,fadeOut功能
Zepto是一个轻量级的针对现代高级浏览器的JavaScript库, 它与jquery有着类似的api. 如果你会用jquery,那么你也会用zepto. Zepto的设计目的是提供 jQuery 的 ...
jquery-ajax基础-XMLHttpRequest
XMLHttpRequest知识点原生的ajax代码 var xmlhttp; // 声明一个对象 if (window.XMLHttpRequest) {// code for IE7+, Fir ...
HDU 2815
特判B不能大于等于C 高次同余方程 #include <iostream> #include <cstdio> #include <cstring> #includ ...
JavaWeb应用中的身份验证（声明式）——基于表单的身份认证
容器管理安全最普遍的类型建立在基于表单的身份验证方式上. 通过这样的方式,server自己主动将尚未验证的用户重定向到一个HTML表单.检查他们的username和password,决定他们属于哪个角 ...
FZU Problem 1853 Number Deletion
Problem 1853 Number Deletion Accept: 80 Submit: 239 Time Limit: 1000 mSec Memory Limit : 32768 ...