Feature Selection with Boruta

A good feature subset is one that contains features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other.
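The quoted criterion is the hypothesis behind correlation-based feature selection (CFS). One common way to turn it into a single score is Hall's CFS merit, Merit = k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), where r_cf is the feature-class correlation and r_ff the feature-feature correlation over the k chosen features. The sketch below assumes absolute Pearson correlation as the correlation measure (other measures are also used), and the function name cfs_merit and the synthetic data are illustrative only.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS-style merit of a feature subset, using |Pearson r| as the correlation measure."""
    k = len(subset)
    # Mean absolute correlation between each chosen feature and the target
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf  # the formula reduces to r_cf when k == 1
    # Mean absolute correlation over all distinct pairs of chosen features
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

# Synthetic example: column 1 is a near-duplicate of column 0, column 2 is independent
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
y = (a + b + 0.5 * rng.normal(size=n) > 0).astype(float)  # binary class label
X = np.column_stack([a, a + 0.05 * rng.normal(size=n), b])

print(cfs_merit(X, y, [0, 1]))  # predictive but redundant pair
print(cfs_merit(X, y, [0, 2]))  # predictive and mutually uncorrelated pair
```

The redundant pair scores about the same as feature 0 alone, while the complementary pair scores higher, which is exactly the "predictive of the class, not of each other" idea.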

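Three approaches to feature selection:

1) Univariate selection: assumes a linear relationship between each feature and the response y, and scores every feature by how strongly it is correlated with y on its own.

2) Random forest: does not require a linear relationship between the features and y; features are ranked by their importance and selected from that ranking.

3) RFE (recursive feature elimination): repeatedly fits a model, removes the weakest feature(s), and refits on the remainder.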
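The three approaches can be compared side by side with scikit-learn. This is a minimal sketch under stated assumptions: the data is synthetic (make_classification with the 5 informative features placed first), the univariate score is the ANOVA F-test (f_classif) as a stand-in for "correlation with y", and k = 5 is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data: with shuffle=False, columns 0-4 are the informative ones
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
k = 5

# 1) Univariate: score each feature against y on its own (linear ANOVA F-test)
univariate = SelectKBest(f_classif, k=k).fit(X, y)
uni_idx = np.flatnonzero(univariate.get_support())

# 2) Random forest: rank features by impurity-based importance, keep the top k
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
rf_idx = np.argsort(forest.feature_importances_)[::-1][:k]

# 3) RFE: fit, drop the weakest feature (step=1 by default), refit, until k remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=k).fit(X, y)
rfe_idx = np.flatnonzero(rfe.support_)

print("univariate:   ", sorted(uni_idx.tolist()))
print("random forest:", sorted(rf_idx.tolist()))
print("RFE:          ", sorted(rfe_idx.tolist()))
```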

Use a Pipeline + GridSearchCV to tune the feature-selection step and the classifier's parameters together.

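Because RandomizedLogisticRegression is used for feature selection, it needs to be cross-validated as part of a pipeline. You can apply GridSearchCV to a Pipeline that contains it as a feature-selection step along with your classifier of choice. An example might look like this (note that RandomizedLogisticRegression was deprecated in scikit-learn 0.19 and removed in 0.21, so this snippet only runs on older versions):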

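```python
# Requires scikit-learn < 0.21: RandomizedLogisticRegression was
# deprecated in 0.19 and removed in 0.21.
from sklearn.linear_model import LogisticRegression, RandomizedLogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Feature selection ('fs') followed by the classifier ('clf')
pipeline = Pipeline([
    ('fs', RandomizedLogisticRegression()),
    ('clf', LogisticRegression()),
])

# 'fs__C' routes the C values to the 'fs' step of the pipeline
params = {'fs__C': [0.1, 1, 10]}

grid_search = GridSearchCV(pipeline, params)
grid_search.fit(X_train, y_train)  # X_train, y_train: your training data
```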
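On current scikit-learn the same Pipeline + GridSearchCV pattern still works, but a different selector has to be swapped in. The sketch below uses SelectFromModel around an L1-penalised LogisticRegression; this is plain L1-based selection, not the stability selection that RandomizedLogisticRegression performed. The synthetic data and the parameter grids are assumptions for illustration. Note how a parameter of the estimator wrapped inside a step is addressed as step__estimator__param.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic data for illustration only
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    # L1 penalty drives weak coefficients to zero; SelectFromModel keeps the non-zero ones
    ('fs', SelectFromModel(LogisticRegression(penalty='l1', solver='liblinear'))),
    ('clf', LogisticRegression()),
])

# Tune the selector's regularisation and the classifier's together
params = {
    'fs__estimator__C': [0.1, 1, 10],
    'clf__C': [0.1, 1, 10],
}

grid_search = GridSearchCV(pipeline, params, cv=5)
grid_search.fit(X_train, y_train)

test_acc = grid_search.score(X_test, y_test)
mask = grid_search.best_estimator_.named_steps['fs'].get_support()
print(grid_search.best_params_)
print("test accuracy:", test_acc, "features kept:", int(mask.sum()))
```

Because selection happens inside each cross-validation fold, the held-out folds never influence which features are kept.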

References:

http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/

Missing values must be imputed before running Boruta.
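A minimal imputation step in Python, assuming scikit-learn's SimpleImputer and median imputation (the strategy is an assumption for illustration, not a recommendation from the source; in R the equivalent step would be done before calling Boruta):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [7.0, 8.0, 9.0],
              [np.nan, 2.0, 6.0]])

# Replace each NaN with the median of its column, computed from the observed values
imputer = SimpleImputer(strategy="median")
X_filled = imputer.fit_transform(X)
print(X_filled)
```

Here the column medians are 4, 5 and 6, so those values fill the gaps. When selection is later cross-validated, the imputer should sit inside the pipeline so it is fitted on training folds only.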

https://www.analyticsvidhya.com/blog/2016/03/select-important-variables-boruta-package/

Variable selection is an important aspect of model building which every analyst must learn. After all, it helps in building predictive models free from correlated variables, biases and unwanted noise.

A lot of novice analysts assume that keeping all (or more) variables will result in the best model, since no information is lost. Sadly, that is not true!

How many times has removing a variable from a model increased its accuracy?

At least, it has happened to me. Such variables are often correlated with others and keep the model from reaching higher accuracy. Today, we'll learn one way to get rid of such variables in R. I must say, R has an incredible CRAN repository, and one of the packages it offers for variable selection is Boruta.
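The core idea of Boruta is to compare each real feature with "shadow" features: copies of the columns shuffled independently, so they carry no information about y. A feature scores a "hit" in a round when its random-forest importance beats the best shadow's importance, and over many rounds the hit count is tested against a coin flip (Binomial with p = 0.5). The sketch below is a simplified Python re-implementation of that idea using scikit-learn and SciPy; it is not the R package's API, the names boruta_sketch, n_iter and alpha are hypothetical, and the real package does more (for example, it drops rejected features as it goes and corrects for multiple testing).

```python
import numpy as np
from scipy.stats import binom
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def boruta_sketch(X, y, n_iter=20, alpha=0.05, seed=0):
    """Simplified Boruta: count how often each real feature beats the best shadow feature."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for it in range(n_iter):
        # Shadow features: each column permuted independently, breaking any link to y
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        forest = RandomForestClassifier(n_estimators=100, random_state=it)
        forest.fit(np.hstack([X, shadows]), y)
        imp = forest.feature_importances_
        # Hit: the real feature's importance exceeds the strongest shadow's this round
        hits += imp[:n_features] > imp[n_features:].max()
    # Under H0 ("no better than noise"), hits ~ Binomial(n_iter, 0.5)
    p_high = binom.sf(hits - 1, n_iter, 0.5)  # P(X >= hits)
    p_low = binom.cdf(hits, n_iter, 0.5)      # P(X <= hits)
    decision = np.full(n_features, "tentative", dtype=object)
    decision[p_high < alpha] = "confirmed"
    decision[p_low < alpha] = "rejected"
    return hits, decision

# Synthetic data: columns 0-2 are informative, columns 3-9 are noise
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
hits, decision = boruta_sketch(X, y)
for j, (h, d) in enumerate(zip(hits, decision)):
    print(f"feature {j}: hits={h}/20 -> {d}")
```

Features left "tentative" are the ones where more rounds would be needed to decide, which is why the real algorithm keeps iterating until every feature is confirmed or rejected (or a round limit is reached).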
