sklearn 随机森林方法
Notes The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees
which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting
those parameter values. The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features
and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a
deterministic behaviour during fitting, random_state has to be fixed. References [R157]
Breiman, “Random Forests”, Machine Learning, (), -, .
Methods
apply (X) |
Apply trees in the forest to X, return leaf indices. |
decision_path (X) |
Return the decision path in the forest |
fit (X, y[, sample_weight]) |
Build a forest of trees from the training set (X, y). |
get_params ([deep]) |
Get parameters for this estimator. |
predict (X) |
Predict class for X. |
predict_log_proba (X) |
Predict class log-probabilities for X. |
predict_proba (X) |
Predict class probabilities for X. |
score (X, y[, sample_weight]) |
Returns the mean accuracy on the given test data and labels. |
set_params (**params) |
Set the parameters of this estimator. |
predict
(X)-
Predict class for X.
The predicted class of an input sample is a vote by the trees in the forest, weighted by their probability estimates. That is, the predicted class is the one with highest mean probability estimate across the trees.
Parameters: X : array-like or sparse matrix of shape = [n_samples, n_features]
The input samples. Internally, its dtype will be converted to
dtype=np.float32
. If a sparse matrix is provided, it will be converted into a sparsecsr_matrix
.Returns: y : array of shape = [n_samples] or [n_samples, n_outputs]
The predicted classes.
predict_log_proba
(X)-
Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample is computed as the log of the mean predicted class probabilities of the trees in the forest.
Parameters: X : array-like or sparse matrix of shape = [n_samples, n_features]
The input samples. Internally, its dtype will be converted to
dtype=np.float32
. If a sparse matrix is provided, it will be converted into a sparsecsr_matrix
.Returns: p : array of shape = [n_samples, n_classes], or a list of n_outputs
such arrays if n_outputs > 1. The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
predict_proba
(X)-
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
Parameters: X : array-like or sparse matrix of shape = [n_samples, n_features]
The input samples. Internally, its dtype will be converted to
dtype=np.float32
. If a sparse matrix is provided, it will be converted into a sparsecsr_matrix
.Returns: p : array of shape = [n_samples, n_classes], or a list of n_outputs
such arrays if n_outputs > 1. The class probabilities of the input samples. The order of the classes corresponds to that in the attribute classes_.
score
(X, y, sample_weight=None)-
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
Parameters: X : array-like, shape = (n_samples, n_features)
Test samples.
y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
sample_weight : array-like, shape = [n_samples], optional
Sample weights.
Returns: score : float
Mean accuracy of self.predict(X) wrt. y.
From Sklearn:
http://sklearn.apachecn.org/cn/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier
sklearn 随机森林方法的更多相关文章
- 使用基于Apache Spark的随机森林方法预测贷款风险
使用基于Apache Spark的随机森林方法预测贷款风险 原文:Predicting Loan Credit Risk using Apache Spark Machine Learning R ...
- 解决sklearn 随机森林数据不平衡的方法
Handle Imbalanced Classes In Random Forest Preliminaries # Load libraries from sklearn.ensemble im ...
- sklearn_随机森林random forest原理_乳腺癌分类器建模(推荐AAA)
sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频) https://study.163.com/course/introduction.htm?courseId=1005269003& ...
- 随机森林random forest及python实现
引言想通过随机森林来获取数据的主要特征 1.理论根据个体学习器的生成方式,目前的集成学习方法大致可分为两大类,即个体学习器之间存在强依赖关系,必须串行生成的序列化方法,以及个体学习器间不存在强依赖关系 ...
- 决策树-预测隐形眼镜类型 (ID3算法,C4.5算法,CART算法,GINI指数,剪枝,随机森林)
1. 1.问题的引入 2.一个实例 3.基本概念 4.ID3 5.C4.5 6.CART 7.随机森林 2. 我们应该设计什么的算法,使得计算机对贷款申请人员的申请信息自动进行分类,以决定能否贷款? ...
- 随机森林入门攻略(内含R、Python代码)
随机森林入门攻略(内含R.Python代码) 简介 近年来,随机森林模型在界内的关注度与受欢迎程度有着显著的提升,这多半归功于它可以快速地被应用到几乎任何的数据科学问题中去,从而使人们能够高效快捷地获 ...
- 随机森林学习-sklearn
随机森林的Python实现 (RandomForestClassifier) # -*- coding: utf- -*- """ RandomForestClassif ...
- sklearn中的随机森林
阅读了Python的sklearn包中随机森林的代码实现,做了一些笔记. sklearn中的随机森林是基于RandomForestClassifier类实现的,它的原型是 class RandomFo ...
- kaggle 欺诈信用卡预测——不平衡训练样本的处理方法 综合结论就是:随机森林+过采样(直接复制或者smote后,黑白比例1:3 or 1:1)效果比较好!记得在smote前一定要先做标准化!!!其实随机森林对特征是否标准化无感,但是svm和LR就非常非常关键了
先看数据: 特征如下: Time Number of seconds elapsed between each transaction (over two days) numeric V1 No de ...
随机推荐
- BUG:upstream timed out (10060: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected
更换Apache扑向Nginx,刚搭建完WNMP,nginx能访问php页面 但是访问现有开发项目报错 [error] 4112#3724: *9 upstream timed out (10060: ...
- 【数据结构】【平衡树】浅析树堆Treap
[Treap] [Treap浅析] Treap作为二叉排序树处理算法之一,首先得清楚二叉排序树是什么.对于一棵树的任意一节点,若该节点的左子树的所有节点的关键字都小于该节点的关键字,且该节点的右子树的 ...
- [转]Hibernate中Session的get和load
hibernate中Session接口提供的get()和load()方法都是用来获取一个实体对象,在使用方式和查询性能上有一些区别.测试版本:hibernate 4.2.0. get Session接 ...
- 8VC Venture Cup 2016 - Final Round (Div. 2 Edition) A. Orchestra 水题
A. Orchestra 题目连接: http://www.codeforces.com/contest/635/problem/A Description Paul is at the orches ...
- 打造百度网盘备份利器:自动备份Linux VPS文件和多线程下载百度网盘资源
前一段时间国内的各大网盘百度云盘,金山快盘,360云盘,华为网盘为争夺用户上演空间容量博弈,网盘商们还固执地以为中国的网民都不懂网络技术,可以像某公司那样用一些数字的手段来忽悠用户,参与到网盘商的数字 ...
- Inno Setup入门(十七)——Inno Setup类参考(3)
标签 标签(Label)是用来显示文本的主要组件之一,也是窗口应用程序中最常用的组件之一,通过对标签的使用,将能够给用户提供更加详细的信息. Pascal脚本中的标签由类TlLabel实现,该 ...
- segmentfault hackthon比赛感悟
之前本来是打算用node好好系统的写下程序,写下博客. 这两天因为segmentfault hackthon比赛,所以就没更新.写这篇博客的目的,是为了说明自己參赛的感悟. 今天比赛,能够说自己特别失 ...
- bootstrap table 复选框选中后,翻页不影响已选中的复选框
使用的 jquery版本为 2.1.1 在项目中发现bootstrap table的复选框选中后,翻页操作会导致上一页选中的丢失,api中的 bootstrapTable('getSelections ...
- 流畅的python第八章对象引用,可变性和垃圾回收
变量不是盒子 在==和is之间选择 ==比较两个对象的值,而is比较对象的标识 元组的相对不可变姓 元组与多数的python集合(列表,字典,集,等等)一样,保存的是对象的引用.如果引用的元素是可变的 ...
- C# format 日期 各种 符号 实例分析如何精确C#日期格式到毫秒
摘 自: http://developer.51cto.com/art/200908/141145.htm 实例分析如何精确C#日期格式到毫秒 2009-08-03 10:48 paulfzm jav ...