sklearn Model-selection + Pipeline

1 RandomizedSearch

import numpy as np
from time import time

from scipy.stats import randint as sp_randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
# sklearn.grid_search was removed in scikit-learn 0.20; both search classes now live in sklearn.model_selection
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

# Load the data
digits = load_digits()
X, y = digits.data, digits.target

# Base (meta) classifier
meta_clf = RandomForestClassifier(n_estimators=20)

# =================================================================
# Parameter distributions
# (min_samples_split must be >= 2, so its range starts at 2; randint's upper bound is exclusive)
param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 11),
              "min_samples_split": sp_randint(2, 11),
              "min_samples_leaf": sp_randint(1, 11),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# Run randomized search (RandomizedSearch)
n_iter_search = 20
rs_clf = RandomizedSearchCV(meta_clf, param_distributions=param_dist,
                            n_iter=n_iter_search)
start = time()
rs_clf.fit(X, y)
print("RandomizedSearchCV took %.2f seconds for %d candidate"
      " parameter settings." % ((time() - start), n_iter_search))
# grid_scores_ was removed in scikit-learn 0.20; per-candidate results are in cv_results_
print(rs_clf.cv_results_["mean_test_score"])
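RandomizedSearchCV does not try every combination: it draws n_iter settings from param_distributions, sampling lists uniformly and scipy distributions through their rvs method. The sketch below uses ParameterSampler, the helper scikit-learn uses for this draw, on a trimmed-down version of the dictionary above; the trimming and random_state=0 are assumptions made only for illustration.

```python
from scipy.stats import randint as sp_randint
from sklearn.model_selection import ParameterSampler

# Trimmed-down distribution dict (illustrative assumption, not the full one above)
param_dist = {"max_depth": [3, None],
              "min_samples_split": sp_randint(2, 11),  # integers 2..10
              "criterion": ["gini", "entropy"]}

# Draw 5 candidate settings; random_state makes the draw reproducible
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=0))

for params in samples:
    print(params)
```

Each printed dict is one candidate that RandomizedSearchCV would cross-validate, so the cost grows with n_iter rather than with the size of the search space.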

2 GridSearch

# =================================================================
# Parameter grid (min_samples_split must be >= 2)
param_grid = {"max_depth": [3, None],
              "max_features": [1, 3, 10],
              "min_samples_split": [2, 3, 10],
              "min_samples_leaf": [1, 3, 10],
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

# Run grid search (GridSearch)
gs_clf = GridSearchCV(meta_clf, param_grid=param_grid)
start = time()
gs_clf.fit(X, y)
print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
      % (time() - start, len(gs_clf.cv_results_["params"])))
print(gs_clf.cv_results_["mean_test_score"])
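GridSearchCV, by contrast, evaluates every combination in param_grid, so the number of candidates is the product of the list lengths. This small sketch uses ParameterGrid, the class scikit-learn uses to enumerate the grid, to confirm the count for the grid above.

```python
import math

from sklearn.model_selection import ParameterGrid

param_grid = {"max_depth": [3, None],
              "max_features": [1, 3, 10],
              "min_samples_split": [2, 3, 10],
              "min_samples_leaf": [1, 3, 10],
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

grid = ParameterGrid(param_grid)
n_candidates = len(grid)

# Cross-check: product of the number of values per parameter
expected = math.prod(len(values) for values in param_grid.values())

print(n_candidates)  # 2 * 3 * 3 * 3 * 2 * 2 = 216
```

With the default 5-fold cross-validation of recent scikit-learn versions, 216 candidates means 1080 fits plus one final refit of the best setting, which is why the grid search takes much longer than the 20-candidate random search.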

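3 Pipeline

from sklearn import svm
# sklearn.datasets.samples_generator was removed in newer versions; import make_classification directly
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
from sklearn.pipeline import Pipeline

# Generate data
X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

# Define the Pipeline: ANOVA feature selection first, then an SVM
# (f_classif is the ANOVA F-test for classification targets)
anova_filter = SelectKBest(f_classif, k=5)
clf = svm.SVC(kernel='linear')
pipe = Pipeline([('anova', anova_filter), ('svc', clf)])

# Set anova's k=10 and svc's C=0.1 (step name and parameter name are joined with a double underscore "__"!)
pipe.set_params(anova__k=10, svc__C=.1)
pipe.fit(X, y)
prediction = pipe.predict(X)
print(pipe.score(X, y))  # accuracy on the training data

# Get the features selected by anova_filter (boolean mask over the input columns)
s = pipe.named_steps['anova'].get_support()
print(s)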
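Model selection and Pipeline combine naturally: GridSearchCV accepts a Pipeline as its estimator, and the grid keys use the same step__parameter names as set_params. The following is a minimal sketch; the small grid over anova__k and svc__C and the choice cv=5 are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_informative=5, n_redundant=0, random_state=42)

pipe = Pipeline([("anova", SelectKBest(f_classif)),
                 ("svc", SVC(kernel="linear"))])

# Illustrative grid: keys are "<step name>__<parameter name>"
param_grid = {"anova__k": [5, 10],
              "svc__C": [0.1, 1.0]}

# Feature selection is refit inside every CV split, so it never sees the validation fold
search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)

# Column indices kept by the selector of the best (refit) pipeline
mask = search.best_estimator_.named_steps["anova"].get_support()
print(np.flatnonzero(mask))
```

Searching over the whole Pipeline, rather than selecting features once on all the data and then tuning the SVM, keeps the cross-validation scores honest because the selection step is part of what gets validated.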
