随机森林学习-2-sklearn

# -*- coding: utf-8 -*-

"""

RandomForestClassifier

skleran的9个模型在3份数据上的使用。

1. 知识点： sklearn自生成分类样本集、标准化、 画等高线图、拆分训练和测试集

2. 结果: 对于2维的线性和非线性的3个分类问题， 都证明了 随机森林可以取得较好效果。

"""

import numpy as np

import matplotlib.pyplot as plt

from matplotlib.colors import ListedColormap

from sklearn.cross_validation import train_test_split

from sklearn.preprocessing import StandardScaler

from sklearn.datasets import make_moons, make_circles, make_classification

# make_classification 随机生成连续自变量和分类因变量

# make_moons 生成二维自变量 和分类自变量，生成半环形图、月亮型

# make_circles 生成二维自变量 和分类自变量，生成半环形图、月亮型

from sklearn.neighbors import KNeighborsClassifier

from sklearn.svm import SVC

from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

from sklearn.naive_bayes import GaussianNB

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA    # 线性判别分析（LDA）

from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA # 二次判别分析（QDA）

h = .02  # step size in the mesh

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Decision Tree",

         "Random Forest", "AdaBoost", "Naive Bayes", "LDA", "QDA"]

classifiers = [

    KNeighborsClassifier(3),

    SVC(kernel="linear", C=0.025),

    SVC(gamma=2, C=1),

    DecisionTreeClassifier(max_depth=5),

    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),

    AdaBoostClassifier(),

    GaussianNB(),

    LDA(),

    QDA()]

# 样本集包含2个自变量， n_classes=2表示因变量类别中包含2类

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,

                           random_state=1, n_clusters_per_class=1,n_classes=2) #生成样本集

plt.scatter(X[:, 0], X[:, 1], marker='o', c=y)

rng = np.random.RandomState(2)  #每次实例生成后，第n个实例的第i次随机，永远与第m个实例的第i次随机相同。

X += 2 * rng.uniform(size=X.shape)

linearly_separable = (X, y)

datasets = [make_moons(noise=0.3, random_state=0),

            make_circles(noise=0.2, factor=0.5, random_state=1),

            linearly_separable

            ]

figure = plt.figure(figsize=(27, 9))

i = 1

# iterate over datasets

for ds in datasets:

    # preprocess dataset, split into training and test part

    X, y = ds

    X = StandardScaler().fit_transform(X) # 标准化转换

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4) #划分训练和测试集

    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5

    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5

    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),

                         np.arange(y_min, y_max, h)) # 输出的xx，yy，就是坐标矩阵

    # just plot the dataset first

    cm = plt.cm.RdBu

    cm_bright = ListedColormap(['#FF0000', '#0000FF'])

    ax = plt.subplot(len(datasets), len(classifiers) + 1, i) # 画布

    # Plot the training points

    ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright)

    # and testing points

    ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6)

    ax.set_xlim(xx.min(), xx.max())

    ax.set_ylim(yy.min(), yy.max())

    ax.set_xticks(()) #清空了坐标轴数字

    ax.set_yticks(()) #清空了坐标轴数字

    i += 1

    # iterate over classifiers

    for name, clf in zip(names, classifiers):

        ax = plt.subplot(len(datasets), len(classifiers) + 1, i)

        clf.fit(X_train, y_train)

        score = clf.score(X_test, y_test)

        # Plot the decision boundary. For that, we will assign a color to each

        # point in the mesh [x_min, m_max]x[y_min, y_max].

        if hasattr(clf, "decision_function"):

            Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]) #np.c_按照行连接两个矩阵[[1,2],[1,2],[1,2]] ,对mesh矩阵中每个样本点输入 经过f(x,y)输出一个预测。

        else:

            Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

        # Put the result into a color plot

        Z = Z.reshape(xx.shape)

        ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)   #画 预测函数在坐标系中的登高线图

        # Plot also the training points

        ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright) # 画训练数据散点图

        # and testing points

        ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,

                   alpha=0.6) # 测试数据的散点图

        ax.set_xlim(xx.min(), xx.max()) #设置坐标轴范围

        ax.set_ylim(yy.min(), yy.max())

        ax.set_xticks(())  #清空坐标轴刻度

        ax.set_yticks(())

        ax.set_title(name) #设置正上方的标题

        ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip(''),

                size=15, horizontalalignment='right') #在坐标系中指定位置，添加文本

        i += 1

figure.subplots_adjust(left=.02, right=.98) #调整 图像间的空白区域

plt.show()

原始来源网址

随机森林学习-2-sklearn的更多相关文章

随机森林学习-sklearn
随机森林的Python实现 (RandomForestClassifier) # -*- coding: utf- -*- """ RandomForestClassif ...
[Machine Learning & Algorithm] 随机森林（Random Forest）
1 什么是随机森林? 作为新兴起的.高度灵活的一种机器学习算法,随机森林(Random Forest,简称RF)拥有广泛的应用前景,从市场营销到医疗保健保险,既可以用来做市场营销模拟的建模,统计客户来 ...
随机森林（Random Forest）
阅读目录 1 什么是随机森林? 2 随机森林的特点 3 随机森林的相关基础知识 4 随机森林的生成 5 袋外错误率(oob error) 6 随机森林工作原理解释的一个简单例子 7 随机森林的Pyth ...
随机森林（Random Forest），决策树，bagging， boosting（Adaptive Boosting，GBDT）
http://www.cnblogs.com/maybe2030/p/4585705.html 阅读目录 1 什么是随机森林? 2 随机森林的特点 3 随机森林的相关基础知识 4 随机森林的生成 5 ...
[Machine Learning & Algorithm] 随机森林（Random Forest）-转载
作者:Poll的笔记博客出处:http://www.cnblogs.com/maybe2030/ 阅读目录 1 什么是随机森林? 2 随机森林的特点 3 随机森林的相关基础知识 4 随机森林的生成 ...
随机森林（Random Forest，简称RF）
阅读目录 1 什么是随机森林? 2 随机森林的特点 3 随机森林的相关基础知识 4 随机森林的生成 5 袋外错误率(oob error) 6 随机森林工作原理解释的一个简单例子 7 随机森林的Pyth ...
随机森林random forest及python实现
引言想通过随机森林来获取数据的主要特征 1.理论根据个体学习器的生成方式,目前的集成学习方法大致可分为两大类,即个体学习器之间存在强依赖关系,必须串行生成的序列化方法,以及个体学习器间不存在强依赖关系 ...
随机森林（Random Forest）详解（转）
来源: Poll的笔记 cnblogs.com/maybe2030/p/4585705.html 1 什么是随机森林? 作为新兴起的.高度灵活的一种机器学习算法,随机森林(Random Fores ...
随机森林分类器（Random Forest）
阅读目录 1 什么是随机森林? 2 随机森林的特点 3 随机森林的相关基础知识 4 随机森林的生成 5 袋外错误率(oob error) 6 随机森林工作原理解释的一个简单例子 7 随机森林的Pyth ...

随机推荐

layui 批量上传文件 + 后台用servlet3.0接收【我】
前台代码: [主要参照layui官方文件上传示例 https://www.layui.com/demo/upload.html] <!DOCTYPE html> <html> ...
MySQL基本命令行
登陆:mysql –h localhost –u 用户名 –p mysql –u 用户名 –p (默认连接localhost服务器) 服务器中可以有多个库,库中可以有多个表.数据库的名字无法修改 ...
Logstash配置文件介绍
Logstash配置文件介绍 Logstash配置文件有两种,分别是pipeline配置文件和setting配置文件. Pipeline配置文件主要定义logstash使用的插件以及每个插件的设置,定 ...
.Net MVC个人笔记
关于转向的问题,目前知道的是Response.Redirect 和 location.href 我现在有两个controller,Home和Test <h2>this is Home< ...
Linux命令之如何从普通用户切换至管理员用户
普通用户,标志是一个$符号管理员用户,标志是一个#符号我要切换,敲打命令 sudo su - 然后输入你的管理员用户的密码(输入密码的时候是不可见的) 然后你就切换到#状态了.
14. Spring Boot的 thymleaf公共页抽取
5).CRUD-员工列表实验要求:1).RestfulCRUD:CRUD满足Rest风格:URI: /资源名称/资源标识 HTTP请求方式区分对资源CRUD操作普通CRUD(uri来区分操作) ...
mysql创建数据库指定编码格式
CREATE DATABASE `databasename` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
算法排序【时间复杂度O(n^2)】
排序算法的两个原则: 1.输出结果为递增或者递减. 2.输出结果为原输入结果的排列或者重组. 平均时间复杂度为O(n^2)的排序算法有三种: 冒泡排序,插入排序,选择排序. 一.冒泡排序: 即谁冒泡泡 ...
消息队列介绍和SpringBoot2.x整合RockketMQ、ActiveMQ 9节课
1.JMS介绍和使用场景及基础编程模型简介:讲解什么是小写队列,JMS的基础知识和使用场景 1.什么是JMS: Java消息服务(Java Message Service),Java ...
Weblogic的安装与卸载
一.下载weblogic 到Oracle官网https://www.oracle.com/downloads/index.html,我在这里下载的是weblogic12C进行安装:https://ww ...

随机森林学习-2-sklearn

随机森林学习-2-sklearn的更多相关文章

随机推荐

热门专题