Learning the scikit-learn toolchain: random, mgrid, np.r_, np.c_, scatter, axis, pcolormesh, contour, decision

Original post: http://blog.csdn.net/crossky_jing/article/details/49466127

scikit-learn exercise
Exercise: Try classifying classes 1 and 2 from the iris dataset with SVMs, with the 2 first features. Leave out 10% of each class and test prediction performance on these observations. (Link: http://scikit-learn.org/stable/tutorial/statistical_inference/supervised_learning.html)
The official solution is reproduced in the code listing at the end of this post.
Reading through that source code is a good way to learn the following commonly used functions:

numpy library

import numpy as np

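1. random

Usage: generates pseudo-random numbers.
Example:
np.random.seed(0)  # seed the global pseudo-random generator with 0, so the results are reproducible
order_arr = np.random.permutation(100)  # a random permutation of the integers 0..99, returned as an array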
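The two calls above are what the exercise uses to shuffle the data before splitting it. The sketch below illustrates that idea on made-up data: the arrays X and y and the 90/10 split point are assumptions for illustration, not the iris data itself.

```python
import numpy as np

# Seeding twice with the same value reproduces the same permutation.
np.random.seed(0)
first = np.random.permutation(10)
np.random.seed(0)
second = np.random.permutation(10)
same = np.array_equal(first, second)

# A permutation contains every index 0..9 exactly once.
is_permutation = sorted(first.tolist()) == list(range(10))

# Toy data (hypothetical): row k of X is [2k, 2k + 1] and its label is k.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Indexing both arrays with the same permutation keeps rows and labels paired.
X_shuffled, y_shuffled = X[first], y[first]

# Slice positions must be integers, hence the int(...) around 0.9 * n.
n_train = int(0.9 * len(X))
X_train, X_test = X_shuffled[:n_train], X_shuffled[n_train:]

print(same, is_permutation, X_train.shape, X_test.shape)
```

Shuffling with one shared index array, rather than shuffling X and y separately, is what keeps each sample attached to its label.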

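2. mgrid

Usage: builds a dense coordinate grid in any number of dimensions, for example the grid behind a 2D or 3D plot. np.meshgrid can build the same grids (it also accepts any number of dimensions); the practical differences are that mgrid is indexed with slices rather than called with arrays, and that it always uses matrix ('ij') ordering, whereas np.meshgrid defaults to Cartesian ('xy') ordering.
ret = np.mgrid[dim1, dim2, dim3, …]
Each dimension is written as a slice start:stop:step. If step is a complex number such as 5j, it is read as the number of points and the stop value is included.
The result unpacks into one array per dimension: the first array holds the dimension-1 coordinate of every grid point, the second holds the dimension-2 coordinate, and so on.
For example, after X, Y = np.mgrid[...],
grid point (i, j) has coordinates (X[i, j], Y[i, j]). X is the first dimension and Y the second; in this exercise they are the horizontal and vertical coordinates.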

A 1D grid (a plain array):

>>> pp = np.mgrid[-5:5:5j]
>>> pp
array([-5. , -2.5,  0. ,  2.5,  5. ])

A 2D grid (two 2D matrices):

>>> pp = np.mgrid[-1:1:2j,-2:2:3j]
>>> x , y = pp
>>> x
array([[-1., -1., -1.],
       [ 1.,  1.,  1.]])
>>> y
array([[-2.,  0.,  2.],
       [-2.,  0.,  2.]])

A 3D grid (a cube of points):

>>> pp = np.mgrid[-1:1:2j,-2:2:3j,-3:3:5j]
>>> print(pp)
[[[[-1.  -1.  -1.  -1.  -1. ]
   [-1.  -1.  -1.  -1.  -1. ]
   [-1.  -1.  -1.  -1.  -1. ]]

  [[ 1.   1.   1.   1.   1. ]
   [ 1.   1.   1.   1.   1. ]
   [ 1.   1.   1.   1.   1. ]]]

 [[[-2.  -2.  -2.  -2.  -2. ]
   [ 0.   0.   0.   0.   0. ]
   [ 2.   2.   2.   2.   2. ]]

  [[-2.  -2.  -2.  -2.  -2. ]
   [ 0.   0.   0.   0.   0. ]
   [ 2.   2.   2.   2.   2. ]]]

 [[[-3.  -1.5  0.   1.5  3. ]
   [-3.  -1.5  0.   1.5  3. ]
   [-3.  -1.5  0.   1.5  3. ]]

  [[-3.  -1.5  0.   1.5  3. ]
   [-3.  -1.5  0.   1.5  3. ]
   [-3.  -1.5  0.   1.5  3. ]]]]
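Two details of mgrid are easy to miss in the examples above: how a complex step differs from a real one, and how the result relates to np.meshgrid. The following is a small sketch of both; the variable names are chosen here for illustration.

```python
import numpy as np

# Complex step 5j: five evenly spaced points, stop value included.
by_count = np.mgrid[-5:5:5j]

# Real step 2.5: spacing of 2.5, stop value excluded (like np.arange).
by_step = np.mgrid[-5:5:2.5]

# A 2D grid is one array of shape (2, 2, 3) that unpacks into X and Y.
grid = np.mgrid[-1:1:2j, -2:2:3j]
X, Y = grid

# np.meshgrid gives the same layout when asked for 'ij' (matrix) indexing.
Xm, Ym = np.meshgrid(np.linspace(-1, 1, 2), np.linspace(-2, 2, 3), indexing="ij")

print(by_count)
print(by_step)
print(grid.shape, np.allclose(X, Xm), np.allclose(Y, Ym))
```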

3. np.r_, np.c_

Usage: concatenation helpers.
np.r_ joins arrays along the first axis (row-wise); 1D arrays are simply appended end to end.
np.c_ joins arrays column-wise; 1D arrays become the columns of a 2D array.

>>> a = np.array([1,2,3])
>>> b = np.array([5,2,5])
>>> # test np.r_
>>> np.r_[a,b]
array([1, 2, 3, 5, 2, 5])
>>> # test np.c_
>>> np.c_[a,b]
array([[1, 5],
       [2, 2],
       [3, 5]])
>>> np.c_[a,[0,0,0],b]
array([[1, 0, 5],
       [2, 0, 2],
       [3, 0, 5]])
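To make the row-wise versus column-wise distinction concrete, the sketch below compares both helpers with the ordinary functions np.concatenate and np.column_stack, and shows what happens with 2D inputs. The array A is made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([5, 2, 5])

rows = np.r_[a, b]   # 1D result of length 6
cols = np.c_[a, b]   # 2D result of shape (3, 2)

# For 1D inputs these match the ordinary function calls.
same_rows = np.array_equal(rows, np.concatenate([a, b]))
same_cols = np.array_equal(cols, np.column_stack([a, b]))

# np.r_ also accepts scalars and slice syntax inside the brackets.
mixed = np.r_[0, a, 10:13]

# With 2D inputs, np.r_ stacks along axis 0 and np.c_ along the last axis.
A = np.array([[1, 2], [3, 4]])
stacked_rows = np.r_[A, A]
stacked_cols = np.c_[A, A]

print(same_rows, same_cols, mixed)
print(stacked_rows.shape, stacked_cols.shape)
```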

matplotlib.pyplot library

import matplotlib.pyplot as plt

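1. scatter

Draws a scatter plot and colors the sample points. In the call below, X is an n x 2 matrix holding n two-dimensional samples, and each sample has a label in y. Passing y as the color argument c gives each class its own color, taken from the colormap cmap.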
plt.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=plt.cm.Paired)
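The call above can be tried on its own without the iris data. Below is a minimal sketch on random points; the data, the labeling rule and the headless "Agg" backend are all assumptions made so the snippet runs anywhere, and in an interactive session you would normally drop the backend line and call plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 20 random 2D points, labeled by which half of the x range they fall in.
rng = np.random.RandomState(0)
X = rng.rand(20, 2)
y = (X[:, 0] > 0.5).astype(int)

fig, ax = plt.subplots()
# c=y maps each label to a color from the Paired colormap;
# zorder=10 draws the points on top of anything plotted later with a lower zorder.
points = ax.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=plt.cm.Paired)

n_points = points.get_offsets().shape[0]
print(n_points, points.get_zorder())
```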

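2. axis

Usage: sets the axis limits and scaling.
Example: plt.axis('tight') asks for tight limits, just large enough to show all the data, so the edge of the samples becomes the edge of the plot.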

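3. pcolormesh

Usage: similar to plt.pcolor; it colors a grid of quadrilaterals defined by coordinate points.
plt.pcolormesh(X, Y, C, **kwargs)
X and Y give the corner coordinates, C holds the values that the colormap turns into colors, and kwargs carries further styling options. With flat shading, C[i, j] colors the quadrilateral whose four corners are:
(X[i,   j],   Y[i,   j]),
(X[i,   j+1], Y[i,   j+1]),
(X[i+1, j],   Y[i+1, j]),
(X[i+1, j+1], Y[i+1, j+1]).

Example: plt.pcolormesh(XX, YY, Z > 0, cmap=plt.cm.Paired)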
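The corner layout is easier to see on a tiny grid. The sketch below is an illustration under assumptions: the grid size, the values in C, the explicit shading="flat" argument and the headless "Agg" backend are all chosen here and are not part of the exercise.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# 4 x 3 grid of corner points: x runs over 0..3, y over 0..2.
X, Y = np.mgrid[0:3:4j, 0:2:3j]

# With flat shading there is one value per quadrilateral: (4 - 1) x (3 - 1).
C = np.arange(6).reshape(3, 2)

fig, ax = plt.subplots()
mesh = ax.pcolormesh(X, Y, C, shading="flat", cmap=plt.cm.Paired)

# The quadrilateral colored by C[i, j] has these four corners.
i, j = 1, 0
corners = [
    (X[i, j], Y[i, j]),
    (X[i, j + 1], Y[i, j + 1]),
    (X[i + 1, j], Y[i + 1, j]),
    (X[i + 1, j + 1], Y[i + 1, j + 1]),
]
print(C[i, j], corners)
```

In the exercise itself XX, YY and Z all have the same shape, so matplotlib has to decide how values and cells line up; passing an explicit shading argument, as in this sketch, removes that ambiguity.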

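4. contour

Usage: draws contour lines.
Example: plt.contour(XX, YY, Z, colors=['k', 'k', 'k'], linestyles=['--', '-', '--'], levels=[-.5, 0, .5])
This draws one black line per level: a solid line where Z = 0 and dashed lines where Z = -0.5 and Z = 0.5.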
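The levels argument can be checked on a function whose contours are known in advance. In the sketch below, Z = XX + YY is a made-up stand-in for the classifier output, so the three contour lines are the straight lines x + y = -0.5, 0 and 0.5; the headless "Agg" backend is likewise an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np

XX, YY = np.mgrid[-2:2:50j, -2:2:50j]
Z = XX + YY  # hypothetical stand-in for decision values; zero along x + y = 0

fig, ax = plt.subplots()
# One entry of colors and linestyles per level:
# dashed at -0.5, solid at 0, dashed at 0.5, all black ('k').
cs = ax.contour(XX, YY, Z,
                colors=["k", "k", "k"],
                linestyles=["--", "-", "--"],
                levels=[-0.5, 0, 0.5])

drawn_levels = list(cs.levels)
print(drawn_levels)
```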

svm library

from sklearn import svm

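1. decision_function

Usage: the documentation describes it as "Distance of the samples X to the separating hyperplane." More precisely, it returns one signed score per sample: the sign tells which side of the separating surface the sample lies on, and for a linear kernel dividing the score by the norm of the weight vector gives the geometric distance.
Example:

x_min = X[:, 0].min()
x_max = X[:, 0].max()
y_min = X[:, 1].min()
y_max = X[:, 1].max()

# Build a 200 x 200 grid over the data range; XX and YY hold the first and second coordinate of every grid point.
XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]

# Flatten XX and YY and join them as columns with np.c_, giving an n x 2 array (n grid points, 2 coordinates each),
# then ask the classifier for the score of every grid point. Z is a 1D array of length n.
Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()])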
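To see what these scores mean, it helps to recompute them by hand for a linear kernel, where the fitted model exposes its weight vector as coef_ and its offset as intercept_. This is a minimal sketch, assuming a linear SVC fitted on all of the two-class, two-feature iris data rather than on the training split used in the exercise.

```python
import numpy as np
from sklearn import datasets, svm

iris = datasets.load_iris()
X = iris.data[iris.target != 0, :2]   # classes 1 and 2, first two features
y = iris.target[iris.target != 0]

clf = svm.SVC(kernel="linear")
clf.fit(X, y)

# One signed score per sample.
scores = clf.decision_function(X)

# For a linear kernel the score is w . x + b.
manual = X @ clf.coef_.ravel() + clf.intercept_[0]

# Dividing by the norm of w turns the score into a signed geometric distance.
distances = scores / np.linalg.norm(clf.coef_)

# A positive score corresponds to clf.classes_[1], a negative one to clf.classes_[0].
predicted = np.where(scores > 0, clf.classes_[1], clf.classes_[0])

print(scores.shape, np.allclose(scores, manual, atol=1e-6))
print(np.array_equal(predicted, clf.predict(X)))
```

For the rbf and poly kernels there is no coef_ attribute, so the score can only be read as a signed measure of which side of the boundary a point falls on, which is exactly how the plot uses it (Z > 0 for the colors, Z = 0 for the solid contour).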

Appendix (full code):

http://scikit-learn.org/stable/_downloads/plot_iris_exercise.py

"""

================================
SVM Exercise
================================

A tutorial exercise for using different SVM kernels.

This exercise is used in the :ref:`using_kernels_tut` part of the
:ref:`supervised_learning_tut` section of the :ref:`stat_learn_tut_index`.
"""

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, svm

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Keep only classes 1 and 2 and only the first two features
X = X[y != 0, :2]
y = y[y != 0]

n_sample = len(X)

# Shuffle samples and labels with the same permutation
np.random.seed(0)
order = np.random.permutation(n_sample)
X = X[order]
y = y[order].astype(float)  # np.float is no longer available in recent NumPy

# Hold out the last 10% as the test set; slice positions must be integers
n_train = int(.9 * n_sample)
X_train = X[:n_train]
y_train = y[:n_train]
X_test = X[n_train:]
y_test = y[n_train:]

# fit the model
for fig_num, kernel in enumerate(('linear', 'rbf', 'poly')):
    clf = svm.SVC(kernel=kernel, gamma=10)
    clf.fit(X_train, y_train)

    plt.figure(fig_num)
    plt.clf()
    plt.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=plt.cm.Paired)

    # Circle out the test data (explicit edge color so the hollow markers stay visible)
    plt.scatter(X_test[:, 0], X_test[:, 1], s=80, facecolors='none',
                edgecolors='k', zorder=10)

    plt.axis('tight')
    x_min = X[:, 0].min()
    x_max = X[:, 0].max()
    y_min = X[:, 1].min()
    y_max = X[:, 1].max()

    # Evaluate the decision function on a 200 x 200 grid over the data range
    XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
    Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(XX.shape)
    plt.pcolormesh(XX, YY, Z > 0, cmap=plt.cm.Paired)
    plt.contour(XX, YY, Z, colors=['k', 'k', 'k'], linestyles=['--', '-', '--'],
                levels=[-.5, 0, .5])

    plt.title(kernel)

plt.show()