I. NumPy

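NumPy's main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is called the rank.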

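For example, the coordinates of a point in 3D space, [1, 2, 3], form an array of rank 1, because it has one axis, and that axis has length 3. In the following example the array has rank 2 (it has two dimensions): the first dimension has length 2 and the second has length 3.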

[[ 1., 0., 0.],
 [ 0., 1., 2.]]

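NumPy's array class is called ndarray, usually just referred to as an array. Note that numpy.array is not the same as the standard Python library class array.array, which only handles one-dimensional arrays and offers much less functionality. The more important attributes of an ndarray object are: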

ndarray.ndim

The number of axes (dimensions) of the array. In the Python world, the number of axes is referred to as the rank.
ndarray.shape

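The dimensions of the array: a tuple of integers giving the size of the array along each dimension. For a matrix with n rows and m columns, shape is (n, m); for the example above it is (2, 3). The length of the shape tuple is therefore the rank, i.e. the number of dimensions, ndim.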
ndarray.size

The total number of elements of the array, equal to the product of the elements of shape.
ndarray.dtype

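An object describing the type of the elements in the array. dtypes can be created or specified using standard Python types, and NumPy also provides its own types (e.g. numpy.int32, numpy.float64).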
ndarray.itemsize

The size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8).
ndarray.data

The buffer containing the actual elements of the array. We normally don't need this attribute, because we access the elements through indexing.

>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name    # platform dependent: often 'int64' on Linux/macOS
'int32'
>>> a.itemsize      # 4 bytes for int32 (8 for int64)
4
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>
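The following minimal sketch shows how these attributes relate to each other. It reuses the 2×3 float array from the start of this section. The variable name `m` is an arbitrary choice.

```python
import numpy as np

# The rank-2 array shown at the start of this section
m = np.array([[1., 0., 0.],
              [0., 1., 2.]])

print(m.ndim)       # 2: number of axes (the "rank")
print(m.shape)      # (2, 3): size along each axis
print(m.size)       # 6: total number of elements
print(m.dtype)      # float64: inferred from the float literals
print(m.itemsize)   # 8: bytes per float64 element

# size is the product of shape, and ndim is the length of shape
assert m.size == np.prod(m.shape)
assert m.ndim == len(m.shape)
# itemsize follows directly from the dtype's bit width (64 / 8)
assert m.itemsize == np.dtype(np.float64).itemsize == 8
```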

1. numpy.apply_along_axis

From the official documentation:

numpy.apply_along_axis(func1d, axis, arr, *args, **kwargs)

Apply a function to 1-D slices along the given axis.

Execute func1d(a, *args) where func1d operates on 1-D arrays and a is a 1-D slice of arr along axis.


Parameters:

func1d : function

This function should accept 1-D arrays. It is applied to 1-D slices of arr along the specified axis.

axis : integer

Axis along which arr is sliced.

arr : ndarray

Input array.

args : any

Additional arguments to func1d.

kwargs : any

Additional named arguments to func1d.

New in version 1.9.0.

Returns:

apply_along_axis : ndarray

The output array. The shape of outarr is identical to the shape of arr, except along the axis dimension. This axis is removed, and replaced with new dimensions equal to the shape of the return value of func1d. So if func1d returns a scalar outarr will have one fewer dimensions than arr.

Example:

>>> import numpy as np
>>> def my_func(a):  # takes a 1-D array as its argument
...     """Average first and last element of a 1-D array"""
...     return (a[0] + a[-1]) * 0.5  # mean of the first and last elements
...
>>> b = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> np.apply_along_axis(my_func, 0, b)
array([ 4.,  5.,  6.])
>>> np.apply_along_axis(my_func, 1, b)
array([ 2.,  5.,  8.])

my_func() takes an array argument and returns the average of its first and last elements. The array b is:

1 2 3
4 5 6
7 8 9

np.apply_along_axis(my_func, 0, b) passes each column of b to my_func, so it computes the mean of the first and last elements of each column. The result is:

4. 5. 6.

np.apply_along_axis(my_func, 1, b) passes each row of b to my_func, so it computes the mean of the first and last elements of each row. The result is:

2. 5. 8.
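To confirm which slices func1d receives, the sketch below compares apply_along_axis with an explicit Python loop over columns and rows. It also exercises the documented rule that a non-scalar return value replaces the sliced axis. The lambda returning the first and last elements is a hypothetical helper, added only for illustration.

```python
import numpy as np

def my_func(a):
    """Average first and last element of a 1-D array."""
    return (a[0] + a[-1]) * 0.5

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0: func1d receives each column b[:, j]
cols = np.apply_along_axis(my_func, 0, b)
cols_loop = np.array([my_func(b[:, j]) for j in range(b.shape[1])])

# axis=1: func1d receives each row b[i, :]
rows = np.apply_along_axis(my_func, 1, b)
rows_loop = np.array([my_func(b[i, :]) for i in range(b.shape[0])])

print(cols, rows)   # [4. 5. 6.] [2. 5. 8.]

# When func1d returns a 1-D array instead of a scalar, the sliced axis
# is replaced by the shape of that return value: (3, 3) -> (3, 2)
pairs = np.apply_along_axis(lambda a: np.array([a[0], a[-1]]), 1, b)
print(pairs.shape)  # (3, 2)
```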

Reference: https://docs.scipy.org/doc/numpy/reference/generated/numpy.apply_along_axis.html

2. numpy.linalg.norm

(1) np.linalg.inv(): matrix inverse
(2) np.linalg.det(): matrix determinant (a scalar)
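A short sketch of the two functions above on an arbitrary 2×2 matrix. The inverse is checked through A·A⁻¹ = I, and the determinant through the 2×2 formula ad − bc.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])

A_inv = np.linalg.inv(A)   # approximately [[-2, 1], [1.5, -0.5]]
d = np.linalg.det(A)       # approximately -2.0 (floating-point result)

# A @ A_inv should be the identity matrix, up to rounding error
assert np.allclose(A @ A_inv, np.eye(2))
# For a 2x2 matrix, det = a*d - b*c = 1*4 - 2*3 = -2
assert np.isclose(d, -2.0)
```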

np.linalg.norm

As the name suggests, linalg = linear + algebra, and norm refers to a norm. Note first that a norm is a measure of a vector (or matrix), and its value is a scalar.

First, check its documentation with help(np.linalg.norm):

norm(x, ord=None, axis=None, keepdims=False)

Here we only describe the common settings: x is the vector to measure, and ord selects the kind of norm.

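>>> import numpy as np
>>> x = np.array([3, 4])
>>> np.linalg.norm(x)            # default: 2-norm for vectors
5.0
>>> np.linalg.norm(x, ord=2)
5.0
>>> np.linalg.norm(x, ord=1)     # sum of absolute values
7.0
>>> np.linalg.norm(x, ord=np.inf)  # largest absolute value
4.0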

A small result from norm theory: for any vector x, ‖x‖₁ ≥ ‖x‖₂ ≥ ‖x‖∞ (in the example above, 7 ≥ 5 ≥ 4).
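The sketch below computes the three vector norms from their definitions and compares them with np.linalg.norm. It then spot-checks the inequality ‖x‖₁ ≥ ‖x‖₂ ≥ ‖x‖∞ on random vectors. The random test is only a sanity check under an arbitrary seed, not a proof.

```python
import numpy as np

x = np.array([3, 4])

l1 = np.sum(np.abs(x))          # |3| + |4| = 7
l2 = np.sqrt(np.sum(x ** 2))    # sqrt(9 + 16) = 5
linf = np.max(np.abs(x))        # max(|3|, |4|) = 4

assert l1 == np.linalg.norm(x, ord=1)
assert np.isclose(l2, np.linalg.norm(x))   # ord=None is the 2-norm for vectors
assert linf == np.linalg.norm(x, ord=np.inf)

# Spot-check l1 >= l2 >= l_inf on random 5-element vectors
rng = np.random.default_rng(0)
for _ in range(1000):
    v = rng.normal(size=5)
    n1, n2, ninf = (np.linalg.norm(v, ord=o) for o in (1, 2, np.inf))
    assert n1 >= n2 >= ninf
```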

Reference: http://blog.csdn.net/lanchunhui/article/details/51004387

3. numpy.expand_dims

Expands the shape of an array by inserting a new axis of length 1 at the given position.

numpy.expand_dims(a, axis)

Example:

>>> import numpy as np
>>> x = np.array([1,2])
>>> x.shape
(2,)

shape gives the shape of the array.

>>> y = np.expand_dims(x, axis=0)
>>> y
array([[1, 2]])
>>> y.shape
(1, 2)

Expanding with axis=0: the new axis becomes the first dimension, giving shape (1, 2).

>>> y = np.expand_dims(x, axis=1)  # Equivalent to x[:, np.newaxis]
>>> y
array([[1],
       [2]])
>>> y.shape
(2, 1)

Expanding with axis=1: the new axis becomes the second dimension, giving shape (2, 1).
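Here is a minimal sketch of the two cases above, checked against the equivalent np.newaxis indexing and reshape forms. The last line shows one common reason to add an axis: a column vector broadcasts against a row vector.

```python
import numpy as np

x = np.array([1, 2])

# axis=0: new leading axis -> row vector of shape (1, 2)
row = np.expand_dims(x, axis=0)
assert row.shape == (1, 2)
assert np.array_equal(row, x[np.newaxis, :])
assert np.array_equal(row, x.reshape(1, -1))

# axis=1: new trailing axis -> column vector of shape (2, 1)
col = np.expand_dims(x, axis=1)
assert col.shape == (2, 1)
assert np.array_equal(col, x[:, np.newaxis])
assert np.array_equal(col, x.reshape(-1, 1))

# (2, 1) + (1, 2) broadcasts to (2, 2)
print(col + row)
# [[2 3]
#  [3 4]]
```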

4. numpy.transpose

Transposes an array, i.e. permutes its axes (for a 2-D array, this is the ordinary matrix transpose).

numpy.transpose(a, axes=None)

Permute the dimensions of an array.


Parameters:

a : array_like

Input array.

axes : list of ints, optional

By default, reverse the dimensions, otherwise permute the axes according to the values given.

Returns:

p : ndarray

a with its axes permuted. A view is returned whenever possible.

Example:

>>> import numpy as np
>>> x = np.arange(4).reshape((2,2))
>>> x
array([[0, 1],
       [2, 3]])

>>> np.transpose(x)
array([[0, 2],
       [1, 3]])

>>> x=np.ones((1,2,3))
>>> x
array([[[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]]])
>>> y=np.transpose(x,(1,0,2))
>>> y
array([[[ 1.,  1.,  1.]],
 
       [[ 1.,  1.,  1.]]])
>>> y.shape
(2, 1, 3)

In effect, the index positions (coordinates) of each element are swapped.

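np.transpose(x, (1, 0, 2)) swaps the first and second axes of x. Axis 1 of x becomes axis 0 of y and vice versa, so y[i, j, k] == x[j, i, k]. For example, the element at position (0, 1, 2) in x (the last 1 in its second row) moves to position (1, 0, 2) in y. This is also why the shape changes from (1, 2, 3) to (2, 1, 3).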
A more concrete example:

>>> b = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> b
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> b.shape
(3, 3)
>>> c=np.transpose(b,(1,0))
>>> c
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

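The result depends on the shape. In effect, each element's indices are permuted, and the element is placed at its new position in the output.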
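An array of ones makes it impossible to see where elements go, so this sketch uses distinct values from np.arange instead. It checks the index rule y[i, j, k] == x[j, i, k] element by element. It also shows that np.transpose on an ndarray returns a view that shares memory with the input.

```python
import numpy as np

# Distinct values make it possible to track where each element goes
x = np.arange(6).reshape((1, 2, 3))
y = np.transpose(x, (1, 0, 2))
print(y.shape)  # (2, 1, 3)

# With axes=(1, 0, 2), y[i, j, k] == x[j, i, k] for every valid index
for i in range(y.shape[0]):
    for j in range(y.shape[1]):
        for k in range(y.shape[2]):
            assert y[i, j, k] == x[j, i, k]

# The element at (0, 1, 2) in x moves to (1, 0, 2) in y
assert x[0, 1, 2] == y[1, 0, 2] == 5

# transpose returns a view here: writing through y also changes x
y[1, 0, 2] = 100
print(x[0, 1, 2])  # 100
```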

II. sklearn

1. PCA

1.1 Function signature and parameters

sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)

Parameters:

n_components:

Meaning: the number n of principal components to keep, i.e. the number of features that remain after reduction.

Type: int or string. The default is None, in which case all components are kept (min(n_samples, n_features) of them).

If an int, e.g. n_components=1, the original data is reduced to one dimension.

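If a string, e.g. n_components='mle', the number of components n is chosen automatically using Minka's maximum-likelihood estimate. To keep just enough components to explain a given fraction of the variance, pass a float between 0 and 1 instead, e.g. n_components=0.95.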

copy:

Type: bool (True or False); default True.

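Meaning: whether to copy the training data before running the algorithm. If True, the original training data is not modified, because the computation runs on a copy. If False, the original training data is overwritten, because the dimensionality reduction is computed on it directly. The scikit-learn documentation also warns that with copy=False, fit(X).transform(X) will not give the expected results, and fit_transform(X) should be used instead.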

whiten:

Type: bool; default False.

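Meaning: whitening, which rescales the retained components so that each one has the same (unit) variance. For more on whitening, see the UFLDL tutorial.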
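This is a sketch of n_components and whiten on hypothetical random data. The feature scales are chosen arbitrarily so that the components have clearly different variances. The unit-variance check with ddof=1 assumes a recent scikit-learn, which scales whitened outputs using the n − 1 denominator.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 3 correlated features with very different scales
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.2],
                                          [0.0, 0.0, 0.1]])

# n_components=None (the default) keeps min(n_samples, n_features) components
n_all = PCA().fit(X).n_components_
print(n_all)  # 3

# Without whitening, the kept components have unequal variances, largest first
Z = PCA(n_components=2).fit_transform(X)
print(Z.var(axis=0, ddof=1))

# whiten=True rescales each kept component to unit variance
Zw = PCA(n_components=2, whiten=True).fit_transform(X)
print(Zw.var(axis=0, ddof=1))  # approximately [1. 1.]
```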

1.2 Attributes of a PCA object

components_: the principal axes, i.e. the directions of maximum variance in the data, one per row.

explained_variance_ratio_: the fraction of the total variance explained by each of the n retained components.

n_components_: the number n of retained components.

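mean_: the per-feature empirical mean, estimated from the training data.

noise_variance_: the noise variance estimated under the probabilistic PCA model, i.e. the average of the variances of the discarded components (0 when every component is kept).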
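The sketch below inspects these attributes on the 12-point dataset used in the example in section 1.4, re-typed here so the snippet runs on its own. The printed direction is only expected to lie roughly along the line y = x, and its sign may differ between versions.

```python
import numpy as np
from sklearn.decomposition import PCA

# The 12-point dataset from the example in section 1.4
data = np.array([[1., 1.], [0.9, 0.95], [1.01, 1.03],
                 [2., 2.], [2.03, 2.06], [1.98, 1.89],
                 [3., 3.], [3.03, 3.05], [2.89, 3.1],
                 [4., 4.], [4.06, 4.02], [3.97, 4.01]])

pca = PCA(n_components=1).fit(data)

print(pca.components_)                # one unit vector, roughly along y = x
print(pca.explained_variance_ratio_)  # close to 1: one axis captures almost everything
print(pca.n_components_)              # 1
print(pca.mean_)                      # per-feature mean of the training data

# components_ has shape (n_components_, n_features)
assert pca.components_.shape == (1, 2)
# mean_ is the column-wise mean that PCA subtracts before projecting
assert np.allclose(pca.mean_, data.mean(axis=0))
# Each principal axis is a unit vector
assert np.isclose(np.linalg.norm(pca.components_[0]), 1.0)
```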

1.3 Methods of a PCA object

fit(X,y=None)

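fit() is a method shared across scikit-learn: every algorithm that needs training has one, and it performs the "training" step. Because PCA is an unsupervised algorithm, y is simply None here (it is ignored).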

fit(X) trains the PCA model on the data X.

Return value: the object fit was called on. For example, pca.fit(X) trains the pca object on X and returns pca itself.

fit_transform(X)

Trains the PCA model on X and returns the reduced data in the same call.

With newX = pca.fit_transform(X), newX is the reduced data.

inverse_transform()

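Maps reduced data back to the original feature space: X = pca.inverse_transform(newX). Unless all components were kept, the result only approximates the original data.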

transform(X)

Transforms X into its reduced representation. Once the model is trained, any new input data can be reduced with transform.

There are also other methods, such as get_covariance(), get_precision(), get_params(deep=True) and score(X, y=None). I'll add notes on them once I need them.
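The methods above can be exercised together in a short sketch on hypothetical random data. It shows that fit returns the estimator, that transform after fit matches fit_transform, and that inverse_transform is only exact when every component is kept.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))       # hypothetical training data
X_new = rng.normal(size=(5, 3))    # hypothetical "new" data

pca = PCA(n_components=2)
fitted = pca.fit(X)
assert fitted is pca               # fit returns the estimator itself

Z = pca.transform(X)               # reduce the training data
assert Z.shape == (50, 2)
# Same result as fitting and transforming in one call
assert np.allclose(Z, PCA(n_components=2).fit_transform(X))

Z_new = pca.transform(X_new)       # reduce new data with the trained model
X_back = pca.inverse_transform(Z_new)
# Only 2 of 3 components are kept, so X_back approximates X_new
print(np.abs(X_back - X_new).max())

# Keeping every component makes the round trip exact (up to rounding)
full = PCA(n_components=3).fit(X)
X_round = full.inverse_transform(full.transform(X_new))
assert np.allclose(X_round, X_new)
```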

1.4 Example

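Take the two-dimensional dataset data below as an example. It has 12 samples (x, y) that lie close to the line y = x and cluster around x = 1, 2, 3 and 4, three samples each.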

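>>> import numpy as np
>>> data = np.array([[1., 1.], [0.9, 0.95], [1.01, 1.03],
...                  [2., 2.], [2.03, 2.06], [1.98, 1.89],
...                  [3., 3.], [3.03, 3.05], [2.89, 3.1],
...                  [4., 4.], [4.06, 4.02], [3.97, 4.01]])
>>> data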
array([[ 1.  ,  1.  ],
       [ 0.9 ,  0.95],
       [ 1.01,  1.03],
       [ 2.  ,  2.  ],
       [ 2.03,  2.06],
       [ 1.98,  1.89],
       [ 3.  ,  3.  ],
       [ 3.03,  3.05],
       [ 2.89,  3.1 ],
       [ 4.  ,  4.  ],
       [ 4.06,  4.02],
       [ 3.97,  4.01]])

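This dataset has two features. Because the two features are nearly equal, a single feature is enough to represent them, so the data can be reduced to one dimension. Let's see how to use the PCA class in sklearn.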

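(1) Set n_components to 1 and leave copy at its default of True. The original data is unchanged, newData is one-dimensional, and it clearly separates the original data into four groups.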

>>> from sklearn.decomposition import PCA
>>> pca=PCA(n_components=1)
>>> newData=pca.fit_transform(data)
>>> newData
array([[-2.12015916],
       [-2.22617682],
       [-2.09185561],
       [-0.70594692],
       [-0.64227841],
       [-0.79795758],
       [ 0.70826533],
       [ 0.76485312],
       [ 0.70139695],
       [ 2.12247757],
       [ 2.17900746],
       [ 2.10837406]])
>>> data
array([[ 1.  ,  1.  ],
       [ 0.9 ,  0.95],
       [ 1.01,  1.03],
       [ 2.  ,  2.  ],
       [ 2.03,  2.06],
       [ 1.98,  1.89],
       [ 3.  ,  3.  ],
       [ 3.03,  3.05],
       [ 2.89,  3.1 ],
       [ 4.  ,  4.  ],
       [ 4.06,  4.02],
       [ 3.97,  4.01]])

(2) Set copy to False, and the original data is modified. Here it has been centered in place (each column minus its mean).

>>> pca=PCA(n_components=1,copy=False)
>>> newData=pca.fit_transform(data)
>>> data
array([[-1.48916667, -1.50916667],
       [-1.58916667, -1.55916667],
       [-1.47916667, -1.47916667],
       [-0.48916667, -0.50916667],
       [-0.45916667, -0.44916667],
       [-0.50916667, -0.61916667],
       [ 0.51083333,  0.49083333],
       [ 0.54083333,  0.54083333],
       [ 0.40083333,  0.59083333],
       [ 1.51083333,  1.49083333],
       [ 1.57083333,  1.51083333],
       [ 1.48083333,  1.50083333]])

(3) Set n_components to 'mle': the data is automatically reduced to one dimension.

>>> pca=PCA(n_components='mle')
>>> newData=pca.fit_transform(data)
>>> newData
array([[-2.12015916],
       [-2.22617682],
       [-2.09185561],
       [-0.70594692],
       [-0.64227841],
       [-0.79795758],
       [ 0.70826533],
       [ 0.76485312],
       [ 0.70139695],
       [ 2.12247757],
       [ 2.17900746],
       [ 2.10837406]])

(4) Attribute values of the object

>>> pca = PCA(n_components=1).fit(data)
>>> pca.n_components
1
>>> pca.explained_variance_ratio_
array([ 0.99910873])
>>> pca.explained_variance_
array([ 2.55427003])
>>> pca.get_params
<bound method PCA.get_params of PCA(copy=True, n_components=1, whiten=False)>

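The trained pca object has n_components equal to 1, i.e. it keeps a single feature. The variance of that feature is 2.55427003, which is 0.99910873 (about 99.9%) of the total variance, so almost all of the information is preserved. get_params returns the values of the parameters. This output comes from an older scikit-learn release in which explained_variance_ divided by n. Newer releases divide by n − 1, so for the same data they report about 2.55427003 × 12/11 ≈ 2.786.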
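The meaning of these two attributes can be checked by recomputing them from the eigenvalues of the sample covariance matrix. This sketch assumes a recent scikit-learn, in which explained_variance_ uses the n − 1 denominator, as np.cov does by default. The ratio itself does not depend on the denominator.

```python
import numpy as np
from sklearn.decomposition import PCA

data = np.array([[1., 1.], [0.9, 0.95], [1.01, 1.03],
                 [2., 2.], [2.03, 2.06], [1.98, 1.89],
                 [3., 3.], [3.03, 3.05], [2.89, 3.1],
                 [4., 4.], [4.06, 4.02], [3.97, 4.01]])

pca = PCA(n_components=1).fit(data)

# Eigenvalues of the sample covariance matrix (ddof=1), largest first
eigvals = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]

# explained_variance_ratio_ = variance of the kept component / total variance
ratio = eigvals[0] / eigvals.sum()
print(ratio, pca.explained_variance_ratio_[0])
assert np.isclose(ratio, pca.explained_variance_ratio_[0])

# With the n - 1 denominator, explained_variance_ is the largest eigenvalue
print(eigvals[0], pca.explained_variance_[0])
assert np.isclose(eigvals[0], pca.explained_variance_[0])
```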

(5) Methods of the object

>>> newA=pca.transform(A)

Reduces new data A with the already trained pca model.

>>> pca.set_params(copy=False)
PCA(copy=False, n_components=1, whiten=False)

Sets parameters.

Reference: http://doc.okbase.net/u012162613/archive/120946.html

2. SVM

I often use sklearn's SVC class, so here is a translation of some of its parameters from the documentation, for future reference.

SVC is implemented on top of libsvm, so many of its parameters resemble libsvm's. (PS: libsvm solves its quadratic programming problem with the SMO algorithm.)
sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None)

Parameters:

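- C: the penalty parameter C of C-SVC; default 1.0.

The larger C is, the more heavily the slack variables are penalized and pushed toward 0. Misclassification is penalized more, so the model tries to classify every training sample correctly. Accuracy on the training set is then high, but generalization is weak. With a smaller C, misclassification is penalized less: some errors are tolerated and treated as noise, and generalization tends to be better.

- kernel: the kernel function, default 'rbf'; one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'. The numbers below are libsvm's kernel type codes:

  0 – linear: u'v

  1 – polynomial: (gamma*u'*v + coef0)^degree

  2 – RBF: exp(-gamma*|u-v|^2)

  3 – sigmoid: tanh(gamma*u'*v + coef0)

- degree: degree of the polynomial ('poly') kernel, default 3; ignored by all other kernels.

- gamma: kernel coefficient for 'rbf', 'poly' and 'sigmoid'. The default 'auto' uses 1/n_features. Since scikit-learn 0.22 the default is 'scale', which uses 1/(n_features * X.var()).

- coef0: the independent (constant) term of the kernel function; only significant for 'poly' and 'sigmoid'.

- probability: whether to enable probability estimates; default False.

- shrinking: whether to use the shrinking heuristic; default True.

- tol: tolerance for the stopping criterion; default 1e-3.

- cache_size: size of the kernel cache in MB; default 200.

- class_weight: class weights, passed as a dict; sets the parameter C of class i to class_weight[i]*C (the C of C-SVC).

- verbose: whether to enable verbose output.

- max_iter: hard limit on the number of iterations; -1 means no limit.

- decision_function_shape: 'ovo', 'ovr' or None; default None (newer versions default to 'ovr').

- random_state: seed of the pseudo-random number generator used when shuffling the data, an int.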
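The four kernel formulas in the list above can be checked numerically against scikit-learn's pairwise kernel helpers in sklearn.metrics.pairwise. These helpers use the same gamma/coef0/degree parameterisation as SVC. In this sketch the vectors and parameter values are arbitrary. coef0 is passed explicitly because the helpers' default for coef0 (1) differs from SVC's (0.0).

```python
import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

rng = np.random.default_rng(0)
u = rng.normal(size=(1, 4))          # hypothetical sample vectors
v = rng.normal(size=(1, 4))
gamma, coef0, degree = 0.5, 1.0, 3   # arbitrary illustrative values

dot = (u @ v.T).item()               # u'v

manual = {
    "linear": dot,
    "poly": (gamma * dot + coef0) ** degree,
    "rbf": np.exp(-gamma * np.sum((u - v) ** 2)),
    "sigmoid": np.tanh(gamma * dot + coef0),
}
library = {
    "linear": linear_kernel(u, v)[0, 0],
    "poly": polynomial_kernel(u, v, degree=degree, gamma=gamma, coef0=coef0)[0, 0],
    "rbf": rbf_kernel(u, v, gamma=gamma)[0, 0],
    "sigmoid": sigmoid_kernel(u, v, gamma=gamma, coef0=coef0)[0, 0],
}
for name in manual:
    print(name, manual[name], library[name])
    assert np.isclose(manual[name], library[name])
```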

The parameters that mainly need tuning are C, kernel, degree, gamma and coef0.
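One common way to tune these parameters, though not something the original notes prescribe, is a cross-validated grid search. The sketch below runs GridSearchCV on the Iris dataset bundled with scikit-learn. The grid values are arbitrary, log-spaced choices rather than recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative grid only; log-spaced values are a common starting point
param_grid = {
    "kernel": ["rbf"],
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))   # mean cross-validated accuracy
```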

Reference: http://blog.csdn.net/szlcw1/article/details/52336824
