Reference: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter

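There are three ways to evaluate the prediction quality of a model:

  • Estimator score method: every estimator has a score method that provides a default evaluation criterion.
  • Scoring parameter: model evaluation tools that use cross-validation rely on an internal scoring strategy (section 1).
  • Metric functions: the metrics module implements functions that assess prediction error for specific purposes (sections 2 to 5).

Finally, Dummy estimators are introduced. They provide strategies based on random guessing and can serve as a baseline when evaluating prediction quality (see section 6).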

See also

For “pairwise” metrics, between samples and not estimators or predictions, see the Pairwise metrics, Affinities and Kernels section.

More details will be added when time permits.

1、

The scoring parameter: defining model evaluation rules

Model selection and evaluation tools, such as grid_search.GridSearchCV and cross_validation.cross_val_score, take a scoring parameter that controls what metric they apply to the estimators evaluated.

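1)Predefined scoring values

All scorers follow the convention that higher values are better. Metrics that measure the distance between the model and the data, such as mean_absolute_error and mean_squared_error, are therefore reported as negated values.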

Scoring                    Function                          Comment
Classification
'accuracy'                 metrics.accuracy_score
'average_precision'        metrics.average_precision_score
'f1'                       metrics.f1_score                  for binary targets
'f1_micro'                 metrics.f1_score                  micro-averaged
'f1_macro'                 metrics.f1_score                  macro-averaged
'f1_weighted'              metrics.f1_score                  weighted average
'f1_samples'               metrics.f1_score                  by multilabel sample
'log_loss'                 metrics.log_loss                  requires predict_proba support
'precision' etc.           metrics.precision_score           suffixes apply as with 'f1'
'recall' etc.              metrics.recall_score              suffixes apply as with 'f1'
'roc_auc'                  metrics.roc_auc_score
Clustering
'adjusted_rand_score'      metrics.adjusted_rand_score
Regression
'mean_absolute_error'      metrics.mean_absolute_error
'mean_squared_error'       metrics.mean_squared_error
'median_absolute_error'    metrics.median_absolute_error
'r2'                       metrics.r2_score

An example:

>>> from sklearn import svm, cross_validation, datasets
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> model = svm.SVC()
>>> cross_validation.cross_val_score(model, X, y, scoring='wrong_choice')
Traceback (most recent call last):
ValueError: 'wrong_choice' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_rand_score', 'average_precision', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'log_loss', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc']
>>> clf = svm.SVC(probability=True, random_state=0)
>>> cross_validation.cross_val_score(clf, X, y, scoring='log_loss')
array([-0.07..., -0.16..., -0.06...])

3)Defining a custom scoring strategy

For a callable to be a scorer, it must follow these two rules:

  • It can be called with parameters (estimator, X, y),
    where estimator is the model that should be evaluated, X is
    validation data, and y is the ground truth target for X (in
    the supervised case) or None (in the unsupervised case).
  • It returns a floating point number that quantifies the estimator prediction
    quality on X, with reference to y.
    Again, by convention higher numbers are better, so if your scorer returns loss, that value should be negated.
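The two rules can be sketched in plain Python, without scikit-learn. This is only an illustration under stated assumptions: ConstantRegressor and neg_mean_abs_error_scorer are hypothetical names made up for this example, not part of the library. The scorer takes (estimator, X, y) and negates a loss so that higher stays better.

```python
class ConstantRegressor:
    """Hypothetical toy estimator that always predicts one fixed value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        # One prediction per row of X; the features themselves are ignored.
        return [self.value for _ in X]


def neg_mean_abs_error_scorer(estimator, X, y):
    """Scorer following the (estimator, X, y) calling convention.

    Mean absolute error is a loss, so the result is negated to keep
    the "higher is better" convention.
    """
    predictions = estimator.predict(X)
    total = sum(abs(t - p) for t, p in zip(y, predictions))
    return -total / len(y)


X_val = [[0.0], [1.0], [2.0], [3.0]]
y_val = [1.0, 2.0, 3.0, 4.0]

good = neg_mean_abs_error_scorer(ConstantRegressor(2.5), X_val, y_val)
bad = neg_mean_abs_error_scorer(ConstantRegressor(10.0), X_val, y_val)
print(good, bad)  # the better model gets the higher (less negative) score
```

Because the value is negated, a tool that maximizes the score will still prefer the model with the smaller error.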


2、

Classification metrics

The sklearn.metrics module
implements several loss, score, and utility functions to measure classification performance.

Some of these are restricted to the binary classification case:

matthews_corrcoef(y_true, y_pred) Compute the Matthews correlation coefficient (MCC) for binary classes
precision_recall_curve(y_true, probas_pred) Compute precision-recall pairs for different probability thresholds
roc_curve(y_true, y_score[, pos_label, ...]) Compute Receiver operating characteristic (ROC)

Others also work in the multiclass case:

confusion_matrix(y_true, y_pred[, labels]) Compute confusion matrix to evaluate the accuracy of a classification
hinge_loss(y_true, pred_decision[, labels, ...]) Average hinge loss (non-regularized)

Some also work in the multilabel case:

accuracy_score(y_true, y_pred[, normalize, ...]) Accuracy classification score.
classification_report(y_true, y_pred[, ...]) Build a text report showing the main classification metrics
f1_score(y_true, y_pred[, labels, ...]) Compute the F1 score, also known as balanced F-score or F-measure
fbeta_score(y_true, y_pred, beta[, labels, ...]) Compute the F-beta score
hamming_loss(y_true, y_pred[, classes]) Compute the average Hamming loss.
jaccard_similarity_score(y_true, y_pred[, ...]) Jaccard similarity coefficient score
log_loss(y_true, y_pred[, eps, normalize, ...]) Log loss, aka logistic loss or cross-entropy loss.
precision_recall_fscore_support(y_true, y_pred) Compute precision, recall, F-measure and support for each class
precision_score(y_true, y_pred[, labels, ...]) Compute the precision
recall_score(y_true, y_pred[, labels, ...]) Compute the recall
zero_one_loss(y_true, y_pred[, normalize, ...]) Zero-one classification loss.

And some work with binary and multilabel (but not multiclass) problems:

average_precision_score(y_true, y_score[, ...]) Compute average precision (AP) from prediction scores
roc_auc_score(y_true, y_score[, average, ...]) Compute Area Under the Curve (AUC) from prediction scores

In the following sub-sections, we will describe each of those functions, preceded by some notes on common API and metric definition.



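2)accuracy score:

The accuracy_score function computes the accuracy. By default it returns the fraction of correct predictions; with normalize=False it returns the number of correct predictions instead. An example makes this clear: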

>>> import numpy as np
>>> from sklearn.metrics import accuracy_score
>>> y_pred = [0, 2, 1, 3]
>>> y_true = [0, 1, 2, 3]
>>> accuracy_score(y_true, y_pred)
0.5
>>> accuracy_score(y_true, y_pred, normalize=False)
2

In multilabel classification, a sample counts as correctly predicted only if all of its labels are predicted correctly.

An example:

>>> accuracy_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))
0.5
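To make the definition concrete, here is a minimal pure-Python sketch of the same computation. The helper name accuracy is an assumption for illustration; it is not scikit-learn's implementation. Representing each multilabel sample as a list means row equality enforces the "all labels must match" rule.

```python
def accuracy(y_true, y_pred, normalize=True):
    """Fraction (or count, if normalize is False) of exactly matching samples."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true) if normalize else matches


# Single-label case, same data as above.
frac = accuracy([0, 1, 2, 3], [0, 2, 1, 3])
count = accuracy([0, 1, 2, 3], [0, 2, 1, 3], normalize=False)

# Multilabel case: each sample is a list of labels and must match entirely.
multilabel = accuracy([[0, 1], [1, 1]], [[1, 1], [1, 1]])
print(frac, count, multilabel)
```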

3)confusion matrix:

The confusion_matrix function evaluates classification accuracy by computing the confusion matrix. An example:

>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])

(Note: rows correspond to true labels and columns to predicted labels.)
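The row and column convention can be checked with a short pure-Python sketch. The function name confusion_counts is hypothetical and this is not how scikit-learn implements it; labels are simply sorted to fix the row and column order.

```python
def confusion_counts(y_true, y_pred):
    """Count matrix where entry [i][j] is the number of samples with
    true label i (row) that were predicted as label j (column)."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
matrix = confusion_counts(y_true, y_pred)
for row in matrix:
    print(row)
```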

4)classification report:

The classification_report function builds a text report showing the main classification metrics. An example:

>>> from sklearn.metrics import classification_report
>>> y_true = [0, 1, 2, 2, 0]
>>> y_pred = [0, 0, 2, 2, 0]
>>> target_names = ['class 0', 'class 1', 'class 2']
>>> print(classification_report(y_true, y_pred, target_names=target_names))
             precision    recall  f1-score   support

    class 0       0.67      1.00      0.80         2
    class 1       0.00      0.00      0.00         1
    class 2       1.00      1.00      1.00         2

avg / total       0.67      0.80      0.72         5
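The numbers in such a report can be reproduced by hand. The following is a rough pure-Python sketch, assuming that undefined ratios (a class that is never predicted) are reported as 0.0 and that the last row is a support-weighted average; per_class_report and weighted_average are hypothetical helper names.

```python
def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each class."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        support = sum(1 for t in y_true if t == label)
        # Assumption: a ratio with a zero denominator is reported as 0.0.
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, support)
    return report


def weighted_average(report, column):
    """Average one metric column, weighting each class by its support."""
    total = sum(row[3] for row in report.values())
    return sum(row[column] * row[3] for row in report.values()) / total


report = per_class_report([0, 1, 2, 2, 0], [0, 0, 2, 2, 0])
for label, (p, r, f, s) in report.items():
    print(label, round(p, 2), round(r, 2), round(f, 2), s)
print([round(weighted_average(report, c), 2) for c in range(3)])
```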

The following metrics are used less often, so they are only listed briefly, without much explanation:

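5)hamming loss:

If $\hat{y}_j$ is the predicted value for the $j$-th label of a given sample, $y_j$ is the corresponding true value, and $n_\text{labels}$ is the number of classes or labels, then the Hamming loss $L_{Hamming}$ between two samples is defined as:

$$L_{Hamming}(y, \hat{y}) = \frac{1}{n_\text{labels}} \sum_{j=0}^{n_\text{labels} - 1} 1(\hat{y}_j \not= y_j)$$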
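As a small illustration, the Hamming loss between two label vectors is the fraction of positions at which they differ. The sketch below is plain Python with a hypothetical helper name, hamming_loss_pair, and is not the library's implementation.

```python
def hamming_loss_pair(y_true, y_pred):
    """Fraction of label positions where prediction and truth disagree."""
    mismatches = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mismatches / len(y_true)


# Two of the four label positions differ.
loss = hamming_loss_pair([0, 1, 1, 0], [1, 1, 0, 0])
print(loss)
```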

6)jaccard similarity coefficient score:

The Jaccard similarity coefficient of the $i$-th samples, with a ground truth label set $y_i$ and predicted label set $\hat{y}_i$, is defined as

$$J(y_i, \hat{y}_i) = \frac{|y_i \cap \hat{y}_i|}{|y_i \cup \hat{y}_i|}$$
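For a single sample, the Jaccard coefficient is the size of the intersection of the two label sets divided by the size of their union. A minimal sketch with Python sets follows; the name jaccard and the handling of two empty sets are assumptions made for this example.

```python
def jaccard(true_labels, predicted_labels):
    """Size of the intersection over size of the union of two label sets."""
    true_set, pred_set = set(true_labels), set(predicted_labels)
    union = true_set | pred_set
    if not union:
        # Assumption for this sketch: two empty label sets count as identical.
        return 1.0
    return len(true_set & pred_set) / len(union)


# Intersection {2, 3} has 2 elements, union {1, 2, 3, 4} has 4.
score = jaccard([1, 2, 3], [2, 3, 4])
print(score)
```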

7)precision, recall, f-measures:

Several functions allow you to analyze the precision, recall and F-measures score:

average_precision_score(y_true, y_score[, ...]) Compute average precision (AP) from prediction scores
f1_score(y_true, y_pred[, labels, ...]) Compute the F1 score, also known as balanced F-score or F-measure
fbeta_score(y_true, y_pred, beta[, labels, ...]) Compute the F-beta score
precision_recall_curve(y_true, probas_pred) Compute precision-recall pairs for different probability thresholds
precision_recall_fscore_support(y_true, y_pred) Compute precision, recall, F-measure and support for each class
precision_score(y_true, y_pred[, labels, ...]) Compute the precision
recall_score(y_true, y_pred[, labels, ...]) Compute the recall

Note that the precision_recall_curve function
is restricted to the binary case. The average_precision_score function
works only in binary classification and multilabel indicator format.
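For the binary case, precision, recall and the F-beta score can be sketched directly from the counts of true positives, false positives and false negatives. This is an illustrative pure-Python version with a hypothetical function name, fbeta, using the standard formula (1 + beta^2) * P * R / (beta^2 * P + R); beta=1 gives the F1 score.

```python
def fbeta(y_true, y_pred, beta, positive=1):
    """F-beta score for one positive class; beta > 1 favors recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denom


y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0]  # precision = 2/3, recall = 1/2

f1 = fbeta(y_true, y_pred, beta=1)
f2 = fbeta(y_true, y_pred, beta=2)
f_half = fbeta(y_true, y_pred, beta=0.5)
print(f1, f2, f_half)
```

Since recall is lower than precision here, weighting recall more heavily (beta=2) lowers the score, and weighting precision more heavily (beta=0.5) raises it.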

8)hinge loss:

9)log loss:

10)matthews correlation coefficient:

11)receiver operating characteristic (ROC):

12)zero one loss:

3、

Multilabel ranking metrics

In multilabel learning, each sample can have any number of ground truth labels associated with it. The goal is to give
high scores and better rank to the ground truth labels.

1)coverage error:

2)label ranking average precision:

4、

Regression metrics

The sklearn.metrics module
implements several loss, score, and utility functions to measure regression performance.

Some of those have been enhanced to handle the multioutput case: mean_absolute_error, mean_squared_error, median_absolute_error and r2_score.

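1)explained variance score:

If $\hat{y}$ is the estimated target output, $y$ the corresponding (correct) target output, and $Var$ is Variance, the square of the standard deviation, then the explained variance is estimated as follows:

$$\texttt{explained\_variance}(y, \hat{y}) = 1 - \frac{Var\{y - \hat{y}\}}{Var\{y\}}$$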
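A small pure-Python sketch of this quantity: one minus the variance of the residuals divided by the variance of the targets. The helper names are hypothetical, and population variance (dividing by n) is assumed.

```python
def variance(values):
    """Population variance: mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def explained_variance(y_true, y_pred):
    """1 - Var(y - y_hat) / Var(y)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - variance(residuals) / variance(y_true)


y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
ev = explained_variance(y_true, y_pred)
print(round(ev, 3))
```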

2)mean absolute error:

If $\hat{y}_i$ is the predicted value of the $i$-th sample, and $y_i$ is the corresponding true value, then the mean absolute error (MAE) estimated over $n_\text{samples}$ is defined as

$$\text{MAE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} |y_i - \hat{y}_i|$$

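3)mean squared error:

If $\hat{y}_i$ is the predicted value of the $i$-th sample, and $y_i$ is the corresponding true value, then the mean squared error (MSE) estimated over $n_\text{samples}$ is defined as

$$\text{MSE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} (y_i - \hat{y}_i)^2$$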
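Both errors are plain averages over the samples, of the absolute and of the squared differences respectively. A minimal sketch in pure Python, with illustrative function names:

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_i - y_hat_i| over all samples."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (y_i - y_hat_i)^2 over all samples."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(mae, mse)
```

Squaring penalizes the single large error (7 versus 8) more heavily than the two small ones, which is why the two metrics can rank models differently.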

4)R² score, the coefficient of determination:

If $\hat{y}_i$ is the predicted value of the $i$-th sample and $y_i$ is the corresponding true value, then the score R² estimated over $n_\text{samples}$ is defined as

$$R^2(y, \hat{y}) = 1 - \frac{\sum_{i=0}^{n_\text{samples} - 1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_\text{samples} - 1} (y_i - \bar{y})^2}$$

where $\bar{y} = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} y_i$.
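The coefficient of determination compares the residual sum of squares with the total sum of squares around the mean of the true values. The sketch below is plain Python with a hypothetical function name, r2, and assumes the targets are not all equal (otherwise the denominator is zero).

```python
def r2(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
score = r2(y_true, y_pred)
print(round(score, 3))

# Always predicting the mean of y_true gives a score of exactly 0.
baseline = r2(y_true, [2.875] * 4)
```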

5、

Clustering metrics

The sklearn.metrics module implements several loss, score, and utility functions. For more information see the Clustering performance evaluation section for instance clustering, and Biclustering evaluation for biclustering.

6、Dummy estimators

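For supervised learning, comparing against randomly generated or trivially simple predictions is a very easy sanity check that gives you a baseline.

DummyClassifier provides several simple strategies for producing such baseline predictions: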

  • stratified generates random predictions by respecting the training set class distribution.

  • most_frequent always predicts the most frequent label in the training set.

  • uniform generates predictions uniformly at random.

  • constant always predicts a constant label that is provided by the user.(A
    major motivation of this method is F1-scoring, when the positive class is in the minority.)

Note that with all these strategies, the predict method completely ignores the input data!
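To show how little such a baseline does, here is a rough pure-Python sketch of the most_frequent idea. MostFrequentBaseline is a hypothetical class written for this illustration, not scikit-learn's DummyClassifier; note that predict never looks at the features.

```python
from collections import Counter


class MostFrequentBaseline:
    """Toy baseline that always predicts the most common training label."""

    def fit(self, X, y):
        # Only the labels are used; X is ignored entirely.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

    def score(self, X, y):
        """Accuracy of the constant prediction on (X, y)."""
        predictions = self.predict(X)
        return sum(1 for t, p in zip(y, predictions) if t == p) / len(y)


X_train = [[0.1], [0.4], [0.35], [0.8]]
y_train = [-1, -1, -1, 1]
X_test = [[0.2], [0.9], [0.3], [0.5], [0.7]]
y_test = [-1, 1, -1, -1, 1]

baseline = MostFrequentBaseline().fit(X_train, y_train)
baseline_accuracy = baseline.score(X_test, y_test)
print(baseline.label_, baseline_accuracy)
```

On imbalanced data this constant prediction already reaches a fairly high accuracy, which is why a real classifier should be compared against it.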

A simple example:

First, let’s create an imbalanced dataset:

>>> from sklearn.datasets import load_iris
>>> from sklearn.cross_validation import train_test_split
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> y[y != 1] = -1
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

Next, let’s compare the accuracy of SVC and most_frequent:

>>> from sklearn.dummy import DummyClassifier
>>> from sklearn.svm import SVC
>>> clf = SVC(kernel='linear', C=1).fit(X_train, y_train)
>>> clf.score(X_test, y_test)
0.63...
>>> clf = DummyClassifier(strategy='most_frequent', random_state=0)
>>> clf.fit(X_train, y_train)
DummyClassifier(constant=None, random_state=0, strategy='most_frequent')
>>> clf.score(X_test, y_test)
0.57...

We see that SVC doesn’t do much better than a dummy classifier. Now, let’s change the kernel:

>>> clf = SVC(kernel='rbf', C=1).fit(X_train, y_train)
>>> clf.score(X_test, y_test)
0.97...

Similarly, for regression problems:

DummyRegressor also implements four simple rules of thumb for regression:

  • mean always predicts the mean of the training targets.
  • median always predicts the median of the training targets.
  • quantile always predicts a user provided quantile of the training targets.
  • constant always predicts a constant value that is provided by the user.

In all these strategies, the predict method completely ignores the input data.
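The regression baselines can be sketched in a few lines of standard-library Python. This is only an illustration of the mean, median and constant rules (quantile is left out); fit_constant_baseline is a hypothetical helper, not the DummyRegressor API.

```python
import statistics


def fit_constant_baseline(y_train, strategy="mean", constant=None):
    """Return a predict function that ignores X and outputs one fixed value."""
    if strategy == "mean":
        value = statistics.mean(y_train)
    elif strategy == "median":
        value = statistics.median(y_train)
    elif strategy == "constant":
        if constant is None:
            raise ValueError("the constant strategy needs a constant value")
        value = constant
    else:
        raise ValueError("unknown strategy: %r" % strategy)
    return lambda X: [value for _ in X]


y_train = [1.0, 2.0, 3.0, 10.0]
X_new = [[5.0], [6.0]]

mean_pred = fit_constant_baseline(y_train, "mean")(X_new)
median_pred = fit_constant_baseline(y_train, "median")(X_new)
const_pred = fit_constant_baseline(y_train, "constant", constant=0.0)(X_new)
print(mean_pred, median_pred, const_pred)
```

The outlier 10.0 pulls the mean baseline up to 4.0 while the median baseline stays at 2.5, which shows why the choice of rule matters for skewed targets.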
