The Road to Machine Learning: Python k-Nearest Neighbors Classifier (KNeighborsClassifier) for Iris Classification

Learning the k-nearest neighbors classifier API in Python

The source code is available in my git repository: https://github.com/linyi0604/MachineLearning
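Before turning to the sklearn API, it may help to see the idea behind a k-nearest neighbors classifier in miniature. The following is only a rough sketch under stated assumptions: the function name `knn_predict` and the toy data are hypothetical, it uses plain Euclidean distance with an unweighted majority vote, and it is not how sklearn implements `KNeighborsClassifier` internally.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training sample.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest samples and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D data (made up for illustration, not the iris measurements).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0), k=3))
print(knn_predict(points, labels, (5.1, 5.0), k=3))
```

In this naive form every prediction scans all stored training samples, which gives a sense of why the method is described as costly in both computation and memory.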

 from sklearn.datasets import load_iris

 from sklearn.model_selection import train_test_split

 from sklearn.preprocessing import StandardScaler

 from sklearn.neighbors import KNeighborsClassifier

 from sklearn.metrics import classification_report

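 '''

 k-nearest neighbors classifier

 Makes a decision for each new sample based on how the training data is distributed around it

 A non-parametric method

 High computational cost and memory use at prediction time

 '''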

 '''

 1 Prepare the data

 '''

 # Load the iris dataset

 iris = load_iris()

 # Check the data shape

 # print(iris.data.shape)    # (150, 4)

 # View the dataset description

 # print(iris.DESCR)

 '''

 Iris Plants Database

 ====================

 Notes

 -----

 Data Set Characteristics:

     :Number of Instances: 150 (50 in each of three classes)

     :Number of Attributes: 4 numeric, predictive attributes and the class

     :Attribute Information:

         - sepal length in cm

         - sepal width in cm

         - petal length in cm

         - petal width in cm

         - class:

                 - Iris-Setosa

                 - Iris-Versicolour

                 - Iris-Virginica

     :Summary Statistics:

     ============== ==== ==== ======= ===== ====================

                     Min  Max   Mean    SD   Class Correlation

     ============== ==== ==== ======= ===== ====================

     sepal length:   4.3  7.9   5.84   0.83    0.7826

     sepal width:    2.0  4.4   3.05   0.43   -0.4194

     petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)

     petal width:    0.1  2.5   1.20  0.76     0.9565  (high!)

     ============== ==== ==== ======= ===== ====================

     :Missing Attribute Values: None

     :Class Distribution: 33.3% for each of 3 classes.

     :Creator: R.A. Fisher

     :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)

     :Date: July, 1988

 This is a copy of UCI ML iris datasets.

 http://archive.ics.uci.edu/ml/datasets/Iris

 The famous Iris database, first used by Sir R.A Fisher

 This is perhaps the best known database to be found in the

 pattern recognition literature.  Fisher's paper is a classic in the field and

 is referenced frequently to this day.  (See Duda & Hart, for example.)  The

 data set contains 3 classes of 50 instances each, where each class refers to a

 type of iris plant.  One class is linearly separable from the other 2; the

 latter are NOT linearly separable from each other.

 References

 ----------

    - Fisher,R.A. "The use of multiple measurements in taxonomic problems"

      Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to

      Mathematical Statistics" (John Wiley, NY, 1950).

    - Duda,R.O., & Hart,P.E. (1973) Pattern Classification and Scene Analysis.

      (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.

    - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System

      Structure and Classification Rule for Recognition in Partially Exposed

      Environments".  IEEE Transactions on Pattern Analysis and Machine

      Intelligence, Vol. PAMI-2, No. 1, 67-71.

    - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions

      on Information Theory, May 1972, 431-433.

    - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II

      conceptual clustering system finds 3 classes in the data.

    - Many, many more ...

    150 data samples in total

    Evenly distributed across 3 subspecies

    Each sample has 4 measurements describing petal and sepal shape

 '''

 '''

 2 Split into training and test sets

 '''

 x_train, x_test, y_train, y_test = train_test_split(iris.data,

                                                     iris.target,

                                                     test_size=0.25,

                                                     random_state=33)

 '''

 3 k-nearest neighbors classifier: fit the model and predict

 '''

 # Standardize the features: fit the scaler on the training data, then apply it to the test data

 ss = StandardScaler()

 x_train = ss.fit_transform(x_train)

 x_test = ss.transform(x_test)

 # Create a k-nearest neighbors model object

 knc = KNeighborsClassifier()

 # Fit the model on the training data

 knc.fit(x_train, y_train)

 # Predict on the test data

 y_predict = knc.predict(x_test)

 '''

 4 Model evaluation

 '''

 print("Accuracy:", knc.score(x_test, y_test))

 print("Other metrics:\n", classification_report(y_test, y_predict, target_names=iris.target_names))

 '''

 Accuracy: 0.8947368421052632

 Other metrics:

               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00         8

  versicolor       0.73      1.00      0.85        11

   virginica       1.00      0.79      0.88        19

 avg / total       0.92      0.89      0.90        38

 '''
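In step 3 the scaler is fitted on the training set only and then reused on the test set. A small sketch of what that amounts to for a single feature is given here; the helper names `fit_scaler` and `apply_scaler` and the numbers are hypothetical, and the population standard deviation is used because that is what `StandardScaler` computes.

```python
from statistics import fmean, pstdev


def fit_scaler(column):
    """Return (mean, std) computed from the training values only."""
    return fmean(column), pstdev(column)


def apply_scaler(column, mean, std):
    """Scale values with previously fitted statistics."""
    return [(value - mean) / std for value in column]


# Made-up values for one feature, for illustration only.
train_feature = [4.0, 5.0, 6.0, 7.0, 8.0]
test_feature = [5.0, 9.0]

# Statistics come from the training data ...
mean, std = fit_scaler(train_feature)
train_scaled = apply_scaler(train_feature, mean, std)
# ... and the same statistics are applied to the test data.
test_scaled = apply_scaler(test_feature, mean, std)

print(mean, std)
print(test_scaled)
```

Reusing the training statistics keeps information about the test set out of the preprocessing, so the test score stays an honest estimate. Because k-nearest neighbors relies on distances, putting the features on a comparable scale also keeps any one measurement from dominating.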
