Classification and Regression with the K-Nearest Neighbors Algorithm
import numpy as np
from matplotlib import pyplot as plt
X_train = np.array([
[158, 64],
[170, 66],
[183, 84],
[191, 80],
[155, 49],
[163, 59],
[180, 67],
[158, 54],
[178, 77]
])
y_train = ["male", "male", "male", "male", "female", "female", "female", "female", "female"]
plt.figure()
plt.title("Human Heights and Weights by Sex")
plt.xlabel("Height in cm")
plt.ylabel("Weight in kg")
for i, x in enumerate(X_train):
    plt.scatter(x[0], x[1], c="k", marker="x" if y_train[i] == "male" else "D")
plt.grid(True)
plt.show()
Output: a scatter plot of the training data (figure not reproduced here).
from collections import Counter
import numpy as np
X_train = np.array([
[158, 64],
[170, 66],
[183, 84],
[191, 80],
[155, 49],
[163, 59],
[180, 67],
[158, 54],
[178, 77]
])
y_train = ["male", "male", "male", "male", "female", "female", "female", "female", "female"]
# Instance to classify
x = np.array([[155, 70]])
# Euclidean distance from x to every training instance
distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
"""[ 6.70820393 15.5241747 31.30495168 37.36308338 21. 13.60147051 25.17935662 16.2788206 24.04163056]"""
# Indices of the three smallest distances, sorted ascending
nearest_neighbor_indices = distances.argsort()[:3]
"""[0 5 1]"""
# Look up the labels of those neighbors in y_train
nearest_neighbor_genders = np.take(y_train, nearest_neighbor_indices)
"""['male' 'female' 'male']"""
# Count how often each label occurs among the neighbors
b = Counter(nearest_neighbor_genders)
"""Counter({'male': 2, 'female': 1})"""
# Take the most common label (the argument 1 requests a single entry)
gender = b.most_common(1)[0][0]
print(gender)
"""male"""
# Label binarization
from sklearn.preprocessing import LabelBinarizer
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X_train = np.array([
[158, 64],
[170, 66],
[183, 84],
[191, 80],
[155, 49],
[163, 59],
[180, 67],
[158, 54],
[178, 77]
])
y_train = ["male", "male", "male", "male", "female", "female", "female", "female", "female"]
# Instance to classify
x = np.array([[155, 70]])
# Instantiate the label binarizer
lb = LabelBinarizer()
# Convert the string labels in y_train to binary (0/1) values
y_train_binarized = lb.fit_transform(y_train)
"""[[1] [1] [1] [1] [0] [0] [0] [0] [0]]"""
K = 3
# Instantiate the KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=K)
# Fit on the training data; reshape(-1) flattens the labels to 1-D
clf.fit(X_train, y_train_binarized.reshape(-1))
# Predict the binarized label of x
predicted_binarized = clf.predict(x)
"""[1]"""
# Map the binary prediction back to the original label
predicted_label = lb.inverse_transform(predicted_binarized)
print(predicted_label)
"""['male']"""
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             matthews_corrcoef, classification_report)
from sklearn.preprocessing import LabelBinarizer
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X_train = np.array([
[158, 64],
[170, 66],
[183, 84],
[191, 80],
[155, 49],
[163, 59],
[180, 67],
[158, 54],
[178, 77]
])
y_train = ["male", "male", "male", "male", "female", "female", "female", "female", "female"]
# Test data
x_test = np.array([
[168, 65],
[180, 96],
[160, 52],
[169, 67]
])
y_test = ['female', 'male', 'female', 'female']
# Instantiate the label binarizer
lb = LabelBinarizer()
# Convert the string labels in y_train to binary (0/1) values
y_train_binarized = lb.fit_transform(y_train)
"""[[1] [1] [1] [1] [0] [0] [0] [0] [0]]"""
# Binarize the test labels with the same fitted binarizer
y_test_binarized = lb.transform(y_test)
"""[[0] [1] [0] [0]]"""
K = 3
# Instantiate the KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=K)
# Fit on the training data; reshape(-1) flattens the labels to 1-D
clf.fit(X_train, y_train_binarized.reshape(-1))
# Predict the binarized labels of the test instances
predicted_binarized = clf.predict(x_test)
"""[1 1 0 0]"""
# Map the binary predictions back to the original labels
predicted_label = lb.inverse_transform(predicted_binarized)
print(predicted_label)
"""['male' 'male' 'female' 'female']"""
# 3.5 Compute accuracy
# gender_accuracy_score = accuracy_score(y_test, predicted_label)
gender_accuracy_score = accuracy_score(y_test_binarized, predicted_binarized)
print(gender_accuracy_score)
# 3.6 Compute precision (uses the binarized labels; with string labels, pos_label must be set)
gender_precision_score = precision_score(y_test_binarized, predicted_binarized)
print(gender_precision_score)
# 3.7 Compute recall (uses the binarized labels; with string labels, pos_label must be set)
gender_recall_score = recall_score(y_test_binarized, predicted_binarized)
print(gender_recall_score)
# 3.8 Compute the F1 score (harmonic mean of precision and recall)
gender_f1_score = f1_score(y_test_binarized, predicted_binarized)
print(gender_f1_score)
# 3.9 Compute the Matthews correlation coefficient (MCC)
gender_mcc_score = matthews_corrcoef(y_test_binarized, predicted_binarized)
print(gender_mcc_score)
# 3.10 Generate a report of precision, recall, and F1 score per class
# gender_report = classification_report(y_test_binarized, predicted_binarized, target_names=["male"], labels=[1])
gender_report = classification_report(y_test_binarized, predicted_binarized)
print(gender_report)
Output:
['male' 'male' 'female' 'female']
0.75
0.5
1.0
0.6666666666666666
0.5773502691896258
              precision    recall  f1-score   support

           0       1.00      0.67      0.80         3
           1       0.50      1.00      0.67         1

   micro avg       0.75      0.75      0.75         4
   macro avg       0.75      0.83      0.73         4
weighted avg       0.88      0.75      0.77         4
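The scores above can be checked by hand from the confusion-matrix counts. The following is a minimal sketch in plain Python, assuming class 1 ("male") is the positive class, as in the binarized labels; the variable names `tp`, `fp`, `fn`, and `tn` are illustrative and not part of the original script.

```python
from math import sqrt

# Binarized test labels and predictions taken from the script above
y_true = [0, 1, 0, 0]
y_pred = [1, 1, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(accuracy, precision, recall, f1, mcc)
```

With one true positive, one false positive, no false negatives, and two true negatives, these formulas reproduce the five numbers printed by the scikit-learn functions.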
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.neighbors import KNeighborsRegressor
import numpy as np
X_train = np.array([
[158, 1],
[170, 1],
[183, 1],
[191, 1],
[155, 0],
[163, 0],
[180, 0],
[158, 0],
[170, 0]
])
y_train = [64, 86, 84, 80, 49, 59, 67, 54, 67]
# Test data
x_test = np.array([
[168, 1],
[180, 1],
[160, 0],
[169, 0]
])
y_test = [65, 96, 52, 67]
K = 3
# Instantiate the KNeighborsRegressor
clf = KNeighborsRegressor(n_neighbors=K)
# Fit on the training data
clf.fit(X_train, y_train)
# Predict the weights of the test instances
predictions = clf.predict(x_test)
print("Predicted weights: %s" % predictions)
# Coefficient of determination (R^2)
weights_r2_score = r2_score(y_test, predictions)
print("Coefficient of determination: %s" % weights_r2_score)
# Mean absolute error
weights_mean_absolute_error = mean_absolute_error(y_test, predictions)
print("Mean absolute error: %s" % weights_mean_absolute_error)
# Mean squared error
weights_mean_squared_error = mean_squared_error(y_test, predictions)
print("Mean squared error: %s" % weights_mean_squared_error)
Output:
Predicted weights: [70.66666667 79.         59.         70.66666667]
Coefficient of determination: 0.6290565226735438
Mean absolute error: 8.333333333333336
Mean squared error: 95.8888888888889
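To make the regression step less of a black box, here is a rough sketch that repeats the same prediction and the three error metrics using only the standard library. It assumes the default behavior used above (Euclidean distance, unweighted mean of the K nearest targets); the helper name `knn_regress` is hypothetical and not part of scikit-learn.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

X_train = [[158, 1], [170, 1], [183, 1], [191, 1], [155, 0],
           [163, 0], [180, 0], [158, 0], [170, 0]]
y_train = [64, 86, 84, 80, 49, 59, 67, 54, 67]
X_test = [[168, 1], [180, 1], [160, 0], [169, 0]]
y_test = [65, 96, 52, 67]


def knn_regress(x, k=3):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(zip((dist(row, x) for row in X_train), y_train))[:k]
    return sum(target for _, target in nearest) / k


predictions = [knn_regress(x) for x in X_test]
n = len(y_test)
errors = [t - p for t, p in zip(y_test, predictions)]

# Mean absolute error and mean squared error
mae = sum(abs(e) for e in errors) / n
mse = sum(e ** 2 for e in errors) / n

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
mean_y = sum(y_test) / n
ss_tot = sum((t - mean_y) ** 2 for t in y_test)
r2 = 1 - sum(e ** 2 for e in errors) / ss_tot

print(predictions, mae, mse, r2)
```

The values agree with the scikit-learn output above up to floating-point rounding, which shows that the regressor here is simply averaging the weights of the three closest training instances.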
import numpy as np
from scipy.spatial.distance import euclidean
# heights in millimeters
X_train = np.array([
[1700, 1],
[1600, 0]
])
# Keep the test instance 1-D: scipy's euclidean compares two vectors
x_test = np.array([1640, 1])
# Euclidean distance from the test instance to each training instance
print(euclidean(X_train[0, :], x_test))
print(euclidean(X_train[1, :], x_test))
# heights in meters
X_train = np.array([
[1.7, 1],
[1.6, 0]
])
x_test = np.array([1.64, 1])
print(euclidean(X_train[0, :], x_test))
print(euclidean(X_train[1, :], x_test))
Output:
60.0
40.01249804748511
0.06000000000000005
1.0007996802557444
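The four distances above imply that the nearest neighbor changes with the unit of measurement. A small sketch, using only the standard library and a hypothetical helper `nearest_index`, makes that explicit:

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def nearest_index(X_train, x):
    """Return the index of the training instance closest to x."""
    distances = [dist(row, x) for row in X_train]
    return distances.index(min(distances))


# Same two people and the same test instance, heights in millimeters vs. meters
train_mm = [[1700, 1], [1600, 0]]
train_m = [[1.7, 1], [1.6, 0]]

print(nearest_index(train_mm, [1640, 1]))  # 1: the height difference dominates
print(nearest_index(train_m, [1.64, 1]))   # 0: the sex feature dominates
```

In millimeters the height column swamps the 0/1 sex column, while in meters the opposite happens, which is why the features are put on a common scale before computing distances.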
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
from sklearn.neighbors import KNeighborsRegressor
import numpy as np
from sklearn.preprocessing import StandardScaler
X_train = np.array([
[158, 1],
[170, 1],
[183, 1],
[191, 1],
[155, 0],
[163, 0],
[180, 0],
[158, 0],
[170, 0]
])
y_train = [64, 86, 84, 80, 49, 59, 67, 54, 67]
# Test data
x_test = np.array([
[168, 1],
[180, 1],
[160, 0],
[169, 0]
])
y_test = [65, 96, 52, 67]
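# Instantiate the StandardScaler
ss = StandardScaler()
# Fit the scaler on the training data and standardize it
X_train_scaled = ss.fit_transform(X_train)
print(X_train)
print(X_train_scaled)
# Caution: fit_transform refits the scaler on the test set, so the test data is
# scaled with its own mean and variance. ss.transform(x_test) would reuse the
# training statistics instead; the output shown below comes from fit_transform.
x_test_scaled = ss.fit_transform(x_test)
print(x_test)
print(x_test_scaled)
K = 3
# Instantiate the KNeighborsRegressor
clf = KNeighborsRegressor(n_neighbors=K)
# Fit on the standardized training data
clf.fit(X_train_scaled, y_train)
# Predict the weights of the standardized test instances
predictions = clf.predict(x_test_scaled)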
print("Predicted weights: %s" % predictions)
# Coefficient of determination (R^2)
weights_r2_score = r2_score(y_test, predictions)
print("Coefficient of determination: %s" % weights_r2_score)
# Mean absolute error
weights_mean_absolute_error = mean_absolute_error(y_test, predictions)
print("Mean absolute error: %s" % weights_mean_absolute_error)
# Mean squared error
weights_mean_squared_error = mean_squared_error(y_test, predictions)
print("Mean squared error: %s" % weights_mean_squared_error)
Output:
[[158   1]
 [170   1]
 [183   1]
 [191   1]
 [155   0]
 [163   0]
 [180   0]
 [158   0]
 [170   0]]
[[-0.9908706   1.11803399]
 [ 0.01869567  1.11803399]
 [ 1.11239246  1.11803399]
 [ 1.78543664  1.11803399]
 [-1.24326216 -0.89442719]
 [-0.57021798 -0.89442719]
 [ 0.86000089 -0.89442719]
 [-0.9908706  -0.89442719]
 [ 0.01869567 -0.89442719]]
[[168   1]
 [180   1]
 [160   0]
 [169   0]]
[[-0.17557375  1.        ]
 [ 1.50993422  1.        ]
 [-1.29924573 -1.        ]
 [-0.03511475 -1.        ]]
Predicted weights: [78.         83.33333333 54.         64.33333333]
Coefficient of determination: 0.6706425961745109
Mean absolute error: 7.583333333333336
Mean squared error: 85.13888888888893
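The scaled training values printed above are ordinary z-scores: each value minus the column mean, divided by the column's population standard deviation. The sketch below recomputes the height column with the standard library and then applies the same training mean and standard deviation to the test heights, which is the usual convention for held-out data. It is an illustration under those assumptions, not a replacement for `StandardScaler`, and the helper name `standardize` is hypothetical.

```python
from statistics import fmean, pstdev

train_heights = [158, 170, 183, 191, 155, 163, 180, 158, 170]
test_heights = [168, 180, 160, 169]

# Statistics come from the training data only
mu = fmean(train_heights)
sigma = pstdev(train_heights)  # population standard deviation (divides by n)


def standardize(values):
    """Z-score values using the training mean and standard deviation."""
    return [(v - mu) / sigma for v in values]


train_scaled = standardize(train_heights)
test_scaled = standardize(test_heights)

print(train_scaled[:2])  # starts with roughly -0.99 and 0.02, as in the output above
print(test_scaled)
```

Because the test heights are scaled with the training statistics here, `test_scaled` differs from the first column of the test array printed above, which was standardized with the test set's own mean and standard deviation.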