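Clustering: the DBSCAN algorithm
In brief: DBSCAN is a density-based clustering algorithm, so the number of clusters does not have to be specified in advance. It is controlled instead by a neighborhood radius (eps) and a minimum number of points per neighborhood (min_samples), and points in low-density regions are labeled as noise (-1).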
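To make "density-based" concrete, here is a minimal from-scratch sketch of the idea, not scikit-learn's implementation. The function name `dbscan_sketch` and the toy points are assumptions made for illustration; `eps` and `min_samples` follow the usual meaning, with a point counted in its own neighborhood.

```python
import numpy as np


def dbscan_sketch(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    Illustrative O(n^2) version; hypothetical helper, not a library API.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # The eps-neighborhood of a point includes the point itself.
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, -1)
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from an unassigned core point.
        labels[i] = cluster_id
        frontier = [i]
        while frontier:
            p = frontier.pop()
            if not is_core[p]:
                continue  # border point: it joins the cluster but does not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    frontier.append(q)
        cluster_id += 1
    return labels


# Two dense groups of three points each, plus one isolated point.
pts = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [10, 0]]
toy_labels = dbscan_sketch(pts, eps=0.5, min_samples=3).tolist()
print(toy_labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The number of clusters is never passed in: it falls out of how many separate dense regions the expansion discovers, and the isolated point is left as noise.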
A simple Python example using scikit-learn:
# coding: utf-8
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

print('===============================================================')
print('produce the data')
print('===============================================================')
# 100 points in three tight blobs; Y holds the true labels (unused here).
centers = [[1, 1], [-1, -1], [1, -1]]
X, Y = make_blobs(n_samples=100, centers=centers, cluster_std=0.1,
                  random_state=0)
X = StandardScaler().fit_transform(X)

print('===============================================================')
print('calc by dbscan')
print('===============================================================')
db = DBSCAN(eps=0.8, min_samples=20).fit(X)
# Boolean mask marking which samples are core samples.
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
print('聚类个数: %d' % n_clusters_)  # "number of clusters"
for label in range(n_clusters_):
    print('Cluster ', label, ':')
    # Flattened coordinates of the cluster's points: x, y, x, y, ...
    print(X[labels == label].flatten().tolist())

# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]
    class_member_mask = (labels == k)

    # Core samples are drawn with large markers.
    xy = X[class_member_mask & core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=14)

    # Non-core (border or noise) samples are drawn with small markers.
    xy = X[class_member_mask & ~core_samples_mask]
    plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
             markeredgecolor='k', markersize=6)

plt.title('the number of clusters: %d' % n_clusters_)
plt.show()
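Output, recorded from a Python 2 run, which is why each cluster header prints as a tuple (聚类个数 means "number of clusters"; each list holds one cluster's points flattened as x, y, x, y, ...):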
聚类个数: 3
('Cluster ', 0, ':')
[0.71972237193721955, 1.4247928346062555, 0.71030169555602063, 1.4167660110433198, 0.86650601488165513, 1.540511791039243, 0.72211708218565507, 1.3420815507666486, 0.52144250622046129, 1.5915649627099053, 0.71856075881136006, 1.5389120321653047, 0.64879819429817887, 1.2594860931663014, 0.79530587030761835, 1.3059845691292478, 0.90217078085348124, 1.2810513687682608, 0.83428120392822847, 1.5121992651002165, 0.94501772892108737, 1.2304572600393282, 0.61658908505616727, 1.2016210693860701, 0.43123948422351122, 1.4540043441128292, 0.80748270682101664, 1.6223313161580848, 0.80443060148710011, 1.3686384349677738, 0.66615156531279185, 1.4012699966389015, 0.6619526285382874, 1.3526309930197211, 0.8911440824978365, 1.4271253228550598, 0.73656729474920646, 1.2941145631795228, 0.52954661764367772, 1.4337665710281307, 0.63563964407982976, 1.3462216323222238, 0.70021314158580827, 1.4301131836568965, 0.59151066054028689, 1.2340997618614948, 0.60781931318621818, 1.4257196900301823, 0.63157667601940692, 1.3465597131647515, 0.6922193145921226, 1.428232599016918, 0.53128314796952969, 1.3621288955307922, 0.56975224051689699, 1.4671406711851693, 0.6086375682191727, 1.1746304350700796, 0.78429058907277294, 1.3975929004149423, 0.64892102137794172, 1.3382327193866654, 0.75050124858369904, 1.4200749599097495, 0.86238319832692667, 1.3629329516580013, 0.70809022215282358, 1.3648390986044516]
('Cluster ', 1, ':')
[0.63464928652349617, -0.97205337209660403, 0.58556292018246547, -0.73073437840723787, 0.65468651727634131, -0.73441274377141652, 0.63547729042563716, -0.66453211054416861, 0.82977905264905216, -0.7026553048598404, 0.69272708259422322, -0.80662677945376782, 0.9336700453767246, -0.59453052783739546, 0.69594552246388464, -0.55457015205979654, 0.76464102851903226, -0.75835599381130958, 0.76982282906457911, -0.90616105214655729, 0.78543611278256287, -0.64893557021283277, 0.59666018000438314, -0.90008593889031796, 0.56548438993348771, -0.70794621415677228, 0.59303515144236474, -0.66398477418914037, 0.95709744689291321, -0.63610640287638309, 0.82323862006265514, -0.85079072543505374, 0.5630287661735992, -0.7852163996685585, 0.80131670450275849, -0.70246600519558988, 0.7454029815714649, -0.85218313302445714, 0.69903056978268618, -0.86014011002564883, 0.61762634973010477, -0.80939160363205609, 0.5726669483917376, -0.64672353808362981, 0.79449562934102214, -0.80530619881071974, 0.62387498724474699, -0.82390835490887293, 0.75896134677936167, -0.75445848024152995, 0.72097157756491004, -0.66892268630644069, 0.8043594793684804, -0.72698175393472497, 0.66550366099053682, -0.88207692316921515, 0.58097294102170138, -0.78269622047011467, 0.65015889850413455, -0.53164375004590891, 0.62442808457473808, -0.57263307430187604, 0.54434830115223298, -0.68966984891579086, 0.60597037368186768, -0.61780925553487331]
('Cluster ', 2, ':')
[-1.3018531714169292, -0.75534700218006379, -1.5240879328477461, -0.73075767535431713, -1.3558101832440284, -0.69305594070134047, -1.3620120045408117, -0.63846838584413301, -1.501425254649166, -0.75213478312911264, -1.509452276188433, -0.67908018226800171, -1.405243295552026, -0.63269595355922037, -1.3845689200452296, -0.80888897029610984, -1.389466770316631, -0.66133584344003127, -1.4703826992026119, -0.60662876562678392, -1.5515574536964245, -0.64073570728242468, -1.2268148679790234, -0.87919324689086187, -1.5524068961533037, -0.53014962867934823, -1.3956593058608056, -0.59560607016988043, -1.2688843291272613, -0.53521150305805631, -1.4263127243188716, -0.54687874172399775, -1.3060482963082332, -0.86721692570472864, -1.37782320727954, -0.89918162580831196, -1.473789241693936, -0.5401560289692996, -1.2284758921484242, -0.64018233171903494, -1.4714951134839154, -0.81553230478371208, -1.5627790544062625, -0.63346398650999924, -1.4559823420025644, -0.65116763855957849, -1.222575804590778, -0.57926104071118723, -1.5191797025632472, -0.53370819315524121, -1.3873298238589391, -0.85285539988795034, -1.499269551565358, -0.73289080952354146, -1.4606217359962219, -0.73031015933933197, -1.520199373022632, -0.79765210072574655, -1.5415010971493395, -0.62444408930165995, -1.4139110740238496, -0.69363628437548275, -1.3265183485066454, -0.75270484864742238, -1.3497595932847166, -0.72258801674792672]
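The raw coordinate lists are hard to check by eye. As a rough sketch, the same run can be summarized by cluster size and noise count, and compared against the true blob labels that `make_blobs` returns. The variable names `y_true`, `n_noise`, `cluster_sizes` and `ari` are chosen here for illustration, and the adjusted Rand index is an added check, not part of the original script.

```python
import numpy as np
from sklearn import metrics
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Same data and parameters as the example above.
centers = [[1, 1], [-1, -1], [1, -1]]
X, y_true = make_blobs(n_samples=100, centers=centers, cluster_std=0.1,
                       random_state=0)
X = StandardScaler().fit_transform(X)
labels = DBSCAN(eps=0.8, min_samples=20).fit(X).labels_

# Noise points carry the label -1 and are not counted as a cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

# Size of each cluster, keyed by cluster label.
ids, sizes = np.unique(labels[labels != -1], return_counts=True)
cluster_sizes = dict(zip(ids.tolist(), sizes.tolist()))

# Agreement with the generating blobs; 1.0 means identical partitions.
ari = metrics.adjusted_rand_score(y_true, labels)

print('clusters:', n_clusters)
print('noise points:', n_noise)
print('cluster sizes:', cluster_sizes)
print('adjusted Rand index: %.3f' % ari)
```

Because the blobs are tight (cluster_std=0.1) and well separated, eps=0.8 is far smaller than the gap between them, so every point should land in one of the three clusters.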