DBSCAN: a Python implementation

# -*- coding: utf-8 -*-
from matplotlib.pyplot import *
from collections import defaultdict
import random  # only needed for the random test data below
import json

"""
    Euclidean distance between two points
"""
def dist(p1, p2):
    return ((p1[0] - p2[0]) ** 2 + (p1[1] - p2[1]) ** 2) ** 0.5

all_points = []
index = 1000  # number of points to read from the file

# Load the JSON file with Python's built-in json module
with open("Paris_points.json") as f:
    flickr_data = json.load(f)

for i in range(index):
    # each point is [latitude, longitude]
    coord = [flickr_data['latitudes'][i], flickr_data['longitudes'][i]]
    all_points.append(coord)

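"""
    Set E (the neighbourhood radius) and minPts
"""
# E is in degrees, because the coordinates are raw latitude/longitude;
# 0.001 degrees of latitude is roughly 111 m.
E = 0.001
minPts = 7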

"""
    Test data: up to 100 random integer Cartesian coordinates on a 50 x 50 grid
    (duplicates are skipped). Use E = 8, minPts = 8 when testing.
"""
# all_points = []
# for i in range(100):
#     randCoord = [random.randint(1, 50), random.randint(1, 50)]
#     if randCoord not in all_points:
#         all_points.append(randCoord)

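"""
    Find the core points
"""
other_points = []
core_points = []
plotted_points = []  # core and border points, i.e. everything that ends up in a cluster

for point in all_points:
    point.append(0)  # third field is the cluster label; 0 means "not assigned yet"
    total = 0
    for otherPoint in all_points:
        if dist(otherPoint, point) <= E:
            total += 1
    # total includes the point itself, so a core point needs at least minPts
    # points within E (the usual DBSCAN definition)
    if total >= minPts:
        core_points.append(point)
        plotted_points.append(point)
    else:
        other_points.append(point)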

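"""
    Find the border points
"""
border_points = []
for other in other_points:
    # a non-core point is a border point if any core point lies within E;
    # checking once per point (not once per core) avoids adding it twice
    if any(dist(core, other) <= E for core in core_points):
        border_points.append(other)
        plotted_points.append(other)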

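"""
    Clustering: label every core point, growing each cluster through
    the core points that can be reached from it
"""
cluster_label = 0
core_ids = set(id(p) for p in core_points)  # identity set: which points are core

for point in core_points:
    if point[2] != 0:
        continue  # already reached from an earlier cluster
    cluster_label += 1
    point[2] = cluster_label
    frontier = [point]  # core points whose neighbourhoods still need scanning
    while frontier:
        current = frontier.pop()
        for point2 in plotted_points:
            if point2[2] == 0 and dist(point2, current) <= E:
                point2[2] = cluster_label
                # only core points spread the cluster further
                if id(point2) in core_ids:
                    frontier.append(point2)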

"""
    Once every point has a label, group the points of each cluster together
"""
# cluster label -> [list of x values, list of y values]
cluster_list = defaultdict(lambda: [[], []])
for point in plotted_points:
    cluster_list[point[2]][0].append(point[0])
    cluster_list[point[2]][1].append(point[1])

markers = ['+', '*', '.', 'd', '^', 'v', '>', '<', 'p']
# markers = ['b.', 'g.', 'r.', 'c.', 'm.', 'y.', 'k.']

"""
    Plot all points
"""
figure(1)
allx = []
ally = []
for plot_point in all_points:
    allx.append(plot_point[0])
    ally.append(plot_point[1])
plot(allx, ally, "r.")
title("total points=" + str(len(all_points)) + " E =" + str(E) + " Min Points=" + str(minPts))

"""
    Plot the clusters (core and border points)
"""
figure(2)
print(cluster_list)
for i, cluster in enumerate(cluster_list.values()):
    # cycle through the marker styles so that neighbouring clusters look different
    plot(cluster[0], cluster[1], markers[i % len(markers)])
title(str(len(cluster_list)) + " clusters created with E = " + str(E) + " Min Points=" + str(minPts))

"""
    Plot the noise points
"""
figure(3)
# noise = points that are neither core points nor border points
noise_points = []
for point in all_points:
    if point not in core_points and point not in border_points:
        noise_points.append(point)
noisex = []
noisey = []
for point in noise_points:
    noisex.append(point[0])
    noisey.append(point[1])
plot(noisex, noisey, "x")
title("noise Points = " + str(len(noise_points)) + " E =" + str(E) + " Min Points=" + str(minPts))
# axis((0, 60, 0, 60))  # uncomment for the random test data
show()
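For reference, the same steps the script walks through (count neighbours within E, mark core points, grow clusters from core points, leave unreached points as noise) can be packaged as one self-contained function. This is a minimal sketch, not the author's code: the name `dbscan`, its return format (one label per input point, `-1` for noise) and the use of `math.dist` (Python 3.8+) are illustrative choices. Clusters are grown breadth-first, so two core points joined by a chain of core points within `eps` of each other always end up with the same label, whatever order the points come in.

```python
import math
from collections import deque


def dbscan(points, eps, min_pts):
    """Cluster points (tuples of coordinates) with DBSCAN.

    Returns one label per input point: 1, 2, ... for clusters, -1 for noise.
    A point is a core point when at least min_pts points (itself included)
    lie within eps of it.
    """
    n = len(points)
    # Brute-force neighbour lists: O(n^2), fine for a few thousand points.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]

    labels = [0] * n  # 0 = not reached yet
    cluster = 0
    for i in range(n):
        if labels[i] != 0 or not is_core[i]:
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque([i])
        # Breadth-first expansion: only core points spread the cluster further.
        while queue:
            current = queue.popleft()
            for j in neighbours[current]:
                if labels[j] == 0:
                    labels[j] = cluster
                    if is_core[j]:
                        queue.append(j)

    # Points never reached from a core point are noise.
    return [label if label != 0 else -1 for label in labels]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),  # a tight square
           (10, 10), (10, 11), (11, 10),    # a small group
           (50, 50)]                         # an isolated point
    print(dbscan(pts, eps=1.5, min_pts=3))
```

Here `min_pts` counts the point itself, which is also how scikit-learn's `DBSCAN` interprets `min_samples`. The brute-force neighbour search matches the script's nested loops; for much larger datasets a spatial index would normally replace it.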
