HierarchicalClustering: Writing the HierarchicalClustering hierarchical clustering algorithm

from numpy import *

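class cluster_node:
    """One node of the cluster tree: a leaf (id>=0) or a merged pair (id<0)."""
    def __init__(self,vec,left=None,right=None,distance=0.0,id=None,count=1):
        self.left=left          # left child, None for a leaf
        self.right=right        # right child, None for a leaf
        self.vec=vec            # feature vector representing this cluster
        self.id=id              # row index for a leaf, negative for a merged node
        self.distance=distance  # distance between the two merged children
        self.count=count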

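def L2dist(v1,v2):
    # Euclidean distance between two numpy vectors
    return sqrt(sum((v1-v2)**2))

def L1dist(v1,v2):
    # Manhattan (city-block) distance between two numpy vectors
    return sum(abs(v1-v2))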

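def hcluster(features,distance=L2dist):
    # Cache of pairwise distances, keyed by the pair of cluster ids
    distances={}
    currentclustid=-1
    # Start with one cluster per row of features
    clust=[cluster_node(array(features[i]),id=i) for i in range(len(features))]
    while len(clust)>1:
        lowestpair=(0,1)
        closest=distance(clust[0].vec,clust[1].vec)
        # Scan every pair and remember the one with the smallest distance
        for i in range(len(clust)):
            for j in range(i+1,len(clust)):
                if (clust[i].id,clust[j].id) not in distances:
                    distances[(clust[i].id,clust[j].id)]=distance(clust[i].vec,clust[j].vec)
                d=distances[(clust[i].id,clust[j].id)]
                if d<closest:
                    closest=d
                    lowestpair=(i,j)
        # The merged cluster is represented by the average of the two vectors
        mergevec=[(clust[lowestpair[0]].vec[i]+clust[lowestpair[1]].vec[i])/2.0
            for i in range(len(clust[0].vec))]
        newcluster=cluster_node(array(mergevec),left=clust[lowestpair[0]],
                             right=clust[lowestpair[1]],
                             distance=closest,id=currentclustid)
        # Merged clusters get negative ids so they never collide with row indices
        currentclustid-=1
        # Delete the larger index first so the smaller one stays valid
        del clust[lowestpair[1]]
        del clust[lowestpair[0]]
        clust.append(newcluster)
    # The single remaining node is the root of the tree
    return clust[0]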

def extract_clusters(clust,dist):
    # Cut the tree: return the subtrees whose merge distance is below dist
    if clust.distance<dist:
        return [clust]
    else:
        cl = []
        cr = []
        if clust.left is not None:
            cl = extract_clusters(clust.left,dist=dist)
        if clust.right is not None:
            cr = extract_clusters(clust.right,dist=dist)
        return cl+cr

def get_cluster_elements(clust):
    # Return the ids of all leaves (original rows) under this node
    if clust.id>=0:
        return [clust.id]
    else:
        cl = []
        cr = []
        if clust.left is not None:
            cl = get_cluster_elements(clust.left)
        if clust.right is not None:
            cr = get_cluster_elements(clust.right)
        return cl+cr

def printclust(clust,labels=None,n=0):
    # Indent by n spaces to show the level in the tree
    print(' '*n,end='')
    if clust.id<0:
        # A negative id marks a merged (branch) node
        print('-')
    else:
        if labels is None: print(clust.id)
        else: print(labels[clust.id])
    # Recurse into both branches, one level deeper
    if clust.left is not None: printclust(clust.left,labels=labels,n=n+1)
    if clust.right is not None: printclust(clust.right,labels=labels,n=n+1)

def getheight(clust):
    # The height of the tree is its number of leaves
    if clust.left is None and clust.right is None: return 1
    return getheight(clust.left)+getheight(clust.right)

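def getdepth(clust):
    # A leaf adds no distance; returning 0 keeps max() working on numbers
    if clust.left is None and clust.right is None: return 0
    # Depth is the deeper branch plus this node's own merge distance
    return max(getdepth(clust.left),getdepth(clust.right))+clust.distance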
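
To see the merge loop in action, here is a minimal sketch that runs on its own. It is not the module above: it is a condensed, standard-library-only restatement of the same idea (Euclidean distance, midpoint merge, negative ids for merged nodes), and the names `Node`, `agglomerate`, `leaves` and `cut` are made up for this illustration. It also skips the distance cache that `hcluster` keeps.

```python
import math

class Node:
    """Condensed stand-in for cluster_node (hypothetical name)."""
    def __init__(self, vec, left=None, right=None, distance=0.0, id=None):
        self.vec, self.left, self.right = vec, left, right
        self.distance, self.id = distance, id

def agglomerate(points):
    """Repeatedly merge the two closest clusters until one tree remains."""
    clust = [Node(list(p), id=i) for i, p in enumerate(points)]
    next_id = -1  # merged nodes get negative ids, as in hcluster
    while len(clust) > 1:
        # Closest pair by Euclidean distance between cluster vectors.
        d, i, j = min(
            (math.dist(clust[i].vec, clust[j].vec), i, j)
            for i in range(len(clust))
            for j in range(i + 1, len(clust))
        )
        a, b = clust[i], clust[j]
        # The merged vector is the plain midpoint of the two children.
        mid = [(x + y) / 2.0 for x, y in zip(a.vec, b.vec)]
        merged = Node(mid, left=a, right=b, distance=d, id=next_id)
        next_id -= 1
        del clust[j]  # j > i, so delete the larger index first
        del clust[i]
        clust.append(merged)
    return clust[0]

def leaves(node):
    """Original row indices under a node (mirrors get_cluster_elements)."""
    if node.id >= 0:
        return [node.id]
    return leaves(node.left) + leaves(node.right)

def cut(node, dist):
    """Subtrees merged below dist (mirrors extract_clusters)."""
    if node.distance < dist or node.left is None:
        return [node]
    return cut(node.left, dist) + cut(node.right, dist)

# Two tight pairs of points that sit far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
tree = agglomerate(points)
groups = sorted(sorted(leaves(c)) for c in cut(tree, dist=5.0))
print(groups)  # [[0, 1], [2, 3]]
```

With the module above the equivalent calls would be `hcluster(points)`, then `extract_clusters(tree, 5.0)` and `get_cluster_elements` on each returned node. Note that the merged vector is the unweighted midpoint of the two children, so a large cluster and a single point pull equally on the result.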

HierarchicalClustering: Writing the HierarchicalClustering hierarchical clustering algorithm (Jason niu)