K-means algorithm----PRML读书笔记

The K-means algorithm is based on the use of squared Euclidean distance as the measure of dissimilarity between a data point and a prototype vector. Our goal is to partition the data set into some number K of clusters, where we shall suppose for the moment that the value of K is given. We can then define an objective function, sometimes called a distortion measure, given by J=Σ_nΣ_kr_nk||x_n-μ_k||²,where n=1,...N, k=1,...,K, N is observations of a random D-dimensional Euclidean variable x, K is number of clusters. J represents the sum of the squares of the distances of each data point to its assigned vector μ_k. We can think of the μ_k as representing the centres of the clusters. Our goal is to find values for the {r_nk} and the {μ_k} so as to minimize J. First we choose some initial values for the μ_k.Then in the first phase we minimize J with respect to the r_nk, keeping the μ_k fixed. In the second phase we minimize J with respect to μ_k, keeping r_nk fixed. This two-stage optimization is then repeated until convergence. We simply assign the n^th data point to the closest cluster centre, this can be expressed as r_nk=1,if k=argmin_j||x_n-μ_j||², otherwise r_nk=0. The objective function J is a quadratic function of μ_k, and it can be minimized by setting its derivative with respect to μ_kto zero giving 2Σ_nr_nk(x_n-μ_k)=0. μ_k=(Σ_nr_nkx_n)/(Σ_nr_nk), this result has a simple interpretation, namely set μ_k equal to the mean of all of the data points x_n assigned to cluster k. For this reason, the procedure is known as the K-means algorithm.

K-means algorithm----PRML读书笔记的更多相关文章

expectation-maximization algorithm ---- PRML读书笔记
An elegant and powerful method for finding maximum likelihood solutions for models with latent varia ...
PRML读书笔记——2 Probability Distributions
2.1. Binary Variables 1. Bernoulli distribution, p(x = 1|µ) = µ 2.Binomial distribution + 3.beta dis ...
PRML读书笔记——机器学习导论
什么是模式识别(Pattern Recognition)? 按照Bishop的定义,模式识别就是用机器学习的算法从数据中挖掘出有用的pattern. 人们很早就开始学习如何从大量的数据中发现隐藏在背后 ...
PRML读书笔记——3 Linear Models for Regression
Linear Basis Function Models 线性模型的一个关键属性是它是参数的一个线性函数,形式如下: w是参数,x可以是原始的数据,也可以是关于原始数据的一个函数值,这个函数就叫bas ...
PRML读书笔记——Mathematical notation
x, a vector, and all vectors are assumed to be column vectors. M, denote matrices. xT, a row vcetor, ...
【PRML读书笔记-Chapter1-Introduction】1.5 Decision Theory
初体验: 概率论为我们提供了一个衡量和控制不确定性的统一的框架,也就是说计算出了一大堆的概率.那么,如何根据这些计算出的概率得到较好的结果,就是决策论要做的事情. 一个例子: 文中举了一个例子: 给定 ...
PRML读书笔记——Introduction
1.1. Example: Polynomial Curve Fitting 1. Movitate a number of concepts: (1) linear models: Function ...
【PRML读书笔记-Chapter1-Introduction】1.6 Information Theory
熵给定一个离散变量,我们观察它的每一个取值所包含的信息量的大小,因此,我们用来表示信息量的大小,概率分布为.当p(x)=1时,说明这个事件一定会发生,因此,它带给我的信息为0.(因为一定会发生,毫无 ...
【PRML读书笔记-Chapter1-Introduction】1.4 The Curse of Dimensionality
维数灾难给定如下分类问题: 其中x6和x7表示横轴和竖轴(即两个measurements),怎么分? 方法一(simple): 把整个图分成:16个格,当给定一个新的点的时候,就数他所在的格子中,哪 ...
【PRML读书笔记-Chapter1-Introduction】1.3 Model Selection
在训练集上有个好的效果不见得在测试集中效果就好,因为可能存在过拟合(over-fitting)的问题. 如果训练集的数据质量很好,那我们只需对这些有效数据训练处一堆模型,或者对一个模型给定系列的参数值 ...

随机推荐

Linux下的文件结构,及对应文件夹的作用
Linux下的文件结构,及对应文件夹的作用 /bin 二进制可执行命令 /dev 设备特殊文件 /etc 系统管理和配置文件 /etc/rc.d 启动的配置文件和脚本 /home 用户主目录的基点,比 ...
常用MySQL语句整合
常用MySQL语句整合 1. MySQL服务的配置和使用修改MySQL管理员的口令:mysqladmin –u root password 密码字符串如:mysqldmin –u root pas ...
CSS3利用box-shadow实现相框效果
CSS3利用box-shadow实现相框效果 <style> html { overflow: hidden; background-color: #653845; background- ...
node里读取命令行参数
一.process.env process.env属性返回一个包含用户环境信息的对象. 最常见的需求,前端需要根据不同的环境(dev,prd),来调用不同的后端接口.如果用webpack,是这么做的: ...
yagmail邮件模块
昨天接到一个需求,就是要求用邮件发送一html文件.这里我想到了用yagmail #!/usr/bin/env python #-*- coding:utf-8 -*- import yagmail ...
php第十三节课
查询 <?php class DBDA{ public $host = "localhost"; //数据库地址 public $uid = "root" ...
ActiveMQ学习总结（10）——ActiveMQ采用Spring注解方式发送和监听
对于ActiveMQ消息的发送,原声的api操作繁琐,而且如果不进行二次封装,打开关闭会话以及各种创建操作也是够够的了.那么,Spring提供了一个很方便的去收发消息的框架,spring jms.整合 ...
LightOJ 1370 Bi-shoe and Phi-shoe
/* LightOJ 1370 Bi-shoe and Phi-shoe http://lightoj.com/login_main.php?url=volume_showproblem.php?pr ...
hdu 4280
题意:求XY平面上最左边的点到最右边的点的最大流. 分析:数据量大,EK算法TLE,要用SAP算法.SAP算法用的是 http://www.cnblogs.com/kuangbin/archive/2 ...
[bzoj3697]采药人的路径_点分治
采药人的路径 bzoj-3697 题目大意:给你一个n个节点的树,每条边分为阴性和阳性,求满足条件的链的个数,使得这条链上阴性的边的条数等于阳性的边的条数,且这条链上存在一个节点,这个节点到一个端点的 ...

K-means algorithm----PRML读书笔记

K-means algorithm----PRML读书笔记的更多相关文章

随机推荐

热门专题