ML | k-means
what's xxx
k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. The problem is computationally difficult (NP-hard)
k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.
Given a set of observations $(x_1, x_2, …, x_n)$, where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k sets (k ≤ n) $S = {S_1, S_2, …, S_k}$ so as to minimize the within-cluster sum of squares 平方和(WCSS):
$\underset{\mathbf{S}} {\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf x_j \in S_i} \left\| \mathbf x_j - \boldsymbol\mu_i \right\|^2 $
where $μ_i$ is the mean of points in $S_i$.
Algorithm
heuristic
1. Assignment step: $S_i^{(t)} = \big \{ x_p : \big \| x_p - m^{(t)}_i \big \|^2 \le \big \| x_p - m^{(t)}_j \big \|^2 \ \forall j, 1 \le j \le k \big\}$,
where each $x_p$ is assigned to exactly one $S^{(t)}$, even if it could be is assigned to two or more of them.
2. Update step: Calculate the new means to be the centroids of the observations in the new clusters.
$m^{(t+1)}_i = \frac{1}{|S^{(t)}_i|} \sum_{x_j \in S^{(t)}_i} x_j $
Since the arithmetic mean is a least-squares estimator, this also minimizes the within-cluster sum of squares (WCSS) objective.
The algorithm has converged when the assignments no longer change. Since both steps optimize the WCSS objective, and there only exists a finite number of such partitionings, the algorithm must converge to a (local) optimum. There is no guarantee that the global optimum is found using this algorithm.
ML | k-means的更多相关文章
- KNN 与 K - Means 算法比较
KNN K-Means 1.分类算法 聚类算法 2.监督学习 非监督学习 3.数据类型:喂给它的数据集是带label的数据,已经是完全正确的数据 喂给它的数据集是无label的数据,是杂乱无章的,经过 ...
- 软件——机器学习与Python,聚类,K——means
K-means是一种聚类算法: 这里运用k-means进行31个城市的分类 城市的数据保存在city.txt文件中,内容如下: BJ,2959.19,730.79,749.41,513.34,467. ...
- 快速查找无序数组中的第K大数?
1.题目分析: 查找无序数组中的第K大数,直观感觉便是先排好序再找到下标为K-1的元素,时间复杂度O(NlgN).在此,我们想探索是否存在时间复杂度 < O(NlgN),而且近似等于O(N)的高 ...
- 网络费用流-最小k路径覆盖
多校联赛第一场(hdu4862) Jump Time Limit: 2000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Ot ...
- numpy.ones_like(a, dtype=None, order='K', subok=True)返回和原矩阵一样形状的1矩阵
Return an array of ones with the same shape and type as a given array. Parameters: a : array_like Th ...
- Abstractive Summarization
Sequence-to-sequence Framework A Neural Attention Model for Abstractive Sentence Summarization Alexa ...
- R 语言实战-Part 4 笔记
R 语言实战(第二版) part 4 高级方法 -------------第13章 广义线性模型------------------ #前面分析了线性模型中的回归和方差分析,前提都是假设因变量服从正态 ...
- 当我们在谈论kmeans(2)
本稿为初稿,后续可能还会修改:如果转载,请务必保留源地址,非常感谢! 博客园:http://www.cnblogs.com/data-miner/ 其他:建设中- 当我们在谈论kmeans(2 ...
- scikit-learn包的学习资料
http://scikit-learn.org/stable/modules/clustering.html#k-means http://my.oschina.net/u/175377/blog/8 ...
- HDU 3584 Cube (三维 树状数组)
题目链接:http://acm.hdu.edu.cn/showproblem.php?pid=3584 Cube Problem Description Given an N*N*N cube A, ...
随机推荐
- 移动端H5前端性能优化指南:
分享地址:https://isux.tencent.com/h5-performance.html
- GoF23种设计模式之结构型模式之外观模式
一.概述 为子系统中的一组接口提供一个一致的界面,外观模式定义了一个高层接口,这个接口使得这一子系统更加容易使用. 二.适用性 1.当你要为一个复杂子系统提供一个简单接口的时候.子系统 ...
- Head First Python (二)
if...else... 1 movies = ["The Holy Grail",1975,"Terry Jones & Terry Gilliam" ...
- LeetCode(258) Add Digits
题目 Given a non-negative integer num, repeatedly add all its digits until the result has only one dig ...
- 《鸟哥的Linux私房菜》学习笔记(1)——文件与目录
在Linux中,任何设备都是文件,不仅如此,连数据通信的接口也有专门的文件负责.可以说,一切皆文件,目录也是一种文件,是路径映射.因此,文件系统是Linux的基础. 一.文件与目录管理命令 1.ls( ...
- flask-mail(qq邮箱)
from flask_mail import Mail,Message app.config['MAIL_SERVER']='smtp.qq.com' app.config['MAIL_PORT'] ...
- HDU 4738 双连通分量 Caocao's Bridges
求权值最小的桥,考虑几种特殊情况: 图本身不连通,那么就不用派人去了 图的边双连通分量只有一个,答案是-1 桥的最小权值是0,但是也要派一个人过去 #include <iostream> ...
- Apache不能启动: Unable to open logs
日志名称: Application来源: Apache Service日期: 2014/3/12 14:43:21事件 ID: ...
- day03_02 Python版本的选择
总结:python2.x是遗产(过时),python3.x是现在和未来的语言 In summary : Python 2.x is legacy, Python 3.x is the present ...
- python-网络编程-02
[1] server端 首先我们看下一个最简单http服务端 import socket def handle_request(client): buf = client.recv(1024) cli ...