Cluster analysis
https://en.wikipedia.org/wiki/Cluster_analysis
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.
Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including values such as the distance function to use, a density threshold or the number of expected clusters) depend on the individual data set and intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It is often necessary to modify data preprocessing and model parameters until the result achieves the desired properties.
Besides the term clustering, there are a number of terms with similar meanings, including automatic classification, numerical taxonomy, botryology (from Greek βότρυς "grape") and typological analysis. The subtle differences are often in the usage of the results: while in data mining, the resulting groups are the matter of interest, in automatic classification the resulting discriminative power is of interest.
Cluster analysis was originated in anthropology by Driver and Kroeber in 1932 and introduced to psychology by Zubin in 1938 and Robert Tryon in 1939[1][2] and famously used by Cattell beginning in 1943[3] for trait theory classification in personality psychology.
Cluster analysis的更多相关文章
- cluster analysis in data mining
https://en.wikipedia.org/wiki/K-means_clustering k-means clustering is a method of vector quantizati ...
- UVALive 6906 Cluster Analysis 并查集
Cluster Analysis 题目连接: https://icpcarchive.ecs.baylor.edu/index.php?option=com_onlinejudge&Itemi ...
- UVALive 6906 A - Cluster Analysis
思路:排个序,依次选就好了. #include <bits/stdc++.h> #define PB push_back #define MP make_pair using namesp ...
- K-Means 聚类算法
K-Means 概念定义: K-Means 是一种基于距离的排他的聚类划分方法. 上面的 K-Means 描述中包含了几个概念: 聚类(Clustering):K-Means 是一种聚类分析(Clus ...
- 地理信息系统 - ArcGIS - 高/低聚类分析工具(High/Low Clustering ---Getis-Ord General G)
前段时间在学习空间统计相关的知识,于是把ArcGIS里Spatial Statistics工具箱里的工具好好研究了一遍,同时也整理了一些笔记上传分享.这一篇先聊一些基础概念,工具介绍篇随后上传. 空间 ...
- K-means聚类算法
聚类分析(英语:Cluster analysis,亦称为群集分析) K-means也是聚类算法中最简单的一种了,但是里面包含的思想却是不一般.最早我使用并实现这个算法是在学习韩爷爷那本数据挖掘的书中, ...
- Bioinformatics Glossary
原文:http://homepages.ulb.ac.be/~dgonze/TEACHING/bioinfo_glossary.html Affine gap costs: A scoring sys ...
- (转) Deep Learning Research Review Week 2: Reinforcement Learning
Deep Learning Research Review Week 2: Reinforcement Learning 转载自: https://adeshpande3.github.io/ad ...
- Spark入门实战系列--8.Spark MLlib(下)--机器学习库SparkMLlib实战
[注]该系列文章以及使用到安装包/测试数据 可以在<倾情大奉送--Spark入门实战系列>获取 .MLlib实例 1.1 聚类实例 1.1.1 算法说明 聚类(Cluster analys ...
随机推荐
- wp8 入门到精通 WebClient Post
WebClient wc = new WebClient(); var URI = new Uri("http://your_uri_goes_here"); //If any e ...
- 电赛初探(一)——正弦波、方波、锯齿波转换
一.题目要求: 1.使用555做出脉冲方波 2.使用TL084运放做出方波和锯齿波 3.使用TLM314稳压做直流偏置 4.方波要求峰峰值为1V,正弦波要求峰值为0~2V,锯齿波要求峰峰值为1V. 二 ...
- CentOS6.5升级内核到3.10.28 --已验证
本文适用于CentOS 6.4, CentOS 6.5,估计也适用于其他Linux发行版. 1. 准备工作 确认内核及版本信息 [root@hostname ~]# uname -r 2.6.32-2 ...
- 如何查找MySQL中查询慢的SQL语句
如何查找MySQL中查询慢的SQL语句 更多 如何在mysql查找效率慢的SQL语句呢?这可能是困然很多人的一个问题,MySQL通过慢查询日志定位那些执行效率较低的SQL 语句,用--log-slow ...
- SU suamp命令学习
- 17111 Football team
时间限制:1000MS 内存限制:65535K 提交次数:0 通过次数:0 题型: 编程题 语言: C++;C Description As every one known, a footbal ...
- POJ1679 The Unique MST(次小生成树)
可以依次枚举MST上的各条边并删去再求最小生成树,如果结果和第一次求的一样,那就是最小生成树不唯一. 用prim算法,时间复杂度O(n^3). #include<cstdio> #incl ...
- velocity 判断 变量 是否不是空或empty
原先的 #if($mobile) 这种写法是不准确的 ,请换成 "$!{ mobile}"!="" 说明 : #if($mobile) 这种写法 只能 ...
- TYVJ P1034 尼克的任务 Label:倒推dp
背景 题库靠大家,人人都爱它. 描述 尼克每天上班之前都连接上英特网,接收他的上司发来的邮件,这些邮件包含了尼克主管的部门当天要完成的全部任务,每个任务由一个开始时刻与一个持续时间构成.尼克的一个工作 ...
- Hibernate批处理操作优化 (批量插入、更新与删除)
问题描述 我开发的网站加了个新功能:需要在线上处理表数据的批量合并和更新,昨天下午发布上线,执行该功能后,服务器的load突然增高,变化曲线异常,SA教育了我一番,让我尽快处理,将CPU负载降低. 工 ...