scikit-learn(project中用的相对较多的模型介绍):2.3. Clustering(可用于特征的无监督降维)
參考:http://scikit-learn.org/stable/modules/clustering.html
在实际项目中,我们真的非常少用到那些简单的模型,比方LR、kNN、NB等。尽管经典,但在project中确实不有用。
今天我们不关注详细的模型,而关注无监督的聚类方法。
之所以关注无监督聚类方法。是由于。在实际项目中,我们除了使用PCA等方法降维外。有时候我们也会考虑使用聚类的方法降维特征。
Overview of clustering methods:
A comparison of the clustering algorithms in scikit-learn
Method name | Parameters | Scalability | Usecase | Geometry (metric used) |
---|---|---|---|---|
K-Means | number of clusters |
Very large n_samples, medium n_clusterswith MiniBatch code |
General-purpose, even cluster size, flat geometry, not too many clusters | Distances between points |
Affinity propagation | damping, sample preference | Not scalable with n_samples | Many clusters, uneven cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
Mean-shift | bandwidth | Not scalable withn_samples | Many clusters, uneven cluster size, non-flat geometry | Distances between points |
Spectral clustering | number of clusters | Medium n_samples, small n_clusters | Few clusters, even cluster size, non-flat geometry | Graph distance (e.g. nearest-neighbor graph) |
Ward hierarchical clustering | number of clusters | Large n_samples andn_clusters | Many clusters, possibly connectivity constraints | Distances between points |
Agglomerative clustering | number of clusters, linkage type, distance | Large n_samples andn_clusters | Many clusters, possibly connectivity constraints, non Euclidean distances | Any pairwise distance |
DBSCAN | neighborhood size | Very large n_samples, medium n_clusters | Non-flat geometry, uneven cluster sizes | Distances between nearest points |
Gaussian mixtures | many | Not scalable | Flat geometry, good for density estimation | Mahalanobis distances to centers |
Birch | branching factor, threshold, optional global clusterer. | Large n_clusters andn_samples | Large dataset, outlier removal, data reduction. |
Euclidean distance between points |
scikit-learn(project中用的相对较多的模型介绍):2.3. Clustering(可用于特征的无监督降维)的更多相关文章
- scikit-learn(project中用的相对较多的模型介绍):1.14. Semi-Supervised
參考:http://scikit-learn.org/stable/modules/label_propagation.html The semi-supervised estimators insk ...
- scikit learn 模块 调参 pipeline+girdsearch 数据举例:文档分类 (python代码)
scikit learn 模块 调参 pipeline+girdsearch 数据举例:文档分类数据集 fetch_20newsgroups #-*- coding: UTF-8 -*- import ...
- (原创)(三)机器学习笔记之Scikit Learn的线性回归模型初探
一.Scikit Learn中使用estimator三部曲 1. 构造estimator 2. 训练模型:fit 3. 利用模型进行预测:predict 二.模型评价 模型训练好后,度量模型拟合效果的 ...
- (原创)(四)机器学习笔记之Scikit Learn的Logistic回归初探
目录 5.3 使用LogisticRegressionCV进行正则化的 Logistic Regression 参数调优 一.Scikit Learn中有关logistics回归函数的介绍 1. 交叉 ...
- Scikit Learn: 在python中机器学习
转自:http://my.oschina.net/u/175377/blog/84420#OSC_h2_23 Scikit Learn: 在python中机器学习 Warning 警告:有些没能理解的 ...
- Scikit Learn
Scikit Learn Scikit-Learn简称sklearn,基于 Python 语言的,简单高效的数据挖掘和数据分析工具,建立在 NumPy,SciPy 和 matplotlib 上.
- Linear Regression with Scikit Learn
Before you read This is a demo or practice about how to use Simple-Linear-Regression in scikit-lear ...
- 【359】scikit learn 官方帮助文档
官方网站链接 sklearn.neighbors.KNeighborsClassifier sklearn.tree.DecisionTreeClassifier sklearn.naive_baye ...
- 如何使用scikit—learn处理文本数据
答案在这里:http://www.tuicool.com/articles/U3uiiu http://scikit-learn.org/stable/modules/feature_extracti ...
随机推荐
- Flask deployment on gunicorn with flask script
https://stackoverflow.com/questions/34265870/flask-deployment-on-gunicorn-with-flask-script 依赖 Flask ...
- 【06】Vue 之 组件化开发
组件其实就是一个拥有样式.动画.js逻辑.HTML结构的综合块.前端组件化确实让大的前端团队更高效的开发前端项目.而作为前端比较流行的框架之一,Vue的组件和也做的非常彻底,而且有自己的特色.尤其是她 ...
- set基本用法---1
#include<cstdio> #include<iostream> #include<cstdlib> #include<cmath> #inclu ...
- 移动WEB前端开发资源的一些素材
meta篇: <meta name="viewport" content="width=device-width,initial-scale=1.0,user-sc ...
- 十步优化SQL Server中的数据访问
原文发布时间为:2011-02-24 -- 来源于本人的百度文章 [由搬家工具导入] 转载:http://tech.it168.com/a2009/1125/814/000000814758_all. ...
- mvc filters
1.controller using System; using System.Collections.Generic; using System.Linq; using System.Web; us ...
- StringTokenizer:字符串分隔用法简介
StringTokenizer:字符串分隔解析类型 属于:java.util包. 1.构造函数. 1. StringTokenizer(String str) :构造一个用来解析str的StringT ...
- Ubuntu中配置Tomcat与Eclipse整合
Apache Tomcat 作为web服务器已经广泛用于Java Servlets 和 JSP (Java Server Pages) 开发. 环境:Ubuntu10.10 java环境的配置见另一篇 ...
- duilib入门简明教程 -- XML基础类(7) (转)
原文转自:http://www.cnblogs.com/Alberl/p/3343743.html 现在大家应该对XML描述界面不那么陌生了,那么我们做进一步介绍. 前面的教程我们写了很多代码,为的是 ...
- JS和jquery加载的区别
<!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title> ...