Weka EM 协方差
Weka EM covariance
description 1:
Dear All,
I am trying to find out what is the real meaning of the minStdDev parameter in the EM clustering algorithm. Can anyone help me?
I have not looked at the code, but I suspect that the minStdDev is used as the first estimate of the covariance of a Gaussian in the mixture model. Am I correct?
I have found the equations or perhaps similar equations to the ones used to calculate the parameters for a Gaussian mixture model in the EM algorithm and there are three, which have these functions:
The first one calculates the probability of each Gaussian.
The second calculates the mean of each Gaussian
The third calculates the covariance matrix of each Gaussian
But this means to start off with there has to be an initial guess at the parameters for the Gaussian mixture model ie the probability or weighting factor for each Gaussian is needed, as is the mean and Covariance matrix.
If I am wrong how is the EM algorithm initiated ie how is the initial guess at the mixture model arrived at? Does minStdDev have any part to play in it? Also is a full covariance matrix calculated in the EM algorithm or are just the standard deviations or variances calculated, ie are right elliptical Gaussians used?
I am guessing that the random number generator is used to pick one or more data points at random as initial values for the means.
This question really follows up on my previous postings about differences between Mac and PC using the EM algorithm and worries about the stability of the algorithm. I was (naively) using the default value of 1.0E-6. However after a reply to a previous posting I have tried scaling the data to be between -1 and +1 and alsozero mean and unit SD. When I try these scaled data sets Mac and PC produce the same result. So I realised that ought to think about the value of minStdDev.
Many thanks for your help in advance.
John Black
description 2:
EM in java is a naive implementation. That is, it treats each
attribute independently of the others given the cluster (much the same
as naive Bayes for classification). Therefore, a full covariance
matrix is not computed, just the means and standard deviations of each
numeric attribute.
The minStdDev parameter is there simply to help prevent numerical
problems. This can be a problem when multiplying large densities
(arising from small standard deviations) when there are many singleton
or near-singleton values. The standard deviation for a given attribute
will not be allowed to be less than the minStdDev value.
EM is initialized with the best result out of 10 executions of
SimpleKMeans (with different seed values).
Hope this helps.
Cheers,
Mark.
Weka EM 协方差的更多相关文章
- Weka:call for the EM algorithm to achieve clustering.(EM算法)
EM算法: 在Eclipse中写出读取文件的代码然后调用EM算法计算输出结果: package EMAlg; import java.io.*; import weka.core.*; import ...
- Weka中EM算法详解
private void EM_Init (Instances inst) throws Exception { int i, j, k; // 由于EM算法对初始值较敏感,故选择run k mean ...
- GMM的EM算法实现
转自:http://blog.csdn.net/abcjennifer/article/details/8198352 在聚类算法K-Means, K-Medoids, GMM, Spectral c ...
- 【EM】代码理解
本来想自己写一个EM算法的,但是操作没两步就进行不下去了.对那些数学公式着实不懂.只好从网上找找代码,看看别人是怎么做的. 代码:来自http://blog.sina.com.cn/s/blog_98 ...
- 高斯混合聚类及EM实现
一.引言 我们谈到了用 k-means 进行聚类的方法,这次我们来说一下另一个很流行的算法:Gaussian Mixture Model (GMM).事实上,GMM 和 k-means 很像,不过 G ...
- [转载]GMM的EM算法实现
在聚类算法K-Means, K-Medoids, GMM, Spectral clustering,Ncut一文中我们给出了GMM算法的基本模型与似然函数,在EM算法原理中对EM算法的实现与收敛性证明 ...
- GMM及EM算法
GMM及EM算法 标签(空格分隔): 机器学习 前言: EM(Exception Maximizition) -- 期望最大化算法,用于含有隐变量的概率模型参数的极大似然估计: GMM(Gaussia ...
- GMM的EM算法
在聚类算法K-Means, K-Medoids, GMM, Spectral clustering,Ncut一文中我们给出了GMM算法的基本模型与似然函数,在EM算法原理中对EM算法的实现与收敛性证明 ...
- EM算法 大白话讲解
假设有一堆数据点,它是由两个线性模型产生的.公式如下: 模型参数为a,b,n:a为线性权值或斜率,b为常数偏置量,n为误差或者噪声. 一方面,假如我们被告知这两个模型的参数,则我们可以计算出损失. 对 ...
随机推荐
- SAS Config文件 处理流程
Processing Options Specified by Additional CONFIG Options You can also specify additional –CONFIG op ...
- python tile函数用法
tile函数位于python模块 numpy.lib.shape_base中,他的功能是重复某个数组.比如tile(A,n),功能是将数组A重复n次,构成一个新的数组,我们还是使用具体的例子来说明问题 ...
- DataReader方式 获取数据的操作
一.使用DataReader读取为对象List /// <summary> /// 获得数据列表List<>,DataReader 使用参数的 /// </summary ...
- A woman without arms
任吉美出生在中国烟台海阳一个极为普通的渔民家里.她先天残疾,没有胳膊和手. 小吉美注定要比别人生活得更艰难.她不能自己穿衣,不能自己端碗吃饭,也不能像兄弟姐妹们一样帮助妈妈干家务活,她觉得自己成了家里 ...
- Android数据库升级,数据不丢失解决方案
假设要更新TableC表,建议的做法是: 1) 将TableC重命名为TableC_temp SQL语句可以这样写:ALERT TABLE TableC RENAME TO TableC_temp; ...
- DDoS攻防战(二):CC攻击工具实现与防御理论
我们将要实现一个进行应用层DDoS攻击的工具,综合考虑,CC攻击方式是最佳选择,并用bash shell脚本来快速实现并验证这一工具,并在最后,讨论如何防御来自应用层的DDoS攻击. 第一步:获取大量 ...
- netbeans中wicket插件对应的jQuery-ui版本
在netbean里使用wicket,我们经常习惯使用netbeans自带的wicket插件直接安装wicket,但是因为netbean上的 wicket插件版本比较老,使得我们很多新的第三方wicke ...
- Ubuntu 16.04 Mxnet CPU 版本安装
在安装前配置好更新源,基本要求就是速度越快越好: 1.安装Python apt-get install python 2.安装Git apt-get install git 3.安装依赖包 ...
- python内建函数sorted方法概述
python中,具体到对list进行排序的方法有俩,一个是list自带的sort方法,这个是直接对list进行操作,只有list才包含的方法:另外一个是内建函数sorted方法,可以对所有可迭代的对象 ...
- 我的日常工具——gdb篇
我的日常工具——gdb篇 03 Apr 2014 1.gdb的原理 熟悉linux的同学面试官会问你用过gdb么?那好用过,知道gdb是怎么工作的么?然后直接傻眼... gdb是怎么接管一个进程?并且 ...