Weka EM covariance
description 1:
Dear All,
I am trying to find out what is the real meaning of the minStdDev parameter in the EM clustering algorithm. Can anyone help me?
I have not looked at the code, but I suspect that the minStdDev is used as the first estimate of the covariance of a Gaussian in the mixture model. Am I correct?
I have found the equations (or perhaps equations similar to the ones) used to calculate the parameters of a Gaussian mixture model in the EM algorithm. There are three, with these functions:
The first one calculates the probability of each Gaussian.
The second calculates the mean of each Gaussian.
The third calculates the covariance matrix of each Gaussian.
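For reference, the standard textbook forms of these three updates are sketched below. They are not necessarily the exact equations the poster found, and the notation is an assumption: N data points x_i, K Gaussians, and \gamma_{ik} for the posterior probability (responsibility) of Gaussian k for point x_i, computed from the current parameters.

```latex
N_k = \sum_{i=1}^{N} \gamma_{ik}, \qquad
\pi_k = \frac{N_k}{N}, \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik}\, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik}\, (x_i - \mu_k)(x_i - \mu_k)^{\top}
```

Each \gamma_{ik} depends on the current \pi_k, \mu_k and \Sigma_k, which is why the iteration needs starting values for all three.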
But this means that, to start off with, there has to be an initial guess at the parameters of the Gaussian mixture model, i.e. the probability or weighting factor for each Gaussian is needed, as are the mean and covariance matrix.
If I am wrong, how is the EM algorithm initiated, i.e. how is the initial guess at the mixture model arrived at? Does minStdDev have any part to play in it? Also, is a full covariance matrix calculated in the EM algorithm, or are just the standard deviations or variances calculated, i.e. are right elliptical Gaussians used?
I am guessing that the random number generator is used to pick one or more data points at random as initial values for the means.
This question really follows up on my previous postings about differences between Mac and PC using the EM algorithm and worries about the stability of the algorithm. I was (naively) using the default value of 1.0E-6. However, after a reply to a previous posting I have tried scaling the data to lie between -1 and +1, and also to zero mean and unit SD. When I try these scaled data sets, Mac and PC produce the same result. So I realised that I ought to think about the value of minStdDev.
Many thanks for your help in advance.
John Black
description 2:
EM in Weka (Java) is a naive implementation. That is, it treats each
attribute independently of the others given the cluster (much the same
as naive Bayes for classification). Therefore, a full covariance
matrix is not computed, just the mean and standard deviation of each
numeric attribute.
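
To make the independence assumption concrete, here is a minimal Python sketch (not Weka's actual code; the function names are made up for illustration). The density of an instance under one cluster is the product of one-dimensional normal densities, one per numeric attribute, which is equivalent to a Gaussian with a diagonal covariance matrix. The sketch works in log space, so the product becomes a sum.

```python
import math


def normal_log_pdf(x, mean, std):
    """Log density of a one-dimensional normal distribution at x."""
    z = (x - mean) / std
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * z * z


def cluster_log_density(instance, means, stds):
    """Log density of an instance under one cluster, assuming the
    attributes are independent given the cluster: the per-attribute
    log densities are simply added (densities multiplied)."""
    return sum(
        normal_log_pdf(x, m, s) for x, m, s in zip(instance, means, stds)
    )


if __name__ == "__main__":
    # Two numeric attributes, so two means and two standard deviations
    # per cluster; no off-diagonal covariance term appears anywhere.
    print(cluster_log_density([1.2, -0.4], means=[1.0, 0.0], stds=[0.5, 2.0]))
```

Only a mean and a standard deviation per attribute and per cluster are needed, which matches the description above.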
The minStdDev parameter is there simply to help prevent numerical
problems. Such problems can arise when multiplying large densities
(which result from small standard deviations), as happens when there
are many singleton or near-singleton values. The standard deviation
for a given attribute will not be allowed to be less than the
minStdDev value.
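
The effect of such a floor can be sketched in a few lines of Python. This is an illustration only, not Weka's code: the helper names are hypothetical, and the standard deviation here is unweighted, whereas a real EM step would weight each value by its cluster membership probability.

```python
import math


def clamped_std(values, min_std_dev=1e-6):
    """Population standard deviation of values, never below min_std_dev."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return max(math.sqrt(variance), min_std_dev)


def peak_density(std):
    """Normal density evaluated at its own mean: 1 / (sqrt(2*pi) * std)."""
    return 1.0 / (math.sqrt(2.0 * math.pi) * std)


def product_of_peaks(std, n_attributes):
    """Multiply the peak density across n_attributes independent attributes."""
    product = 1.0
    for _ in range(n_attributes):
        product *= peak_density(std)
    return product


if __name__ == "__main__":
    # A singleton-valued attribute has zero spread, so the floor applies.
    std = clamped_std([3.0, 3.0, 3.0])
    print(std, peak_density(std))
    # With a tiny floor, multiplying many such densities can overflow.
    print(product_of_peaks(1e-6, 60))
```

Without the floor, a zero standard deviation would cause a division by zero. With a very small floor, each density is finite but large, and the product over many attributes can still overflow a double, so the right floor depends on the scale of the data.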
EM is initialized with the best result out of 10 executions of
SimpleKMeans (with different seed values).
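
The initialization strategy just described can be sketched in plain Python as follows. This is not SimpleKMeans itself: the k-means below is a bare-bones version, the function names are hypothetical, and "best" is assumed to mean the run with the lowest total within-cluster squared error.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, seed, max_iter=100):
    """Basic k-means; returns (centroids, total squared error)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random data points as starting means
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k), key=lambda j: squared_distance(p, centroids[j])
            )
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    error = sum(
        min(squared_distance(p, c) for c in centroids) for p in points
    )
    return centroids, error


def best_of_n_k_means(points, k, runs=10, base_seed=0):
    """Run k-means with different seeds and keep the lowest-error result."""
    results = [k_means(points, k, seed=base_seed + r) for r in range(runs)]
    return min(results, key=lambda result: result[1])


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, error = best_of_n_k_means(data, k=2)
    print(centroids, error)
```

The winning centroids would then serve as the initial cluster means for EM, with the initial standard deviations and cluster weights taken from the points assigned to each centroid.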
Hope this helps.
Cheers,
Mark.