Mahalanobis Distance (马氏距离) Explained
The Mahalanobis distance has two common definitions:
1) It can measure the distance between a single observation and a data set.
2) It can measure the dissimilarity between two random vectors drawn from the same distribution.
1) The Mahalanobis distance of an observation $\vec{x} = (x_1, x_2, x_3, \dots, x_N)^T$ from a set of observations with mean $\vec{\mu} = (\mu_1, \mu_2, \mu_3, \dots, \mu_N)^T$ and covariance matrix $S$ is defined as:
- $D_M(\vec{x}) = \sqrt{(\vec{x} - \vec{\mu})^T S^{-1} (\vec{x} - \vec{\mu})}$.
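To make the definition concrete, here is a minimal sketch (assuming NumPy and a small synthetic sample matrix; the data and variable names are illustrative, not from the original post):

```python
# Minimal sketch of definition 1: Mahalanobis distance of a point from a sample set.
import numpy as np

def mahalanobis_to_set(x, X):
    """Distance of point x from the sample set X (rows of X are observations)."""
    mu = X.mean(axis=0)              # sample mean vector
    S = np.cov(X, rowvar=False)      # sample covariance matrix
    diff = x - mu
    return np.sqrt(diff @ np.linalg.inv(S) @ diff)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))    # 200 synthetic observations in 3 dimensions
    x = np.array([1.0, -0.5, 0.2])
    print(mahalanobis_to_set(x, X))
```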
Intuitive explanation
Consider the problem of estimating the probability that a test point in N-dimensional Euclidean space belongs to a set, where we are given sample points that definitely belong to that set. Our first step would be to find the average or center of mass of the sample points. Intuitively, the closer the point in question is to this center of mass, the more likely it is to belong to the set.
However, we also need to know if the set is spread out over a large range or a small range, so that we can decide whether a given distance from the center is noteworthy or not. The simplistic approach is to estimate the standard deviation of the distances of the sample points from the center of mass. If the distance between the test point and the center of mass is less than one standard deviation, then we might conclude that it is highly probable that the test point belongs to the set. The further away it is, the more likely that the test point should not be classified as belonging to the set.
This intuitive approach can be made quantitative by defining the normalized distance between the test point and the set to be $\frac{x - \mu}{\sigma}$. By plugging this into the normal distribution we can derive the probability of the test point belonging to the set.
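In one dimension this is easy to compute directly. The sketch below uses an invented sample and test value (not from the original post) and SciPy's normal distribution to turn the normalized distance into a two-sided tail probability:

```python
# One-dimensional sketch: normalized distance (x - mu) / sigma and its tail probability.
import numpy as np
from scipy.stats import norm

samples = np.array([9.8, 10.1, 10.4, 9.9, 10.2, 10.0])   # hypothetical sample set
mu, sigma = samples.mean(), samples.std(ddof=1)

x_test = 10.7
z = (x_test - mu) / sigma                # normalized (standardized) distance
p_beyond = 2 * norm.sf(abs(z))           # probability of landing at least this far out
print(f"z = {z:.2f}, P(|Z| >= {abs(z):.2f}) = {p_beyond:.3f}")
```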
The drawback of the above approach was that we assumed that the sample points are distributed about the center of mass in a spherical manner. Were the distribution to be decidedly non-spherical, for instance ellipsoidal, then we would expect the probability of the test point belonging to the set to depend not only on the distance from the center of mass, but also on the direction. In those directions where the ellipsoid has a short axis the test point must be closer, while in those where the axis is long the test point can be further away from the center.
Putting this on a mathematical basis, the ellipsoid that best represents the set's probability distribution can be estimated by building the covariance matrix of the samples. The Mahalanobis distance is the distance of the test point from the center of mass divided by the width of the ellipsoid in the direction of the test point.
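This direction dependence is easy to see numerically. The sketch below builds a deliberately elongated synthetic 2-D cloud (made-up data, chosen only for illustration) and compares two test points that are equally far from the mean in the Euclidean sense but lie along different axes of the ellipsoid:

```python
# Sketch of the direction-dependent scaling on an ellipsoidal 2-D sample cloud.
import numpy as np

rng = np.random.default_rng(1)
# Correlated data: the long axis of the cloud runs roughly along (1, 1).
X = rng.multivariate_normal(mean=[0, 0], cov=[[4.0, 3.0], [3.0, 4.0]], size=500)
mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))

def d_mahalanobis(x):
    diff = x - mu
    return np.sqrt(diff @ S_inv @ diff)

# Two test points at the same Euclidean distance from the mean:
along_long_axis = np.array([2.0, 2.0])    # direction where the ellipsoid is wide
along_short_axis = np.array([2.0, -2.0])  # direction where the ellipsoid is narrow
print(d_mahalanobis(along_long_axis))     # smaller Mahalanobis distance
print(d_mahalanobis(along_short_axis))    # larger Mahalanobis distance
```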
2) Mahalanobis distance can also be defined as a dissimilarity measure between two random vectors $\vec{x}$ and $\vec{y}$ of the same distribution with covariance matrix $S$:
- $d(\vec{x}, \vec{y}) = \sqrt{(\vec{x} - \vec{y})^T S^{-1} (\vec{x} - \vec{y})}$.
If the covariance matrix is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance. If the covariance matrix is diagonal, then the resulting distance measure is called a standardized Euclidean distance:
- $d(\vec{x}, \vec{y}) = \sqrt{\sum_{i=1}^{N} \frac{(x_i - y_i)^2}{s_i^2}},$
where $s_i$ is the standard deviation of $x_i$ and $y_i$ over the sample set.
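A short sketch of these special cases, using SciPy's distance helpers and a covariance matrix $S$ made up for illustration (note that scipy.spatial.distance.mahalanobis expects the *inverse* covariance matrix as its third argument):

```python
# Sketch: two-vector Mahalanobis distance and its Euclidean / standardized-Euclidean special cases.
import numpy as np
from scipy.spatial.distance import mahalanobis, euclidean, seuclidean

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 1.5])

# A general (made-up) covariance matrix S.
S = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.5, 0.2],
              [0.1, 0.2, 1.0]])
print(mahalanobis(x, y, np.linalg.inv(S)))           # general Mahalanobis distance

# S = identity  ->  ordinary Euclidean distance.
print(mahalanobis(x, y, np.eye(3)), euclidean(x, y))

# S diagonal    ->  standardized Euclidean distance (seuclidean takes the variances s_i^2).
variances = np.array([2.0, 1.5, 1.0])
print(mahalanobis(x, y, np.diag(1.0 / variances)), seuclidean(x, y, variances))
```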
References:
http://people.revoledu.com/kardi/tutorial/Similarity/MahalanobisDistance.html
https://en.wikipedia.org/wiki/Mahalanobis_distance