Answer by Richard Willey on 9 Jan 2012

Hi Michael

MATLAB doesn't provide a specific function to remove outliers. In general, you have a couple of different options for dealing with outliers.

1. You can create an index that flags potential outliers and then either delete them from your data set or substitute more plausible values.

2. You can use robust techniques like robust regression which are less sensitive to the presence of outliers.
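To illustrate option 2 outside MATLAB, here is a minimal Python sketch that compares ordinary least squares with the Theil-Sen estimator on a line containing one corrupted point. Theil-Sen is used here only because it fits in a few lines. It is not what MATLAB's `robustfit` (Statistics Toolbox) does; `robustfit` uses iteratively reweighted least squares. The helper names `ols_fit` and `theil_sen_fit` and the data are hypothetical.

```python
import statistics
from itertools import combinations


def ols_fit(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def theil_sen_fit(x, y):
    """Theil-Sen fit: slope = median of all pairwise slopes,
    intercept = median of (y - slope * x)."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    b = statistics.median(slopes)
    a = statistics.median([yi - b * xi for xi, yi in zip(x, y)])
    return a, b


# Made-up data: y = 3x + 2 exactly, except one corrupted point at x = 20
x = list(range(1, 21))
y = [3 * xi + 2 for xi in x]
y[19] = 500

print("OLS:       a=%.3f b=%.3f" % ols_fit(x, y))
print("Theil-Sen: a=%.3f b=%.3f" % theil_sen_fit(x, y))
```

The single bad point pulls the least-squares slope far from 3. The median-based estimate still recovers the underlying line, because fewer than half of the pairwise slopes are contaminated.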

Your choice of strategy will depend a lot on what you know about the data set. For example, if many data points are coded with a value like -9999, these are probably error codes of some kind rather than actual numeric information.
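For the error-code case, option 1 amounts to building a logical index and then either dropping or replacing the flagged entries. Below is a minimal Python sketch of that idea. It assumes -9999 is the error code, and it assumes the median of the valid readings counts as a "more plausible value". Both choices, the helper names, and the sample readings are illustrative only.

```python
import statistics

SENTINEL = -9999  # assumed error code; use whatever code your data actually contains


def flag_sentinels(values, sentinel=SENTINEL):
    """Logical index: True where the value is the error code."""
    return [v == sentinel for v in values]


def drop_flagged(values, flags):
    """Delete the flagged observations."""
    return [v for v, bad in zip(values, flags) if not bad]


def substitute_flagged(values, flags):
    """Replace flagged observations with the median of the valid ones
    (one possible 'plausible value'; an assumption, not a rule)."""
    fill = statistics.median(drop_flagged(values, flags))
    return [fill if bad else v for v, bad in zip(values, flags)]


# Hypothetical readings with two error codes
readings = [12.1, 11.8, -9999, 12.4, 12.0, -9999, 11.9]
flags = flag_sentinels(readings)
print(drop_flagged(readings, flags))
print(substitute_flagged(readings, flags))
```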

I'm including some simple example code that shows a standard technique for detecting outliers.

=====================
clear all
clc
hold off
% Create a column vector of X values
X = 1:100;
X = X';
% Create a noise vector
noise = randn(100,1);
% Create a second noise vector whose sigma is 10 times larger
noise2 = 10*randn(100,1);
% Substitute noise2 for noise at observations 11, 31, 51, 71 and 91
% Many of these points will have an undue influence on the model
noise(11:20:91) = noise2(11:20:91);
% Specify Y = F(X)
Y = 3*X + 2 + noise;
% Cook's Distance for a given data point measures the extent to
% which a regression model would change if this data point
% were excluded from the regression. Cook's Distance is
% sometimes used to suggest whether a given data point might be an outlier.
% Use regstats to calculate Cook's Distance
stats = regstats(Y,X,'linear');
% Cook's Distance > 4/n (n = number of observations) is a typical
% threshold used to suggest the presence of an outlier
potential_outlier = stats.cookd > 4/length(X);
% Display the index of potential outliers and graph the results
X(potential_outlier)
scatter(X,Y, 'b.')
hold on
scatter(X(potential_outlier),Y(potential_outlier), 'r.')
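If `regstats` isn't available, the same calculation is short enough to write by hand. Below is a minimal Python sketch for a straight-line fit with an intercept (p = 2 parameters). It uses the standard formula D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2 and the same 4/n cutoff as the MATLAB code. The helper name `cooks_distance` is hypothetical. The data is a small deterministic example rather than `randn` output, so the result is repeatable.

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a simple linear regression y = a + b*x.

    D_i = e_i**2 / (p * s2) * h_ii / (1 - h_ii)**2, with p = 2 parameters,
    s2 = SSE / (n - p) and leverage h_ii = 1/n + (x_i - x_bar)**2 / Sxx.
    """
    n, p = len(x), 2
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]


# Made-up data: y = 3x + 2 with alternating +/-0.5 noise,
# plus one corrupted observation at index 10 (x = 11)
x = list(range(1, 21))
y = [3 * xi + 2 + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
y[10] += 40

d = cooks_distance(x, y)
threshold = 4 / len(x)  # same 4/n rule of thumb as the MATLAB code
potential_outliers = [i for i, di in enumerate(d) if di > threshold]
print(potential_outliers)
```

Only the corrupted observation exceeds the 4/n cutoff here. As in the MATLAB version, a flag is a prompt to look at the point, not proof that it is bad data.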
