Answer by Richard Willey on 9 Jan 2012

Hi Michael

MATLAB doesn't provide a specific function to remove outliers. In general you have a couple different options to deal with outliers.

1. You can create an index that flags potential outliers and either delete them from your data set or substitute more plausible values

2. You can use robust techniques like robust regression which are less sensitive to the presence of outliers.

Your choice of strategies will depend a lot on your knowledge about the data set. For example, if you have a lot of data points that are coded with a value like -9999 these are probably error codes of some kind rather than actual numeric information.

I'm including some simple example code which shows a standard technique to detect outliers.

=====================
% Create a vector of X values
clear all
clc
hold off
X = 1:100;
X = X';
% Create a noise vector
noise = randn(100,1);
% Create a second noise value where sigma is much larger
noise2 = 10*randn(100,1);
% Substitute noise2 for noise1 at obs# (11, 31, 51, 71, 91)
% Many of these points will have an undue influence on the model
noise(11:20:91) = noise2(11:20:91);
% Specify Y = F(X)
Y = 3*X + 2 + noise;
% Cook's Distance for a given data point measures the extent to
% which a regression model would change if this data point
% were excluded from the regression. Cook's Distance is
% sometimes used to suggest whether a given data point might be an outlier.
% Use regstats to calculate Cook's Distance
stats = regstats(Y,X,'linear');
% if Cook's Distance > n/4 is a typical treshold that is used to suggest
% the presence of an outlier
potential_outlier = stats.cookd > 4/length(X);
% Display the index of potential outliers and graph the results
X(potential_outlier)
scatter(X,Y, 'b.')
hold on
scatter(X(potential_outlier),Y(potential_outlier), 'r.')

MATLAB remove outliers.的更多相关文章

  1. matlab中的containers.Map()

    matlab中的containers.Map() 标签: matlabcontainers.Map容器map 2015-10-27 12:45 1517人阅读 评论(1) 收藏 举报  分类: Mat ...

  2. Taxi Trip Time Winners' Interview: 3rd place, BlueTaxi

    Taxi Trip Time Winners' Interview: 3rd place, BlueTaxi This spring, Kaggle hosted two competitions w ...

  3. 异常值处理outlier

    python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...

  4. 壁虎书1 The Machine Learning Landscape

    属性与特征: attribute: e.g., 'Mileage' feature: an attribute plus its value, e.g., 'Mileage = 15000' Note ...

  5. 第三课 创建函数 - 从EXCEL读取 - 导出到EXCEL - 异常值 - Lambda函数 - 切片和骰子数据

    第 3 课   获取数据 - 我们的数据集将包含一个Excel文件,其中包含每天的客户数量.我们将学习如何对 excel 文件进​​行处理.准备数据 - 数据是有重复日期的不规则时间序列.我们将挑战数 ...

  6. Learning Spark中文版--第六章--Spark高级编程(2)

    Working on a Per-Partition Basis(基于分区的操作) 以每个分区为基础处理数据使我们可以避免为每个数据项重做配置工作.如打开数据库连接或者创建随机数生成器这样的操作,我们 ...

  7. Matlab的标记分水岭分割算法

    1 综述 Separating touching objects in an image is one of the more difficult image processing operation ...

  8. Matlab编程基础

    平台:Win7 64 bit,Matlab R2014a(8.3) “Matlab”是“Matrix Laboratory” 的缩写,中文“矩阵实验室”,是强大的数学工具.本文侧重于Matlab的编程 ...

  9. Matlab 进阶学习记录

    最近在看 Faster RCNN的Matlab code,发现很多matlab技巧,在此记录: 1. conf_proposal  =  proposal_config('image_means', ...

随机推荐

  1. [LA 3887] Slim Span

    3887 - Slim SpanTime limit: 3.000 seconds Given an undirected weighted graph G <tex2html_verbatim ...

  2. Java [Leetcode 219]Contains Duplicate II

    题目描述: Given an array of integers and an integer k, find out whether there are two distinct indices i ...

  3. c & c++中const

    1 const的基本用法 常变量:  const 类型说明符 变量名 #在c语言中,仍是一个变量.c++中则变为一个常量,比如可以设置为数组的长度 常引用:  const 类型说明符 &引用名 ...

  4. HWM的实验

    HWM是数据段中使用空间和未使用空间之间的界限,假如现有自由链表上的数据块不能满足需求,Oracle把HWM指向的数据块加入到自由链表上,HWM向前移动到下一个数据块.简单说,一个数据段中,HWM左边 ...

  5. 【转】This version of the rendering library is more recent than your version of ADT plug-in. Please update ADT plug-in

    原文网址:http://1982106a.blog.163.com/blog/static/8436495620149239361692/ 预览layout.xml文件时提示: This versio ...

  6. MyBatis学习 之 三、动态SQL语句

    目录(?)[-] 三动态SQL语句 selectKey 标签 if标签 if where 的条件判断 if set 的更新语句 if trim代替whereset标签 trim代替set choose ...

  7. web.config配置详细说明

    (一).Web.Config是以XML文件规范存储,配置文件分为以下格式 1.配置节处理程序声明    特点:位于配置文件的顶部,包含在<configSections>标志中. 2.特定应 ...

  8. (原创)LAMP教程1-下载虚拟机软件

    (原创)LAMP教程1 从今天开始会在我的博客更新LAMP教程,第一章节就是安装虚拟机,因为不可能所有的人都有机会操作服务器,所以今天我打算教大家用虚拟机安装配置当下比较流行的框架,lamp. 好了费 ...

  9. Multiple View Geometry in Computer Vision Second Edition by Richard Hartley 读书笔记(二)

    // Chapter 2介绍的是2d下的投影变换,摘录下了以下定理 Result 2.1. The point x lies on the line l if and only if xTl = 0. ...

  10. sql联接那点儿事儿

    1.交叉联接(cross join) select * from t.toy ,b.boy from toy as t cross join boy as b 其效果同select * from t. ...