Z-score

import numpy as np

def outliers_z_score(ys):
threshold = 3 mean_y = np.mean(ys)
stdev_y = np.std(ys)
z_scores = [(y - mean_y) / stdev_y for y in ys]
return np.where(np.abs(z_scores) > threshold)

Modified Z-score

import numpy as np

def outliers_modified_z_score(ys):
threshold = 3.5 median_y = np.median(ys)
median_absolute_deviation_y = np.median([np.abs(y - median_y) for y in ys])
modified_z_scores = [0.6745 * (y - median_y) / median_absolute_deviation_y
for y in ys]
return np.where(np.abs(modified_z_scores) > threshold)

IQR(interquartile range)

import numpy as np

def outliers_iqr(ys):
quartile_1, quartile_3 = np.percentile(ys, [25, 75])
iqr = quartile_3 - quartile_1
lower_bound = quartile_1 - (iqr * 1.5)
upper_bound = quartile_3 + (iqr * 1.5)
return np.where((ys > upper_bound) | (ys < lower_bound))

Conclusion

It is important to reiterate that these methods should not be used mechanically.
They should be used to explore the data – they let you know which points might be worth a closer look.
What to do with this information depends heavily on the situation.
Sometimes it is appropriate to exclude outliers from a dataset to make a model trained on that dataset more predictive.
Sometimes, however,
the presence of outliers is a warning sign that the real-world process generating the data is more complicated than expected. As an astute commenter on CrossValidated put it:
“Sometimes outliers are bad data, and should be excluded, such as typos.
Sometimes they are Wayne Gretzky or Michael Jordan, and should be kept.” Domain knowledge and practical wisdom are the only ways to tell the difference.

  

摘自:http://colingorrie.github.io/outlier-detection.html

Three ways to detect outliers的更多相关文章

  1. MATLAB remove outliers.

    Answer by Richard Willey on 9 Jan 2012 Hi Michael MATLAB doesn't provide a specific function to remo ...

  2. 异常值处理outlier

    python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...

  3. 深度学习图像配准 Image Registration: From SIFT to Deep Learning

    Image Registration is a fundamental step in Computer Vision. In this article, we present OpenCV feat ...

  4. (转)利用libcurl和国内著名的两个物联网云端通讯的例程, ubuntu和openwrt下调试成功(四)

    1. libcurl 的参考文档如下 CURLOPT_HEADERFUNCTION Pass a pointer to a function that matches the following pr ...

  5. 如何查找物理cpu,cpu核心和逻辑cpu的数量

    环境 Red Hat Enterprise Linux 4 Red Hat Enterprise Linux 5 Red Hat Enterprise Linux 6 Red Hat Enterpri ...

  6. Gartner 2018 年WAF魔力象限报告:云WAF持续增长,Bot管理与API安全拥有未来

    Gartner 2018 年WAF魔力象限报告:云WAF持续增长,Bot管理与API安全拥有未来 来源 https://www.freebuf.com/articles/paper/184903.ht ...

  7. A Gentle Guide to Machine Learning

    A Gentle Guide to Machine Learning Machine Learning is a subfield within Artificial Intelligence tha ...

  8. [linux]linuxI/O测试的方法之dd

    参考http://www.thomas-krenn.com/en/wiki/Linux_I/O_Performance_Tests_using_dd Measuring Write Performan ...

  9. WGCNA构建基因共表达网络详细教程

    这篇文章更多的是对于混乱的中文资源的梳理,并补充了一些没有提到的重要参数,希望大家不会踩坑. 1. 简介 1.1 背景 WGCNA(weighted gene co-expression networ ...

随机推荐

  1. A Deep Learning-Based System for Vulnerability Detection(二)

    接着上一篇,这篇研究实验和结果. A.用于评估漏洞检测系统的指标 TP:为正确检测到漏洞的样本数量 FP:为检测到虚假漏洞样本的数量(误报) FN:为未检真实漏洞的样本数量(漏报) TN:未检测到漏洞 ...

  2. Spring类型转换(Converter)

    Spring的类型转换 以前在面试中就有被问到关于spring数据绑定方面的问题,当时对它一直只是朦朦胧胧的概念,最近稍微闲下来有时间看了一下其中数据转换相关的内容,把相应的内容做个记录. 下面先说明 ...

  3. linux -- 添加、修改、删除路由

    在日常的使用中,或者在服务器中,有两个网卡配置两个地址,访问不同的网络段,这种情况是非常常见的现象,但是,我们需要额外的添加路由表来决定发送的数据包经过正确的网关和interface才能正确的进行通信 ...

  4. SDOI 2019 R1游记

    $SDOI$ $2019$ $R1$游记 昨天才刚回来,今天就来写游记啦! Day -5: 做了一下去年省选的Day1,感觉很神仙. Day -4: 做了一下去年省选的Day2,感觉还是很神仙. Da ...

  5. (六)List All Indices

    Now let’s take a peek at our indices: 现在让我们来看看我们的指数: GET /_cat/indices?v And the response: health st ...

  6. MongoDB install

    下载地址1:https://www.mongodb.org/dl/linux下载地址2:https://www.mongodb.com/download-center/community关于Mongo ...

  7. 横线和文字一排,文字居中显示vertical-align: middle;

    <!DOCTYPE html><html> <head> <meta charset="UTF-8"> <title>& ...

  8. Top Page

    Top Page 由于个人的博客中涉及了几个不同的领域.今后准备设置Index页进行一番整理 : 所有其他页面都可以从这个页面遍历 Top Page

  9. ASP.NET Core使用HttpClient的同步和异步请求

    using System; using System.Collections.Generic; using System.Collections.Specialized; using System.I ...

  10. 追逐心目中的那个Ta

    申明:全篇皆为作者臆想,浪漫主义代表派作品,若有雷同,纯属巧合 人生最难过的不就是在一无所有的年纪里遇到了最想呵护一生的人,而在拥有一切的时候却失去了不顾一切的心. 长夜漫漫,本是相思人,偏听多情曲, ...