Week 1 Machine Learning with Big Data

KNime - GUI based

Spark MLlib - inside Spark

CRISP-DM

  

  

  

Week 2, Data Exploration

一般有两种方法,summary statistics 和 visualization

  

Summary statistics (mean  平均数,median 中位数, mode 最常见的数)

  

  

  

high Kurtosis 预示着有outlier的存在

  

visualization

  

这里详细讲一下 box plot

下图的 upper quartile 和 lower quartile 分别指的是 75% 和 25% 的点, median 很明显是中位数点,中间柱状部分的数据占了总数据的50%. Upper extreme 和 Lower extreme 分别是90% 和 10% 数据的点,超出部分就是outliers.

  

Data preparing

  

  

data wrangling 主要是transformation   

Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)的更多相关文章

  1. Coursera, Big Data 4, Machine Learning With Big Data (week 3/4/5)

    week 3 Classification KNN :基本思想是 input value 类似,就可能是同一类的 Decision Tree Naive Bayes Week 4 Evaluating ...

  2. In machine learning, is more data always better than better algorithms?

    In machine learning, is more data always better than better algorithms? No. There are times when mor ...

  3. [Javascript] Classify JSON text data with machine learning in Natural

    In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classi ...

  4. Coursera 学习笔记|Machine Learning by Standford University - 吴恩达

    / 20220404 Week 1 - 2 / Chapter 1 - Introduction 1.1 Definition Arthur Samuel The field of study tha ...

  5. [Machine Learning with Python] Data Preparation through Transformation Pipeline

    In the former article "Data Preparation by Pandas and Scikit-Learn", we discussed about a ...

  6. [Machine Learning with Python] Data Preparation by Pandas and Scikit-Learn

    In this article, we dicuss some main steps in data preparation. Drop Labels Firstly, we drop labels ...

  7. 斯坦福大学公开课机器学习:machine learning system design | data for machine learning(数据量很大时,学习算法表现比较好的原理)

    下图为四种不同算法应用在不同大小数据量时的表现,可以看出,随着数据量的增大,算法的表现趋于接近.即不管多么糟糕的算法,数据量非常大的时候,算法表现也可以很好. 数据量很大时,学习算法表现比较好的原理: ...

  8. [Machine Learning with Python] Data Visualization by Matplotlib Library

    Before you can plot anything, you need to specify which backend Matplotlib should use. The simplest ...

  9. Coursera《machine learning》--(14)数据降维

    本笔记为Coursera在线课程<Machine Learning>中的数据降维章节的笔记. 十四.降维 (Dimensionality Reduction) 14.1 动机一:数据压缩 ...

随机推荐

  1. extjs 中比较常见且好用的监听事件

    ComboBox listeners:{ expand:function(){ //此函数是,点击下拉框展开的时候事件 }, select:function(com, record, index){ ...

  2. .net prams关键字

    先举个例子: 代码如下: class Program { static void Main(string[] args) { Console.WriteLine(Sum(1)); Console.Wr ...

  3. golang学习和使用经验总结

    学习网址 https://studygolang.com/pkgdoc go标准库网站 https://blog.csdn.net/sanxiaxugang/article/details/60324 ...

  4. ansible roles

    roles 特点 目录结构清晰 重复调用相同的任务 目录结构相同 web - tasks - install.yml - copfile.yml - start.yml -  main.yml - t ...

  5. google 跨域解决办法

    --args --disable-web-security --user-data-dir

  6. filebeat-6.4.3-windows-x86_64输出Kafka

    配置filebeat.yml文件 filebeat.prospectors: - type: log encoding: utf- enabled: true paths: - e:\log.log ...

  7. Nginx CONTENT阶段 concat模块

    L67 concat_delimiter : 根据js 指定 分隔符 比如 “|” 那么每个文件分隔符为 “|” concat_types : 指定要合并文件的类型 concat_unique : s ...

  8. 【Spring】Spring随笔索引

    Spring随笔索引 [Spring]Spring bean的实例化 [Spring]手写Spring MVC [Spring]Spring Data JPA

  9. Bellman-Ford&&SPFA

    我们前文说过,有关最短路径除了Floyed算法之外,还有许多更加好的方法.这里讲一下有关 Bellman-Ford和SPFA的知识 Bellman-Ford:复杂度O(VE) 有关Bellman-Fo ...

  10. Python爬虫之三

    1)使用Scrapy,什么叫做Scrapy Scrapy,Python开发的一个快速.高层次的屏幕抓取和web抓取框架,用于抓取web站点并从页面中提取结构化的数据.Scrapy用途广泛,可以用于数据 ...