I find myself coming back to the same few pictures when explaining basic machine learning concepts. Below are the ten I find most illuminating.

1. Test and training error: Why lower training error is not always a good thing: ESL Figure 2.11. Test and training error as a function of model complexity.
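
A quick way to reproduce the shape of this figure is to fit models of increasing complexity and track both errors. Here is a minimal sketch with scikit-learn, using polynomial degree as the complexity axis (the data-generating curve and noise level are my own choices, not ESL's):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)        # arbitrary smooth target
x_train = rng.uniform(0, 1, 30)
x_test = rng.uniform(0, 1, 200)
y_train = f(x_train) + rng.normal(0, 0.2, x_train.size)
y_test = f(x_test) + rng.normal(0, 0.2, x_test.size)

for degree in [1, 3, 9, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train[:, None], y_train)
    train_err = mean_squared_error(y_train, model.predict(x_train[:, None]))
    test_err = mean_squared_error(y_test, model.predict(x_test[:, None]))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Training error falls monotonically with degree, while test error eventually climbs back up: the two curves of the figure.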

2. Under- and overfitting: PRML Figure 1.4. Plots of polynomials having various orders M, shown as red curves, fitted to the data set generated by the green curve.
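
The figure is easy to recreate, since PRML's green curve is sin(2πx). A sketch with numpy and matplotlib (the sample size matches the book's ten points; the noise level is my approximation):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 10)                         # ten observations, as in PRML
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.25, x.size)
grid = np.linspace(0, 1, 200)

fig, axes = plt.subplots(1, 4, figsize=(16, 3))
for ax, M in zip(axes, [0, 1, 3, 9]):
    coeffs = np.polyfit(x, t, M)                  # order-M least-squares fit
    ax.plot(grid, np.sin(2 * np.pi * grid), "g")  # generating curve
    ax.plot(grid, np.polyval(coeffs, grid), "r")  # fitted polynomial
    ax.scatter(x, t)
    ax.set_title(f"M = {M}")
    ax.set_ylim(-1.5, 1.5)
plt.show()
```

M = 0 and M = 1 underfit, M = 3 tracks the generating curve, and M = 9 interpolates every point while oscillating wildly in between.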

3. Occam's razor: ITILA Figure 28.3. Why Bayesian inference embodies Occam's razor. This figure gives the basic intuition for why complex models can turn out to be less probable. The horizontal axis represents the space of possible data sets D. Bayes' theorem rewards models in proportion to how much they predicted the data that occurred. These predictions are quantified by a normalized probability distribution on D. This probability of the data given model Hi, P(D|Hi), is called the evidence for Hi. A simple model H1 makes only a limited range of predictions, shown by P(D|H1); a more powerful model H2, which has, for example, more free parameters than H1, is able to predict a greater variety of data sets. This means, however, that H2 does not predict the data sets in region C1 as strongly as H1. Suppose that equal prior probabilities have been assigned to the two models. Then, if the data set falls in region C1, the less powerful model H1 will be the more probable model.
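
The argument is nothing more than Bayes' theorem applied to normalized predictive distributions. A toy numeric sketch (the data space and both models are invented purely for illustration):

```python
# Toy version of the evidence argument: 100 possible data sets, indexed 0..99.
# Simple model H1 predicts only data sets 0..9; complex model H2 predicts all 100.
# Both predictive distributions must sum to 1 over the data space.
def p_data_given_h1(d):
    return 1 / 10 if d < 10 else 0.0

def p_data_given_h2(d):
    return 1 / 100

observed = 4                                   # a data set inside H1's range (region C1)
ev1, ev2 = p_data_given_h1(observed), p_data_given_h2(observed)

# With equal priors, the posterior ratio equals the evidence ratio.
post_h1 = ev1 / (ev1 + ev2)
print(f"P(D|H1)={ev1:.3f}  P(D|H2)={ev2:.3f}  P(H1|D)={post_h1:.3f}")
# -> P(D|H1)=0.100  P(D|H2)=0.010  P(H1|D)=0.909
```

Because H2 must spread its unit of predictive probability over ten times as many data sets, it is penalized tenfold whenever the simple model suffices.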

4. Feature combinations: (1) why collectively relevant features may look individually irrelevant, and (2) why linear methods may fail. From Isabelle Guyon's feature extraction slides.
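
The canonical instance is XOR-like data: each feature on its own carries no information about the class, yet the pair separates it perfectly, and no linear method can exploit them. A sketch with scikit-learn (this data set is my stand-in for the one on Guyon's slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)        # XOR-style labels

# Individually: the class-conditional means of each feature coincide,
# so thresholding either coordinate alone is useless.
for j in range(2):
    print(f"feature {j}: mean|class0 = {X[y == 0, j].mean():+.2f}, "
          f"mean|class1 = {X[y == 1, j].mean():+.2f}")

linear = LogisticRegression().fit(X, y)
kernel = SVC(kernel="rbf").fit(X, y)
print("linear accuracy:", linear.score(X, y))  # ~0.5, chance level
print("kernel accuracy:", kernel.score(X, y))  # ~1.0
```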

5. Irrelevant features: Why irrelevant features hurt kNN, clustering, and other similarity-based methods. The figure on the left shows two classes well separated on the vertical axis. The figure on the right adds an irrelevant horizontal axis, which destroys the grouping and makes many points nearest neighbors of the opposite class.
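
This is easy to check numerically. A sketch assuming two classes separated on one informative axis and then padded with noisy irrelevant axes (the setup is mine, not the original figure's):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
# One informative axis: the classes sit at -1 and +1, well separated.
informative = np.concatenate([rng.normal(-1, 0.3, n), rng.normal(1, 0.3, n)])
y = np.array([0] * n + [1] * n)

for n_noise in [0, 1, 5, 20]:
    # Irrelevant axes with large variance come to dominate the distances.
    noise = rng.normal(0, 3.0, (2 * n, n_noise))
    X = np.column_stack([informative, noise])
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=5).mean()
    print(f"{n_noise:2d} irrelevant features -> 1-NN accuracy {acc:.2f}")
```

Accuracy degrades toward chance as the irrelevant dimensions take over the distance computation.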

6. Basis functions: How non-linear basis functions turn a low-dimensional classification problem without a linear boundary into a high-dimensional problem with a linear boundary. From SVM tutorial slides by Andrew Moore: a one-dimensional non-linear classification problem with input x is turned into a 2-D problem z = (x, x²) that is linearly separable.
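
A sketch of exactly this construction, with a 1-D problem of my own choosing (class 1 when |x| > 1, which no single threshold on x separates well):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 300)
y = (np.abs(x) > 1).astype(int)     # class 1 lives on both sides of class 0

linear_1d = LinearSVC().fit(x[:, None], y)
z = np.column_stack([x, x ** 2])    # the basis expansion z = (x, x^2)
linear_2d = LinearSVC().fit(z, y)

print("accuracy on x:       ", linear_1d.score(x[:, None], y))  # ~0.75 at best
print("accuracy on (x, x^2):", linear_2d.score(z, y))           # ~1.0
```

In z-space the boundary is simply the horizontal line x² = 1, which is linear in the new coordinates.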

7. Discriminative vs. Generative: Why discriminative learning may be easier than generative: PRML Figure 1.27. Example of the class-conditional densities for two classes having a single input variable x (left plot) together with the corresponding posterior probabilities (right plot). Note that the left-hand mode of the class-conditional density p(x|C1), shown in blue on the left plot, has no effect on the posterior probabilities. The vertical green line in the right plot shows the decision boundary in x that gives the minimum misclassification rate.
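
The claim can be checked directly: build a bimodal p(x|C1), compute the posteriors, then delete the left mode and watch the decision boundary stay put. A sketch with scipy (these densities only approximate PRML's):

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-2, 10, 2000)
# Bimodal p(x|C1): the left mode stands in for the one shown in blue.
p1 = 0.3 * norm.pdf(x, 0, 0.8) + 0.7 * norm.pdf(x, 4, 1.0)
p2 = norm.pdf(x, 7, 1.0)                       # p(x|C2)

post1 = p1 / (p1 + p2)                         # posterior with equal class priors
boundary = x[np.argmin(np.abs(post1 - 0.5))]

# Drop the left mode without renormalizing: the boundary does not move,
# because the posterior was already saturated at 1 near the deleted mode.
p1_cut = 0.7 * norm.pdf(x, 4, 1.0)
post_cut = p1_cut / (p1_cut + p2)
cut_boundary = x[np.argmin(np.abs(post_cut - 0.5))]
print(f"boundary with left mode: {boundary:.2f}, without: {cut_boundary:.2f}")
```

A generative model spends capacity fitting that left mode; a discriminative model only has to get the posterior right near the boundary.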

8. Loss functions: Learning algorithms can be viewed as optimizing different loss functions: PRML Figure 7.5. Plot of the ‘hinge’ error function used in support vector machines, shown in blue, along with the error function for logistic regression, rescaled by a factor of 1/ln(2) so that it passes through the point (0, 1), shown in red. Also shown are the misclassification error in black and the squared error in green.
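
All four curves are simple functions of the margin z = y·t, so the plot takes only a few lines. A matplotlib sketch matching the rescalings described above:

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-2, 2, 400)                    # margin z = y * t
hinge = np.maximum(0, 1 - z)                   # SVM hinge loss
logistic = np.log1p(np.exp(-z)) / np.log(2)    # rescaled by 1/ln(2), passes through (0, 1)
misclass = (z <= 0).astype(float)              # 0-1 misclassification error
squared = (1 - z) ** 2                         # squared error on the margin

for curve, label in [(hinge, "hinge"), (logistic, "logistic"),
                     (misclass, "misclassification"), (squared, "squared")]:
    plt.plot(z, curve, label=label)
plt.xlabel("z = y t")
plt.ylim(0, 4)
plt.legend()
plt.show()
```

All four agree on the goal of penalizing wrong-side points but differ in how they treat margins, which is exactly what distinguishes the algorithms that optimize them.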

9. Geometry of least squares: ESL Figure 3.2. The N-dimensional geometry of least squares regression with two predictors. The outcome vector y is orthogonally projected onto the hyperplane spanned by the input vectors x1 and x2. The projection ŷ represents the vector of the least squares predictions.
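
The projection view can be verified numerically: the least-squares fit ŷ lies in the span of the input vectors, and the residual is orthogonal to both of them. A small numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
X = rng.normal(size=(N, 2))                 # two predictors x1, x2
y = rng.normal(size=N)

# Least squares = orthogonal projection of y onto span{x1, x2}.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
residual = y - y_hat

print(X.T @ residual)                       # ~ [0, 0]: residual orthogonal to each input
# Equivalently, y_hat is the image of y under the projection ("hat") matrix.
H = X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(H @ y, y_hat))            # True
```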

10. Sparsity: Why Lasso (L1 regularization or Laplacian prior) gives sparse solutions (i.e. weight vectors with more zeros): ESL Figure 3.11. Estimation picture for the lasso (left) and ridge regression (right). Shown are contours of the error and constraint functions. The solid blue areas are the constraint regions |β1| + |β2| ≤ t and β1² + β2² ≤ t², respectively, while the red ellipses are the contours of the least squares error function.
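
The corner-touching geometry shows up directly in fitted coefficients: the lasso drives some weights exactly to zero, while ridge merely shrinks them. A scikit-learn sketch on synthetic data (dimensions and penalty strengths are my own choices):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]            # only 3 of the 10 features matter
y = X @ true_beta + rng.normal(0, 1.0, n)

lasso = Lasso(alpha=0.3).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("lasso zeros:", np.sum(lasso.coef_ == 0), "of", p)  # irrelevant weights -> exactly 0
print("ridge zeros:", np.sum(ridge.coef_ == 0), "of", p)  # shrunk, but none exactly 0
print("lasso coef:", np.round(lasso.coef_, 2))
print("ridge coef:", np.round(ridge.coef_, 2))
```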

from: http://www.denizyuret.com/2014/02/machine-learning-in-5-pictures.html
