Everything You Wanted to Know About Machine Learning
所以张小龙才说‘我说的都是错的’。
note by 王犇
- A set of possible models to look thorough
- A way to test whether a model is good
- A clever way to find a really good model with only a few test
is to always measure the performance of your classifier on out-of-sample data.
testing splits. You should even make some predictions on data you imagine yourself, to see what the model does in certain situations.
you add more and more input fields, you must also add more and more training data to “fill up” the space created
by the additional inputs if you want to use them accurately.
否则非常可能噪音会让你的模型效果更差。
They Seem
know if an algorithm will model your data well is to try it out.
single input field, or even any single pair of fields, is closely correlated with the objective.
to create fields that make machine learning algorithms work better.
of the project’s time goes into feature engineering, 20% goes towards figuring out what comprises a proper and
comprehensive evaluation of the algorithm, and only 10% goes into algorithm selection and tuning.
两个经纬度和两者间的距离是须要相当复杂的转换工作。
转换后可以和用户是否愿意在同一天在两个城市间开车具有很强的关联性。
good evidence that, in a lot of problems, very simple machine learning techniques can be levered into incredibly
powerful classifiers with the addition of loads of data.
A big reason for this is because, once you’ve defined your input fields, there’s only so much analytic gymnastics you can do. Computer algorithms trying to learn models have only a relatively few tricks they can do efficiently, and many of them are not so very
different. Thus, as we have
said before, performance differences between algorithms are typically not large. Thus, if you want better classifiers, you should spend your time:
- Engineering better features
- Getting your hands on more high-quality data
而且事实上非常多模型的原理也都有相似之处。(想想n多的Learning
2 Rank算法)所以假设你希望达到更好的分类器。你能够优先这么做:
a more powerful model by learning multiple classifiers over different random subsets of the data.
that fit the data equally well, many machine learning algorithms have a way of mathematically preferring the simpler of the two. The folk wisdom here is that a simpler model will perform better on out-of-sample testing data, because it has less parameters
to fit, and thus is less likely to be overfit
One should not take this rule too far. There are many places in machine learning where additional complexity can benefit performance. On top of that, it is not quite accurate to say that model complexity leads to overfitting. More accurate is that the procedure
used to fit all that complexity leads to overfitting if it is not very clever. But there are plenty of cases where the complexity is brought to heel by cleverness in the model fitting process.
Thus, prefer simple models because they are smaller, faster to fit, and more interpretable, but not necessarily because they will lead to better performance; the only way to know that is to evaluate your model on
test data.
也不能过于轻信这个原则。也有非常多地方格外的复杂度会带来额外的收益。
太复杂的模型带来overfitting,这样的说法并不准确。有时额外的复杂度是模型训练中有意而且聪明的选择(复杂的structure也许更好契合了问题,效果和简单模型一样。也许仅仅是数据还不够)。
因此,倾向于简单模型由于他们更小。更好训练。更easy解释,但并不一定由于他们会带来更好的效果。
仅仅有实际測试可以告诉你答案。
8. Representable Does Not Imply Learnable
--可表示不代表可学习
are fond of saying that the function representing an accurate prediction on your datais representable by
the learning algorithm. This means that it is possiblefor
the algorithm to build a good model on your data.
by itself. Building a good model may require much more data than you have, or the good model might simply never be found by the algorithm. Just because there’s a good model out there that the algorithm could find
does not mean that it willfind
it.
If the algorithm can’t find a good model, but you are pretty sure that a good model exists, try engineering features that will make that model a little more obvious to the algorithm.
observational data can only show us that two variables are related, but it cannot
tell us the “why”.
可是书不是成绩好的原因,你不能给那些孩子送书就提升他们的成绩。真正的原因可能是。书籍多的家庭父母的教育程度高,对还自己的教育也相对较好。书不过一个indicators
your models. Just because one thing predicts another doesn’t mean it causes another, and making business (or
public policy) decisions based on some imagined causal relationship should be done with extreme caution.
Big Picture
and like any powerful tool,misuses of it can cause a lot of damage.
Understanding how machine learning works and some of the potential pitfalls can go a long way towards keeping you out of trouble.
Everything You Wanted to Know About Machine Learning的更多相关文章
- 【Machine Learning】KNN算法虹膜图片识别
K-近邻算法虹膜图片识别实战 作者:白宁超 2017年1月3日18:26:33 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现的深入理解.本系列文章是作者结 ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- 【Machine Learning】机器学习及其基础概念简介
机器学习及其基础概念简介 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现的深入理解.本系列文章是作者结 ...
- 【Machine Learning】决策树案例:基于python的商品购买能力预测系统
决策树在商品购买能力预测案例中的算法实现 作者:白宁超 2016年12月24日22:05:42 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现的深入理解.本 ...
- 【机器学习Machine Learning】资料大全
昨天总结了深度学习的资料,今天把机器学习的资料也总结一下(友情提示:有些网站需要"科学上网"^_^) 推荐几本好书: 1.Pattern Recognition and Machi ...
- [Machine Learning] Active Learning
1. 写在前面 在机器学习(Machine learning)领域,监督学习(Supervised learning).非监督学习(Unsupervised learning)以及半监督学习(Semi ...
- [Machine Learning & Algorithm]CAML机器学习系列2:深入浅出ML之Entropy-Based家族
声明:本博客整理自博友@zhouyong计算广告与机器学习-技术共享平台,尊重原创,欢迎感兴趣的博友查看原文. 写在前面 记得在<Pattern Recognition And Machine ...
- machine learning基础与实践系列
由于研究工作的需要,最近在看机器学习的一些基本的算法.选用的书是周志华的西瓜书--(<机器学习>周志华著)和<机器学习实战>,视频的话在看Coursera上Andrew Ng的 ...
- matlab基础教程——根据Andrew Ng的machine learning整理
matlab基础教程--根据Andrew Ng的machine learning整理 基本运算 算数运算 逻辑运算 格式化输出 小数位全局修改 向量和矩阵运算 矩阵操作 申明一个矩阵或向量 快速建立一 ...
- Machine Learning
Recently, I am studying Maching Learning which is our course. My English is not good but this course ...
随机推荐
- WCF技术剖析之十:调用WCF服务的客户端应该如何进行异常处理
原文:WCF技术剖析之十:调用WCF服务的客户端应该如何进行异常处理 在前面一片文章(服务代理不能得到及时关闭会有什么后果?)中,我们谈到及时关闭服务代理(Service Proxy)在一个高并发环境 ...
- Cocos2D-X学习笔记 3 从一个场景切换到还有一个场景
工厂方法一般写法 StartLayer * StartLayer::create() { StartLayer *sl = new StartLayer(); sl->init(); sl-&g ...
- Spring3.0 入门进阶(1):从配置文件装载Bean
Spring 已经盛行多年,目前已经处于3.0阶段,关于Spring的概念介绍性的东西网上已经很多,本系列博客主要是把一些知识点通过代码的方式总结起来,以便查阅. 作为入门,本篇主要介绍Bean的加载 ...
- Linux下搭建 Cocos2d-x-2.1.4 编译环境
[tonyfield 2013.09.04 ] 参考 Linux下搭建 Cocos2d-x-2.1.4 编译环境 导入 HelloCpp 例程 1. Java 入口 HelloCpp.java Hel ...
- WebService 之 WSDL文件 解说
恩,我想说的是,是不是常常有人在开发的时候,特别是和第三方有接口的时候,走的是SOAP协议,然后用户给你一个WSDL文件,说依照上面的进行适配,嘿嘿,这个时候,要是你曾经没有开发过,肯定会傻眼,那假设 ...
- readline-6.3 之arm平台交叉编译
近期须要弄个CLI命令接口程序,初步设想是须要支持历史命令翻阅,tab键命令补全这种一个东西.经查阅相关文档,深耕百度一番!(google近期不太正常) 实在恼火.发现readline果真是个好东西, ...
- Android ListView 单条刷新方法实践及原理解析
对于使用listView配合adapter进行刷新的方法大家都不陌生,先刷新adapter里的数据,然后调用notifydatasetchange通知listView刷新界面. 方法虽然简单,但这里面 ...
- servlet后台怎样接收对象參数
主要思想是用js把对象转换成json.然后把json提交到后台去,后台把这个json字符串转换成map对象 <script type="text/javascript"> ...
- linux进程解析--进程的创建
通常我们在代码中调用fork()来创建一个进程或者调用pthread_create()来创建一个线程,创建一个进程需要为其分配内存资源,文件资源,时间片资源等,在这里来描述一下linux进程的创建过程 ...
- 11661 - Burger Time?
Burger Time? Everybody knows that along the more important highways there are countless fast food ...