Everything You Wanted to Know About Machine Learning
所以张小龙才说‘我说的都是错的’。
note by 王犇
- A set of possible models to look thorough
- A way to test whether a model is good
- A clever way to find a really good model with only a few test
is to always measure the performance of your classifier on out-of-sample data.
testing splits. You should even make some predictions on data you imagine yourself, to see what the model does in certain situations.
you add more and more input fields, you must also add more and more training data to “fill up” the space created
by the additional inputs if you want to use them accurately.
否则非常可能噪音会让你的模型效果更差。
They Seem
know if an algorithm will model your data well is to try it out.
single input field, or even any single pair of fields, is closely correlated with the objective.
to create fields that make machine learning algorithms work better.
of the project’s time goes into feature engineering, 20% goes towards figuring out what comprises a proper and
comprehensive evaluation of the algorithm, and only 10% goes into algorithm selection and tuning.
两个经纬度和两者间的距离是须要相当复杂的转换工作。
转换后可以和用户是否愿意在同一天在两个城市间开车具有很强的关联性。
good evidence that, in a lot of problems, very simple machine learning techniques can be levered into incredibly
powerful classifiers with the addition of loads of data.
A big reason for this is because, once you’ve defined your input fields, there’s only so much analytic gymnastics you can do. Computer algorithms trying to learn models have only a relatively few tricks they can do efficiently, and many of them are not so very
different. Thus, as we have
said before, performance differences between algorithms are typically not large. Thus, if you want better classifiers, you should spend your time:
- Engineering better features
- Getting your hands on more high-quality data
而且事实上非常多模型的原理也都有相似之处。(想想n多的Learning
2 Rank算法)所以假设你希望达到更好的分类器。你能够优先这么做:
a more powerful model by learning multiple classifiers over different random subsets of the data.
that fit the data equally well, many machine learning algorithms have a way of mathematically preferring the simpler of the two. The folk wisdom here is that a simpler model will perform better on out-of-sample testing data, because it has less parameters
to fit, and thus is less likely to be overfit
One should not take this rule too far. There are many places in machine learning where additional complexity can benefit performance. On top of that, it is not quite accurate to say that model complexity leads to overfitting. More accurate is that the procedure
used to fit all that complexity leads to overfitting if it is not very clever. But there are plenty of cases where the complexity is brought to heel by cleverness in the model fitting process.
Thus, prefer simple models because they are smaller, faster to fit, and more interpretable, but not necessarily because they will lead to better performance; the only way to know that is to evaluate your model on
test data.
也不能过于轻信这个原则。也有非常多地方格外的复杂度会带来额外的收益。
太复杂的模型带来overfitting,这样的说法并不准确。有时额外的复杂度是模型训练中有意而且聪明的选择(复杂的structure也许更好契合了问题,效果和简单模型一样。也许仅仅是数据还不够)。
因此,倾向于简单模型由于他们更小。更好训练。更easy解释,但并不一定由于他们会带来更好的效果。
仅仅有实际測试可以告诉你答案。
8. Representable Does Not Imply Learnable
--可表示不代表可学习
are fond of saying that the function representing an accurate prediction on your datais representable by
the learning algorithm. This means that it is possiblefor
the algorithm to build a good model on your data.
by itself. Building a good model may require much more data than you have, or the good model might simply never be found by the algorithm. Just because there’s a good model out there that the algorithm could find
does not mean that it willfind
it.
If the algorithm can’t find a good model, but you are pretty sure that a good model exists, try engineering features that will make that model a little more obvious to the algorithm.
observational data can only show us that two variables are related, but it cannot
tell us the “why”.
可是书不是成绩好的原因,你不能给那些孩子送书就提升他们的成绩。真正的原因可能是。书籍多的家庭父母的教育程度高,对还自己的教育也相对较好。书不过一个indicators
your models. Just because one thing predicts another doesn’t mean it causes another, and making business (or
public policy) decisions based on some imagined causal relationship should be done with extreme caution.
Big Picture
and like any powerful tool,misuses of it can cause a lot of damage.
Understanding how machine learning works and some of the potential pitfalls can go a long way towards keeping you out of trouble.
Everything You Wanted to Know About Machine Learning的更多相关文章
- 【Machine Learning】KNN算法虹膜图片识别
K-近邻算法虹膜图片识别实战 作者:白宁超 2017年1月3日18:26:33 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现的深入理解.本系列文章是作者结 ...
- 【Machine Learning】Python开发工具:Anaconda+Sublime
Python开发工具:Anaconda+Sublime 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现 ...
- 【Machine Learning】机器学习及其基础概念简介
机器学习及其基础概念简介 作者:白宁超 2016年12月23日21:24:51 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现的深入理解.本系列文章是作者结 ...
- 【Machine Learning】决策树案例:基于python的商品购买能力预测系统
决策树在商品购买能力预测案例中的算法实现 作者:白宁超 2016年12月24日22:05:42 摘要:随着机器学习和深度学习的热潮,各种图书层出不穷.然而多数是基础理论知识介绍,缺乏实现的深入理解.本 ...
- 【机器学习Machine Learning】资料大全
昨天总结了深度学习的资料,今天把机器学习的资料也总结一下(友情提示:有些网站需要"科学上网"^_^) 推荐几本好书: 1.Pattern Recognition and Machi ...
- [Machine Learning] Active Learning
1. 写在前面 在机器学习(Machine learning)领域,监督学习(Supervised learning).非监督学习(Unsupervised learning)以及半监督学习(Semi ...
- [Machine Learning & Algorithm]CAML机器学习系列2:深入浅出ML之Entropy-Based家族
声明:本博客整理自博友@zhouyong计算广告与机器学习-技术共享平台,尊重原创,欢迎感兴趣的博友查看原文. 写在前面 记得在<Pattern Recognition And Machine ...
- machine learning基础与实践系列
由于研究工作的需要,最近在看机器学习的一些基本的算法.选用的书是周志华的西瓜书--(<机器学习>周志华著)和<机器学习实战>,视频的话在看Coursera上Andrew Ng的 ...
- matlab基础教程——根据Andrew Ng的machine learning整理
matlab基础教程--根据Andrew Ng的machine learning整理 基本运算 算数运算 逻辑运算 格式化输出 小数位全局修改 向量和矩阵运算 矩阵操作 申明一个矩阵或向量 快速建立一 ...
- Machine Learning
Recently, I am studying Maching Learning which is our course. My English is not good but this course ...
随机推荐
- ssh登录的时候,根本不给输入密码的机会,直接拒绝,是因为BatchMode的设置
BatchMode no“BatchMode”如果设为“yes”,passphrase/password(交互式输入口令)的提示将被禁止.当不能交互式输入口令的时候,这个选项对脚本文件和批处理任务十分 ...
- 设置MAVEN_OPTS环境变量
运行mvn命令实际上是执行了Java命令,既然是运行Java,那么运行Java命令可用的参数当然也应该在运行mvn命令时可用.这个时候,MAVEN_OPTS环境变量就能派上用场. 通常需要设置MAVE ...
- 环保创业的可行之道——Leo鉴书上66
近2年,我一直在关注不同企业的发展历程,国内的国外的.看他们成功其中的共性与特性.<蚯蚓创业记>无疑给我开了扇窗--环保企业的怎样发展与壮大.读者还能从书里读出普通年轻人坚持自己梦想最终得 ...
- C语言,realloc
void * realloc ( void * ptr, size_t new_size ); 关于realloc的行为方式,结合源码总结为:1. realloc失败的时候,返回NULL: 2. re ...
- Silverlight技术调查(1)——Html向Silverlight传参
原文 Silverlight技术调查(1)——Html向Silverlight传参 近几日项目研究一个很牛的富文档编辑器DXperience RichEdit组件,调查环境为Silverlight4. ...
- jQuery ajax表单提交实现局部刷新
jQuery Ajax 异步提交 Form 表单,如果使用 get 请求,注意中文乱码问题,jquery 会先使用 iso8859-1 解码,然后发给服务器,如果使用 post 请求,则直接将中文内容 ...
- js 常用方法记事本
1.获取被选中行的名称<tab选项卡中为iframe> /* S 获取首页被选中的选项卡名称 */ var currTab = $("#layout_center_tabs&qu ...
- linux-mint下搭建android,angularjs,rails,html5开发环境 - qijie29896的个人空间 - 开源中国社区
linux-mint下搭建android,angularjs,rails,html5开发环境 - qijie29896的个人空间 - 开源中国社区 http://blog.csdn.net/orzor ...
- ufldl学习笔记和编程作业:Feature Extraction Using Convolution,Pooling(卷积和汇集特征提取)
ufldl学习笔记与编程作业:Feature Extraction Using Convolution,Pooling(卷积和池化抽取特征) ufldl出了新教程,感觉比之前的好,从基础讲起.系统清晰 ...
- MapReduce整体架构分析
继前段时间分析Redis源代码一段时间之后.我即将開始接下来的一段技术学习的征程.研究的技术就是当前很火热的Hadoop,可是一个Hadoop生态圈是很庞大的.所以首先我的打算是挑选当中的一部分模块, ...