What are the advantages of logistic regression over decision trees?FAQ
The answer to "Should I ever use learning algorithm (a) over learning algorithm (b)?" will pretty much always be yes. Different learning algorithms make different assumptions about the data and have different rates of convergence. The one that works best, i.e. minimizes some cost function of interest (cross-validation error, for example), will be the one whose assumptions are consistent with the data and that has sufficiently converged to its error rate.
Put in the context of decision trees vs. logistic regression, what are the assumptions made?
Decision trees assume that our decision boundaries are parallel to the axes. For example, if we have two features (x1, x2), then a tree can only create rules such as x1>=4.5 or x2>=6.5, which we can visualize as lines parallel to the axes. We see this in practice in the diagram below.
So decision trees chop up the feature space into rectangles (or, in higher dimensions, hyper-rectangles). Many partitions can be made, so decision trees naturally scale up to more complex functions (say, of higher VC dimension), which can lead to over-fitting.
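To make the axis-parallel restriction concrete, here is a minimal sketch of a single tree split (a "stump") written in plain Python. Everything in it is an assumption made for illustration: the function name `best_axis_split`, the synthetic data in the unit square, and the choice to score splits by raw accuracy instead of the Gini or entropy criteria that real tree learners typically use.

```python
import random

def best_axis_split(points, labels):
    """Search every axis-parallel rule 'x[j] >= t' and return the most accurate one.

    Returns (feature_index, threshold, accuracy). This is a single split
    (a depth-1 tree), scored by plain accuracy as a simplification.
    """
    n = len(points)
    best = (None, None, 0.0)
    for j in range(len(points[0])):
        for t in sorted({p[j] for p in points}):
            # Predict class 1 when the feature is at or above the threshold.
            hits = sum((p[j] >= t) == bool(y) for p, y in zip(points, labels))
            # Also allow the flipped rule (class 1 below the threshold).
            acc = max(hits, n - hits) / n
            if acc > best[2]:
                best = (j, t, acc)
    return best

random.seed(0)
# Hypothetical data: 400 points drawn uniformly from the unit square.
pts = [(random.random(), random.random()) for _ in range(400)]

# Case A: the true boundary is axis-parallel (x1 >= 0.5).
axis_labels = [int(x1 >= 0.5) for x1, x2 in pts]
# Case B: the true boundary is the diagonal x1 + x2 >= 1.
diag_labels = [int(x1 + x2 >= 1.0) for x1, x2 in pts]

axis_feature, axis_threshold, axis_acc = best_axis_split(pts, axis_labels)
diag_feature, diag_threshold, diag_acc = best_axis_split(pts, diag_labels)
print("axis-parallel truth:", axis_feature, round(axis_threshold, 3), axis_acc)
print("diagonal truth:     ", diag_feature, round(diag_threshold, 3), diag_acc)
```

One split recovers the axis-parallel boundary exactly, but no single rule of the form x1>=t or x2>=t can match the diagonal one. A deeper tree would have to approximate the diagonal with a staircase of many small rectangles.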
What assumptions does logistic regression make? Despite its probabilistic framework, all that logistic regression assumes is that there is one smooth linear decision boundary. It finds that boundary by assuming that P(Y|X) has a particular form: the inverse logit function applied to a weighted sum of our features. It then finds the weights by maximum likelihood.
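Written out, the model described above looks roughly like this. The symbols are assumptions introduced here for illustration: w for the weight vector, b for an intercept, sigma for the inverse logit, and (x_i, y_i) for the n training examples with y_i in {0, 1}.

```latex
% Assumed form of the conditional probability
P(Y = 1 \mid X = x) = \sigma(w^\top x + b),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}

% Log-likelihood maximized over (w, b), with p_i = \sigma(w^\top x_i + b)
\ell(w, b) = \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1 - y_i) \log (1 - p_i) \,\bigr]

% The decision boundary is where the two classes are equally likely
P(Y = 1 \mid X = x) = \tfrac{1}{2}
\iff
w^\top x + b = 0
```

The last line is the point of the paragraph above: whatever the probabilistic story, the set of points where the prediction flips is a single hyperplane.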
However, people get too caught up on that. The decision boundary it creates is a linear* decision boundary that can point in any direction. So if you have data where the decision boundary is not parallel to the axes, then logistic regression picks it out pretty well, whereas a decision tree will have problems.
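As a rough illustration of that claim, the sketch below fits logistic regression by batch gradient ascent on the log-likelihood, using only the Python standard library. The data (points in the unit square labeled by the diagonal x1 + x2 >= 1), the helper names `sigmoid`, `fit_logistic` and `predict`, and the learning rate and epoch count are all assumptions chosen for this example, not part of the original answer.

```python
import math
import random

def sigmoid(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(points, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient ascent on the mean log-likelihood."""
    d = len(points[0])
    n = len(points)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(points, labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            err = y - p  # derivative of the log-likelihood w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj + lr * g / n for wj, g in zip(w, grad_w)]
        b += lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Class 1 on the side of the hyperplane where w.x + b >= 0."""
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0)

random.seed(0)
# Hypothetical data: the true boundary is the diagonal x1 + x2 = 1.
pts = [(random.random(), random.random()) for _ in range(400)]
labels = [int(x1 + x2 >= 1.0) for x1, x2 in pts]

w, b = fit_logistic(pts, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(pts, labels)) / len(pts)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print("training accuracy:", accuracy)
```

The fitted weights on x1 and x2 should come out positive and roughly equal, with a negative bias, which is a line tilted away from both axes. That is exactly the kind of boundary a single axis-parallel rule cannot express.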
So in conclusion,
- Both algorithms are really fast. There isn't much to distinguish them in terms of run-time.
- Logistic regression will work better if there's a single decision boundary, not necessarily parallel to the axes.
- Decision trees can be applied to situations where there's not just one underlying decision boundary, but many, and will work best if the class labels roughly lie in hyper-rectangular regions.
- Logistic regression is intrinsically simple: it has low variance and so is less prone to over-fitting. Decision trees can be scaled up to be very complex and are more liable to over-fit. Pruning is applied to avoid this.
Maybe you'll be left thinking, "I wish decision trees didn't have to create rules that are parallel to the axes." This motivates support vector machines.
Footnotes:
* linear in your covariates. If you include non-linear transformations or interactions then it will be non-linear in the space of those original covariates.
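A small worked example of the footnote, using assumed notation (features x_1, x_2, weights w_1, w_2, intercept b, and a radius r chosen only for illustration): adding squared terms keeps the model linear in the transformed features but bends the boundary in the original ones.

```latex
% Transformed features (an illustrative choice)
z_1 = x_1^2, \qquad z_2 = x_2^2

% The boundary is still a straight line in (z_1, z_2)
w_1 z_1 + w_2 z_2 + b = 0

% With w_1 = w_2 = 1 and b = -r^2 it is a circle in the original covariates
x_1^2 + x_2^2 = r^2
```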