Generative Learning algorithms

"generative algorithm models how the data was generated in order to categorize a signal. It asks the question: based on my generation assumptions, which category is most likely to generate this signal?discriminative algorithm does not care about how the data was generated, it simply categorizes a given signal."

discriminative：

试图找到class之间的差异，进而找到decision boundary,最大可能性地区分数据。他是通过直接学习到$p(y|x)$(例如Logistic regress)或者$X \rightarrow Y\in (0,1,...,k)$(例如perceptron algrithm)

generative：

采取另外一种方式，首先由先验知识prori-knowledge得到 $p(x|y),p(y)$ 然后，通过Bayes rule：$p(y|x) = \frac{p(x|y)p(y)}{p(x)} $来求得$p(y|x)$,其中$p(x)=p(x|y=1)p(y=1)+p(x|y=0)p(y=0)$。这个过程可以看做由先验分布去derive后验分布。当然，在只需要判断出可能性大小的情况下，分母无需考虑，即：$$\arg\max_yp(y|x) = \arg \max_y\frac{p(x|y)p(y)}{p(x)}\\=\arg\max_yp(x|y)p(y)$$

先验知识获取$p(x|y)和p(y)$的方式，是通过现有训练数据样本获得参数的过程。
1. 首先假设一个模型，即样本分布的模型（是伯努利还是高斯分布）
2. 然后通过似然估计likelihood function估计出参数
3. 最后通过贝叶斯公式导出$p(y|x)$

example

数据集:$X=(x_1,x_2)$,$Y\in{0,1}$

首先我们假设数据的条件分布$p(x|y)$服从多元高斯正态分布（multivariate normal distribution）,则model形式如下：$$y\sim \textrm{Bernoulli}(\phi) \\ x|y=0 \sim \mathcal{N}(\mu_0,\Sigma) \\ x|y = 1\sim \mathcal{N}(\mu_1,\Sigma )$$
接着通过最大似然估计（max likelihood estimate）估计参数。首先写出log似然函数：$$\ell(\phi,\mu_0,\mu_1,\Sigma) = log\prod_{i=1}^{m}p(x^{(i)},y^{(i)},\mu_0,\mu_1,\Sigma) \\ =log\prod_{i=1}^mp(x^{(i)}|y^{(i)};\mu_0,\mu_1,\Sigma)p(y^{(i)};\phi).$$
然后似然函数$\ell$最大化,即求解似然函数对参数导数为零的点：$$\phi=\frac{1}{m}\sum_{i=1}^{m}1\{y^{(i)}=1\} \\ \mu_0= \frac{\sum_{i=1}^{m}1\{y^{(i)}=0\}x^{(i)}} {\sum_{i=1}^m1\{y^{(i)}=0\}} \\ \mu_1= \frac{\sum_{i=1}^{m}1\{y^{(i)}=1\}x^{(i)}} {\sum_{i=1}^m1\{y^{(i)}=1\}} \\ \Sigma = \frac{1}{m}\sum_{i=1}^m(x^{(i)}-\mu_{y^{(i)}})(x^{(i)}-\mu_{y^{(i)}})^T$$得到参数的估计值$(\phi,\mu_0,\mu_1,\Sigma)$，亦即得到分布函数$p(x|y)$。对照上面的图，$\mu_0,\mu_1$是两个二维向量，在图中的位置是两个正态分布各自的中心点，$\Sigma$则决定者多元正态分布的形状。
![此处输入图片的描述][2]
从这一步可以看出获取参数的方式是“学习”得到的，即从大量样本-先验知识去估计模型，这样想是很自然的逻辑.然而严格的依据却是大数定律law of large numbers (LLN)，大数定律的证明很精彩，可自行查找资料。
通过贝叶斯公式比较$p(y=1|x)$和$p(y=0|x)$,来判别类属性。

Generative Learning algorithms的更多相关文章

Andrew Ng机器学习公开课笔记 -- Generative Learning algorithms
网易公开课,第5课 notes,http://cs229.stanford.edu/notes/cs229-notes2.pdf 学习算法有两种,一种是前面一直看到的,直接对p(y|x; θ)进行建模 ...
生成学习算法(Generative Learning algorithms)
一.引言前面我们谈论到的算法都是在给定$x$的情况下直接对$p(y|x;\theta)$进行建模.例如,逻辑回归利用$h_\theta(x)=g(\theta^T x)$对\(p(y|x ...
Machine Learning Algorithms Study Notes(2)--Supervised Learning
Machine Learning Algorithms Study Notes 高雪松 @雪松Cedro Microsoft MVP 本系列文章是Andrew Ng 在斯坦福的机器学习课程 CS 22 ...
Machine Learning Algorithms Study Notes(1)--Introduction
Machine Learning Algorithms Study Notes 高雪松 @雪松Cedro Microsoft MVP 目录 1 Introduction 1 1.1 ...
Machine Learning Algorithms Study Notes(3)--Learning Theory
Machine Learning Algorithms Study Notes 高雪松 @雪松Cedro Microsoft MVP 本系列文章是Andrew Ng 在斯坦福的机器学习课程 CS 22 ...
机器学习算法之旅A Tour of Machine Learning Algorithms
In this post we take a tour of the most popular machine learning algorithms. It is useful to tour th ...
5 Techniques To Understand Machine Learning Algorithms Without the Background in Mathematics
5 Techniques To Understand Machine Learning Algorithms Without the Background in Mathematics Where d ...
机器学习 Generative Learning Algorithm (B)
Naive Bayes 在GDA模型中,特征向量x是连续的实数向量,在这一讲里,我们将要讨论另外一种算法用来处理特征向量x是离散值的情况. 我们先考虑一个例子,用机器学习的方法建立一个垃圾邮件过滤器, ...
Introduction to Deep Learning Algorithms
Introduction to Deep Learning Algorithms See the following article for a recent survey of deep learn ...

随机推荐

如何让tableView展示数据
设置数据源对象 self.tableView.dataSource = self; 数据源对象要遵守协议 @interface ViewController () <UITableViewDat ...
CODEVS1047 邮票面值设计
题目描述 Description 给定一个信封,最多只允许粘贴N张邮票,计算在给定K(N+K≤40)种邮票的情况下(假定所有的邮票数量都足够),如何设计邮票的面值,能得到最大值MAX,使在1-MAX之 ...
mysql server advanced 5.6基于oracle linux 6.6的安装
mysql 安装有两种,rpm安装和源码包安装,两种包都可以从www.mysql.com官网下载,这次我测试下rpm安装方式. 1.安装环境以及mysql版本: 1.1vcenter 虚拟机环境 1. ...
MySql按指定天数进行分组数据统计分析 1
这几天,在做数据统计,在对数据库数据进行统计过程中,有个需求就是要按照指定天数进行分组, 之前一直没有找到好的方法,就先取出数据,在程序中进行分组. 后发现,可以在SQL语句中实现按天数分组. 例: ...
利用CSS3选择器定制checkbox和radio
之前在一个项目中需要定制checkbox,于是乎用图片模拟了一下,之后发现个更好用的方法(*因为兼容问题在移动开发中用用就好) 效果: 1 2 3 4 5 6 7 实现代码: <style t ...
div中嵌套div速度将会同样很慢
---恢复内容开始--- div中嵌套了div速度将会同样很慢最近很多老板在我们公司做企业站的时候都会要求说:我要div+css的,不要表格建的那种,那样不利于优化.但我们发现就算给他们用div ...
TableViewController的添加,删除,移动
#import "RootTableViewController.h" @interface RootTableViewController () { UITableViewCel ...
重新关联bat文件的打开方式为系统默认方式
为什么“BAT”扩展名的默认打开方式:显示出来的居然是“%1”这么一个怪异的东东,具体在什么位置的? c:\windowssystem32\command.com修复bat关联,打开command.c ...
JDK Debug
http://ishare.iask.sina.com.cn/f/23897007.html http://hi.baidu.com/bd_hare/item/7edd0415b60f0101e65c ...
基本的Logstash 例子
基本的Logstash 例子: 为了测试你的Logstash 安装,运行最基本的Logstash 管道: cd logstash-2.3.0 bin/logstash -e 'input { stdi ...

Generative Learning algorithms

Generative Learning algorithms的更多相关文章

随机推荐

热门专题