1. Problem and Loss Function

 
Linear Regression is a Supervised Learning algorithm with input matrix X and output label Y. We train a system to make a hypothesis, which we hope is as close to Y as possible. The system we build for Linear Regression is:
 
$h_\theta(X) = \theta^T X$

From the initial state, we probably have a really poor system (it may output only zeros). By using X and Y to train it, we try to derive better parameters θ. The training (learning) process may be time-consuming, because the algorithm updates the parameters only a little on every training step.
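A minimal NumPy sketch of this hypothesis (the names X and theta here are illustrative; the design matrix is assumed to carry a leading column of ones for the bias term):

    import numpy as np

    def hypothesis(theta, X):
        # Linear hypothesis h_theta(X) = X @ theta.
        # X: (m, n) design matrix with a leading bias column of ones;
        # theta: (n,) parameter vector.
        return X @ theta

    # Hypothetical toy data: 3 examples, 1 feature plus the bias column.
    X = np.array([[1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 5.0]])
    theta = np.zeros(2)          # the "really poor" initial system
    print(hypothesis(theta, X))  # [0. 0. 0.] -- outputs only zeros at first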

2. Cost Function?

Suppose we are driving from somewhere to Toronto: it is easy to know the coordinates of Toronto, but it is more important to know where we are now! The cost function is the tool that tells us how different the Hypothesis is from the label Y, so that we can drive toward the target. For regression problems, we use MSE (mean squared error) as the cost function:

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
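Expressed as code, this is a few lines of NumPy (a sketch continuing the variables above; the conventional 1/2 factor simply cancels when differentiating):

    def mse_cost(theta, X, y):
        # J(theta) = (1/2m) * sum((h_theta(x) - y)^2)
        m = len(y)
        residuals = X @ theta - y
        return (residuals @ residuals) / (2 * m)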

 
This can be understood from another perspective. Suppose the difference between Y and the hypothesis H is ε, with $\varepsilon \sim N(0, \sigma^2)$, so that $y \sim N(\theta^T X, \sigma^2)$. Performing Maximum Likelihood Estimation then yields the same cost function (https://stats.stackexchange.com/questions/253345/relationship-between-mle-and-least-squares-in-case-of-linear-regression).
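A short derivation makes the connection explicit. Under the Gaussian noise assumption, the log-likelihood of the training set is

$\ell(\theta) = \sum_{i=1}^{m} \log \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2} \right) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( y^{(i)} - \theta^T x^{(i)} \right)^2$

Since the first term and the factor $1/(2\sigma^2)$ do not depend on θ, maximizing $\ell(\theta)$ is exactly minimizing the sum of squared errors, i.e. the MSE cost above.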
 
 
3. Gradient Descent
 
The process of GD is quite like going downhill along the steepest direction in every dimension.
 
 
We take the partial derivative of the cost along every dimension:

$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

Then we update all $\theta_j$ simultaneously with a small learning rate $\alpha$:

$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$
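A compact NumPy sketch of this loop (hypothetical function and parameter names; alpha and n_steps would need tuning in practice):

    import numpy as np

    def gradient_descent(X, y, alpha=0.01, n_steps=1000):
        # Batch gradient descent for linear regression.
        # Each step applies theta_j := theta_j - alpha * dJ/dtheta_j
        # to every component of theta simultaneously.
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(n_steps):
            gradient = X.T @ (X @ theta - y) / m  # all partial derivatives at once
            theta = theta - alpha * gradient      # simultaneous update
        return theta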
 
4. Batch, Stochastic, and Mini-Batch Gradient Descent
 
Above, we used all the training examples together to compute the cost function and the gradient. This method is called 'Batch Gradient Descent'. The issue is: what if the data set is extremely large? Training can then take a very long time. A variant called Stochastic Gradient Descent, also known as 'Online Learning', uses only a single training example on every update, which may result in a very zigzagged learning curve. Finally, the most popular version, 'Mini-Batch Gradient Descent', learns from a small group of training examples at a time, so the speed is acceptable and the learning curve is smoother. A sketch of the mini-batch variant follows.
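A minimal mini-batch sketch (again with hypothetical names; note that batch_size=1 recovers Stochastic Gradient Descent, and batch_size=m recovers Batch Gradient Descent):

    import numpy as np

    def minibatch_gd(X, y, alpha=0.01, batch_size=32, n_epochs=50, seed=0):
        # Mini-batch gradient descent: each update uses a small,
        # randomly chosen group of training examples.
        rng = np.random.default_rng(seed)
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(n_epochs):
            order = rng.permutation(m)  # reshuffle the data every epoch
            for start in range(0, m, batch_size):
                idx = order[start:start + batch_size]
                Xb, yb = X[idx], y[idx]
                gradient = Xb.T @ (Xb @ theta - yb) / len(idx)
                theta = theta - alpha * gradient
        return theta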
