(Original) Stanford Machine Learning (by Andrew Ng) --- (Week 1) Linear Regression
Andrew Ng's Machine Learning course is available at: https://www.coursera.org/course/ml
The Linear Regression section introduces several new terms that recur throughout the later lectures:
| English | Cost Function | Linear Regression | Gradient Descent | Normal Equation | Feature Scaling | Mean normalization |
| Chinese | 损失函数 | 线性回归 | 梯度下降 | 正规方程 | 特征归一化 | 均值标准化 |
Model Representation
- $m$: number of training examples
- $x^{(i)}$: input (features) of the $i$th training example
- $x_j^{(i)}$: value of feature $j$ in the $i$th training example
- $y^{(i)}$: “output” variable / “target” variable of the $i$th training example
- $n$: number of features
- $\theta$: parameters
- Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$
Cost Function
IDEA: Choose $\theta$ so that $h_\theta(x)$ is close to $y$ for our training examples $(x, y)$.
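A. Linear Regression with One Variable: Cost Function
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, where $h_\theta(x) = \theta_0 + \theta_1 x$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Contour Plot: [figure not available: contour plot of $J(\theta_0, \theta_1)$ over $\theta_0$ and $\theta_1$]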

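B. Linear Regression with Multiple Variables: Cost Function
Cost Function: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, where $h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$
Goal: $\min_{\theta} J(\theta)$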
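The cost function $J(\theta) = \frac{1}{2m} \sum_i (h_\theta(x^{(i)}) - y^{(i)})^2$ can be sketched in a few lines. This is a minimal illustration in Python rather than the Octave used for the homework; the function names `hypothesis` and `compute_cost` and the toy data are assumptions made for this example, not part of the course material.

```python
def hypothesis(theta, x):
    """h_theta(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n.

    `x` holds the n feature values only (no leading 1 for the bias term).
    """
    return theta[0] + sum(t * xj for t, xj in zip(theta[1:], x))


def compute_cost(X, y, theta):
    """J(theta) = 1/(2m) * sum over i of (h_theta(x^(i)) - y^(i))^2."""
    m = len(y)
    squared_errors = ((hypothesis(theta, xi) - yi) ** 2 for xi, yi in zip(X, y))
    return sum(squared_errors) / (2 * m)


# Hypothetical toy data with one feature, generated by y = 1 + 2x.
X = [[1.0], [2.0], [3.0]]
y = [3.0, 5.0, 7.0]

cost_perfect = compute_cost(X, y, [1.0, 2.0])  # parameters that fit exactly
cost_zero = compute_cost(X, y, [0.0, 0.0])     # all-zero parameters
print(cost_perfect, cost_zero)
```

With the exact parameters the cost is zero; with all-zero parameters it is (9 + 25 + 49) / 6, since every prediction is 0.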
Gradient Descent
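Outline
- Start with some initial $\theta$ (for example, all zeros).
- Keep changing $\theta$ to reduce $J(\theta)$ until we hopefully end up at a minimum.
Gradient Descent Algorithm
Repeat until convergence, updating every $\theta_j$ ($j = 0, \dots, n$) simultaneously:
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$
For linear regression, with $x_0^{(i)} = 1$, this becomes:
$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$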
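The convergence path over the iterations may look like this:
[figure not available: contour plot with a gradient descent path]
(This is a contour plot: the center is the minimum, and the blue arc is one possible convergence path.)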
Learning Rate α:
1) If α is too small, gradient descent can be slow to converge;
2) If α is too large, J(θ) may not decrease on every iteration, and gradient descent may not converge;
3) For sufficiently small α, J(θ) should decrease on every iteration.
Choosing the learning rate α: when debugging, try values such as 0.001, 0.003, 0.006, 0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1.0.
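The effect of α can be seen on a tiny problem. The sketch below, in Python rather than the homework's Octave, fits the one-parameter hypothesis h(x) = θx to hypothetical data generated by y = 2x and records J(θ) after every step. The names `cost` and `cost_history`, the data, and the three values of α are assumptions chosen for illustration. For this particular data set the update converges only when α is below roughly 0.43, so α = 0.6 is expected to diverge.

```python
def cost(theta, xs, ys):
    """J(theta) = 1/(2m) * sum of (theta*x - y)^2 for the model h(x) = theta*x."""
    m = len(ys)
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def cost_history(alpha, xs, ys, num_iters=20):
    """Run gradient descent from theta = 0 and record J after each step."""
    m = len(ys)
    theta = 0.0
    history = [cost(theta, xs, ys)]
    for _ in range(num_iters):
        # dJ/dtheta = 1/m * sum of (theta*x - y) * x
        gradient = sum((theta * x - y) * x for x, y in zip(xs, ys)) / m
        theta -= alpha * gradient
        history.append(cost(theta, xs, ys))
    return history


# Hypothetical data generated by y = 2x, so the minimum is at theta = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

slow = cost_history(0.001, xs, ys)      # decreases, but very slowly
good = cost_history(0.1, xs, ys)        # decreases quickly
too_large = cost_history(0.6, xs, ys)   # J grows on every iteration

for name, hist in (("0.001", slow), ("0.1", good), ("0.6", too_large)):
    print("alpha =", name, "J after 20 steps:", hist[-1])
```

Plotting each history against the iteration number is the usual way to spot a poor α: a curve that rises or oscillates suggests α is too large, while a curve that barely moves suggests it is too small.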
“Batch” gradient descent: each step of gradient descent uses all the training examples;
“Stochastic” gradient descent: each step of gradient descent uses only one training example.
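A minimal sketch of batch gradient descent for several parameters follows, written in Python rather than Octave. It assumes each row of `X` already starts with $x_0 = 1$; the function name `gradient_descent`, the toy data, and the choice of α = 0.1 with 2000 iterations are assumptions for this example only.

```python
def gradient_descent(X, y, theta, alpha, num_iters):
    """Batch gradient descent; each row of X must start with x_0 = 1."""
    m = len(y)
    theta = list(theta)
    for _ in range(num_iters):
        # Residuals h_theta(x^(i)) - y^(i), using all m examples ("batch").
        errors = [sum(t * xj for t, xj in zip(theta, row)) - yi
                  for row, yi in zip(X, y)]
        # Gradient for every j, computed before any theta_j changes.
        gradient = [sum(e * row[j] for e, row in zip(errors, X)) / m
                    for j in range(len(theta))]
        # Simultaneous update of all parameters.
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Hypothetical data generated by y = 1 + 2x; the first column is x_0 = 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

theta = gradient_descent(X, y, [0.0, 0.0], alpha=0.1, num_iters=2000)
print(theta)  # close to [1.0, 2.0]
```

A stochastic variant would compute the residual and the update from a single training example per step instead of summing over all m examples.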
Normal Equation
IDEA: Method to solve for θ analytically.
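Set $\frac{\partial}{\partial \theta_j} J(\theta) = 0$ for every $j$, then solve for $\theta$: $\theta = (X^T X)^{-1} X^T y$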
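Restriction: the Normal Equation does not work when $X^T X$ is non-invertible.
PS: A matrix is invertible when it has full rank. For $X^T X$, this holds when the columns of $X$ (the features) are linearly independent, which in turn requires at least as many linearly independent rows (samples) as there are columns.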
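The normal equation $\theta = (X^T X)^{-1} X^T y$ can be sketched with the Python standard library alone. Note the swap: instead of forming the inverse explicitly, this sketch solves the linear system $(X^T X)\theta = X^T y$ by Gaussian elimination with partial pivoting, which gives the same $\theta$ whenever $X^T X$ is invertible. The helper names, the `1e-12` singularity threshold, and the toy data are assumptions for this example.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        # Assumed threshold: a (near-)zero pivot means A is not invertible.
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (non-invertible)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def normal_equation(X, y):
    """Return theta solving (X^T X) theta = X^T y; rows of X start with x_0 = 1."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Hypothetical data generated by y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = normal_equation(X, y)
print(theta)  # close to [1.0, 2.0]

# A duplicated feature column makes X^T X non-invertible.
X_dup = [[1.0, 2.0, 2.0], [1.0, 3.0, 3.0], [1.0, 5.0, 5.0]]
try:
    normal_equation(X_dup, [1.0, 2.0, 3.0])
    singular_detected = False
except ValueError:
    singular_detected = True
print(singular_detected)
```

The second call illustrates the restriction above: two identical feature columns are linearly dependent, so the elimination hits a zero pivot and the sketch raises an error instead of returning a meaningless θ.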
Gradient Descent Algorithm VS. Normal Equation
Gradient Descent:
- Need to choose α;
- Needs many iterations;
- Works well even when n is large (suitable when n > 1000).
Normal Equation:
- No need to choose α;
- No need to iterate;
- Need to compute $(X^T X)^{-1}$;
- Slow if n is very large (fine when n < 1000).
Feature Scaling
IDEA: Make sure features are on a similar scale.
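Benefit: fewer iterations and faster convergence.
Example: if the target is roughly $-1 \le x_i \le 1$ for every feature, ranges such as [-3, 3] or [-1/3, 1/3] are also acceptable.
Mean normalization: replace $x_i$ with $\frac{x_i - \mu_i}{s_i}$, where $\mu_i$ is the mean of feature $i$ over the training set and $s_i$ is its range (max minus min) or its standard deviation; this is not applied to $x_0 = 1$.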
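Mean normalization, $x_i := (x_i - \mu_i) / s_i$, can be sketched as follows in Python. This version uses the range (max minus min) for $s_i$; the standard deviation would work equally well. The function name `mean_normalize` and the house sizes are hypothetical values chosen for this example.

```python
def mean_normalize(column):
    """Scale one feature column to (x - mu) / s, with s = max - min.

    Returns the scaled values together with mu and s.
    """
    mu = sum(column) / len(column)
    s = max(column) - min(column)
    if s == 0:
        raise ValueError("constant feature: range is zero")
    return [(x - mu) / s for x in column], mu, s


# Hypothetical house sizes in square feet.
sizes = [1000.0, 1500.0, 2000.0, 3000.0]
scaled, mu, s = mean_normalize(sizes)
print(scaled, mu, s)
```

The returned `mu` and `s` should be kept: any new input has to be scaled with the same values before it is passed to the learned hypothesis.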
HOMEWORK
With the video lectures done, it is time for the homework. Below is the core code for the Linear Regression assignment:
1. computeCost.m / computeCostMulti.m
J = 1/(2*m) * sum((theta'*X' - y').^2);  % m = number of training examples
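2. gradientDescent.m / gradientDescentMulti.m
h = X*theta - y;      % residuals h_theta(x) - y for all m examples
v = X'*h;             % sum of residual * feature, one entry per theta_j
v = v*alpha/m;        % scale by the learning rate and 1/m
theta1 = theta;       % copy of the previous theta (not used further here)
theta = theta - v;    % simultaneous update of every theta_j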