[Original] Coursera — Andrew Ng Machine Learning — Week 1 Quiz — Linear Regression with One Variable
Question 1
Consider the problem of predicting how well a student does in her second year of college/university, given how well she did in her first year.
Specifically, let x be equal to the number of “A” grades (including A−, A, and A+ grades) that a student receives in their first year of college (freshman year). We would like to predict the value of y, which we define as the number of “A” grades they get in their second year (sophomore year).
Here each row is one training example. Recall that in linear regression, our hypothesis is hθ(x)=θ0+θ1x, and we use m to denote the number of training examples.
x | y
---|---
5 | 4
3 | 4
0 | 1
4 | 3
For the training set given above (note that this training set may also be referenced in other questions in this quiz), what is the value of m? In the box below, please enter your answer (which should be a number between 0 and 10).
Answer:
4
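As a sanity check, here is a minimal Python sketch (the list is hard-coded from the table above; the variable names are my own) that counts the training examples:

```python
# Training set from Question 1: each (x, y) pair is one training example.
training_set = [(5, 4), (3, 4), (0, 1), (4, 3)]

m = len(training_set)  # m = number of training examples
print(m)  # 4
```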
Question 2
Consider the following training set of m=4 training examples:
x | y
---|---
1 | 0.5
2 | 1
4 | 2
0 | 0
Consider the linear regression model hθ(x)=θ0+θ1x. What are the values of θ0 and θ1 that you would expect to obtain upon running gradient descent on this model? (Linear regression will be able to fit this data perfectly.)
θ0=0.5,θ1=0
θ0=0.5,θ1=0.5
θ0=1,θ1=0.5
θ0=0,θ1=0.5
θ0=1,θ1=1
Answer:
θ0=0,θ1=0.5
Since linear regression fits this data perfectly, J(θ0,θ1)=0, which means y = hθ(x) = θ0 + θ1x for every training example. Substituting any two rows of the table, e.g. (0, 0) and (2, 1), gives θ0 = 0 and θ1 = 0.5.
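To verify this numerically, one can run batch gradient descent on the table above. Below is a minimal Python sketch; the learning rate, iteration count, and zero initialization are arbitrary choices, not part of the quiz:

```python
# Question 2 training set: every row lies exactly on the line y = 0.5x.
data = [(1, 0.5), (2, 1), (4, 2), (0, 0)]
m = len(data)

theta0, theta1 = 0.0, 0.0  # arbitrary initialization
alpha = 0.1                # learning rate (arbitrary choice)

for _ in range(1000):
    # Batch gradient descent with simultaneous updates over all m examples.
    grad0 = sum((theta0 + theta1 * x) - y for x, y in data) / m
    grad1 = sum(((theta0 + theta1 * x) - y) * x for x, y in data) / m
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(round(theta0, 4), round(theta1, 4))  # converges to about 0.0 and 0.5
```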
Question 3
Suppose we set θ0=−1,θ1=0.5. What is hθ(4)?
Answer:
Setting x = 4, we have hθ(x)=θ0+θ1x = -1 + (0.5)(4) = 1
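The same arithmetic as a tiny Python sketch, with the quiz's parameter values hard-coded as defaults (the function name is my own):

```python
def h(x, theta0=-1.0, theta1=0.5):
    """Univariate linear regression hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

print(h(4))  # -1 + (0.5)(4) = 1.0
```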
Question 4
Let f be some function such that f(θ0,θ1) outputs a number. For this problem, f is some arbitrary/unknown smooth function (not necessarily the cost function of linear regression, so f may have local optima). Suppose we use gradient descent to try to minimize f(θ0,θ1) as a function of θ0 and θ1. Which of the following statements are true? (Check all that apply.)
Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ0,θ1).
If the learning rate is too small, then gradient descent may take a very long time to converge.
If θ0 and θ1 are initialized at a local minimum, then one iteration will not change their values.
If θ0 and θ1 are initialized so that θ0=θ1, then by symmetry (because we do simultaneous updates to the two parameters), after one iteration of gradient descent, we will still have θ0=θ1.
Answers:
True or False | Statement | Explanation
---|---|---
True | If the learning rate is too small, then gradient descent may take a very long time to converge. | If the learning rate is small, gradient descent takes an extremely small step on each iteration, so it can take a very long time to converge.
True | If θ0 and θ1 are initialized at a local minimum, then one iteration will not change their values. | At a local minimum, the derivative (gradient) is zero, so gradient descent will not change the parameters.
False | Even if the learning rate α is very large, every iteration of gradient descent will decrease the value of f(θ0,θ1). | If the learning rate is too large, one step of gradient descent can overshoot the minimum and actually increase the value of f(θ0,θ1).
False | If θ0 and θ1 are initialized so that θ0=θ1, then by symmetry, after one iteration of gradient descent we will still have θ0=θ1. | The updates to θ0 and θ1 are in general different (even with simultaneous updates), because the partial derivatives ∂f/∂θ0 and ∂f/∂θ1 need not be equal, so θ0=θ1 need not hold after one iteration.
Other Options:
True or False | Statement | Explanation
---|---|---
True | If the first few iterations of gradient descent cause f(θ0,θ1) to increase rather than decrease, then the most likely cause is that we have set the learning rate α to too large a value. | If α were small enough, gradient descent should always take a tiny downhill step and decrease f(θ0,θ1); an increase therefore suggests α is too large.
False | No matter how θ0 and θ1 are initialized, so long as the learning rate is sufficiently small, we can safely expect gradient descent to converge to the same solution. | This is not true: depending on the initial condition, gradient descent may end up at different local optima.
False | Setting the learning rate α to be very small is not harmful, and can only speed up the convergence of gradient descent. | If the learning rate is very small, gradient descent takes an extremely small step on each iteration, so it will actually slow down (rather than speed up) convergence.
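Both learning-rate behaviors are easy to observe empirically. Below is a minimal sketch on the one-parameter quadratic f(θ)=θ², a stand-in for the unknown smooth f; the step count and the two α values are arbitrary illustrative choices:

```python
def f(theta):
    return theta ** 2   # smooth function with its minimum at theta = 0

def grad(theta):
    return 2 * theta    # derivative of f

for alpha, label in [(0.001, "alpha too small"), (1.5, "alpha too large")]:
    theta = 1.0
    for _ in range(10):
        theta -= alpha * grad(theta)  # gradient descent step
    print(label, "-> f(theta) =", f(theta))
# alpha = 0.001: f barely decreases after 10 steps (about 0.96), i.e. very slow convergence
# alpha = 1.5:   each step overshoots and f grows (about 1.05e6), i.e. divergence
```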
Question 5
Suppose that for some linear regression problem (say, predicting housing prices as in the lecture), we have some training set, and for our training set we managed to find some θ0, θ1 such that J(θ0,θ1)=0. Which of the statements below must then be true? (Check all that apply.)
For this to be true, we must have y(i)=0 for every value of i=1,2,…,m.
Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum.
For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0
Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line.
Answers:
True or False | Statement | Explanation
---|---|---
False | For this to be true, we must have y(i)=0 for every value of i=1,2,…,m. | So long as all of our training examples lie on a straight line, we will be able to find θ0 and θ1 so that J(θ0,θ1)=0. It is not necessary that y(i)=0 for all of our examples.
False | Gradient descent is likely to get stuck at a local minimum and fail to find the global minimum. | The cost function J(θ0,θ1) for linear regression is convex (bowl-shaped), so it has no local optima other than the global minimum.
False | For this to be true, we must have θ0=0 and θ1=0 so that hθ(x)=0. | If J(θ0,θ1)=0, the line defined by the equation "y = θ0 + θ1x" perfectly fits all of our data. There's no particular reason to expect that the values of θ0 and θ1 that achieve this are both 0 (unless y(i)=0 for all of our training examples).
True | Our training set can be fit perfectly by a straight line, i.e., all of our training examples lie perfectly on some straight line. | If J(θ0,θ1)=0, the line defined by the equation "y = θ0 + θ1x" perfectly fits all of our data.
False | We can perfectly predict the value of y even for new examples that we have not yet seen (e.g., we can perfectly predict prices of even new houses that we have not yet seen). | A zero cost on the training set says nothing about examples we have not seen; a perfect fit of the training data does not guarantee perfect predictions on new data.
False | This is not possible: by the definition of J(θ0,θ1), it is not possible for there to exist θ0 and θ1 so that J(θ0,θ1)=0. | If all of the training examples lie exactly on a straight line, then J(θ0,θ1)=0 is attainable, so such θ0 and θ1 can exist.
True | For these values of θ0 and θ1 that satisfy J(θ0,θ1)=0, we have that hθ(x(i))=y(i) for every training example (x(i),y(i)). | J(θ0,θ1) is a sum of squared errors, so J(θ0,θ1)=0 forces every term to be zero, i.e., hθ(x(i))=y(i) for each training example.
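The fact driving these answers is that J is a sum of squares, so J(θ0,θ1)=0 forces every residual to zero. Below is a minimal Python sketch checking this on the Question 2 training set (which does lie on a line), with θ0=0, θ1=0.5:

```python
def J(theta0, theta1, data):
    """Squared-error cost: J = (1 / (2m)) * sum of (h(x_i) - y_i)^2."""
    m = len(data)
    return sum(((theta0 + theta1 * x) - y) ** 2 for x, y in data) / (2 * m)

data = [(1, 0.5), (2, 1), (4, 2), (0, 0)]  # lies exactly on y = 0.5x
print(J(0.0, 0.5, data))  # 0.0, since every h(x_i) equals y_i
```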