Abstract: This is a verbatim transcript of Lesson 3, "Supervised Learning", from Chapter 1, "Introduction: What Is Machine Learning", of Andrew Ng's Machine Learning course. I transcribed it word by word while watching the videos, for my own later reference, and am sharing it here. If you find any errors, corrections are welcome and sincerely appreciated. I hope it helps with your studies as well.

In this video (article), I am going to define what is probably the most common type of machine learning problem, which is supervised learning. I'll define supervised learning more formally later, but it's probably best to start with an example of what it is, and we'll do the formal definition later.

Let's say you want to predict housing prices. A while back, a student collected a data set from the city of Portland, Oregon. And let's say you plot the data set and it looks like this. Here, on the horizontal axis is the size of different houses in square feet, and on the vertical axis is the price of different houses in thousands of dollars. So, given this data, let's say you have a friend who owns a house that is, say, 750 square feet, and they are hoping to sell the house and want to know how much they can get for it. So how can a learning algorithm help you? One thing a learning algorithm might be able to do is put a straight line through the data, or fit a straight line to the data, and based on that, it looks like maybe the house can be sold for about $150,000. But maybe this isn't the only learning algorithm you can use. There might be a better one. For example, instead of fitting a straight line to the data, we might decide that it's better to fit a quadratic function, or a second-order polynomial, to this data (blue line). And if you do that and make a prediction here, then it looks like maybe they can sell the house for closer to $200,000. One of the things we'll talk about later is how to choose, and how to decide, whether you want to fit a straight line to the data or a quadratic function to the data. And it wouldn't be fair to just pick whichever one gives your friend the better price to sell the house at. But each of these would be a fine example of a learning algorithm.
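The line-versus-quadratic comparison above can be sketched in a few lines of code. This is a minimal illustration, not part of the lecture: the house sizes and prices below are made-up stand-ins for the Portland data set, and both models are fit by ordinary least squares using NumPy's `polyfit`.

```python
import numpy as np

# Hypothetical Portland-style data: size in square feet, price in $1000s.
# These numbers are invented for illustration; the lecture's actual data
# set is not given.
sizes = np.array([500.0, 800.0, 1000.0, 1200.0, 1500.0, 2000.0])
prices = np.array([100.0, 170.0, 200.0, 230.0, 260.0, 310.0])

# Fit a straight line (degree 1) and a quadratic (degree 2) by least squares.
line = np.polyfit(sizes, prices, deg=1)
quad = np.polyfit(sizes, prices, deg=2)

# Predict the price of the friend's 750-square-foot house under each model.
pred_line = np.polyval(line, 750.0)
pred_quad = np.polyval(quad, 750.0)
print(f"linear prediction:    ${pred_line:.0f}k")
print(f"quadratic prediction: ${pred_quad:.0f}k")
```

Which degree to prefer is exactly the model-selection question the lecture defers to later; the honest answer comes from how each model does on held-out data, not from whichever one flatters the friend's asking price.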
So, this is an example of a supervised learning algorithm. And the term supervised learning refers to the fact that we gave the algorithm a data set in which the "right answers" were given. That is, we gave it a data set of houses in which, for every example in the data set, we told it what the right price was, or what the actual price was that the house sold for, and the task of the algorithm was to just produce more of these right answers, such as for this new house that your friend may be trying to sell. To define things with a bit more terminology, this is also called a regression problem. And by regression problem, I mean we're trying to predict a continuous valued output, namely the price. So, technically, I guess prices can be rounded off to the nearest cent, so maybe prices are actually discrete values, but usually we think of the price of a house as a real number, as a scalar value, as a continuous valued number. And the term regression refers to the fact that we're trying to predict this sort of continuous valued attribute.

Here's another supervised learning example; some friends and I were actually working on this earlier. Let's say you want to look at medical records and try to predict whether a breast tumor is malignant or benign. If someone discovers a breast tumor, a lump in their breast, a malignant tumor is a tumor that is harmful and dangerous, and a benign tumor is a tumor that is harmless. So obviously people care a lot about this. Let's say I collected a data set, and suppose in your data set you have on your horizontal axis the size of the tumor, and on the vertical axis I'm going to plot one or zero, yes or no: whether the examples of tumors we've seen before were malignant, which is one, or not malignant, or benign, which is zero. So, let's say our data set looks like this, where we saw a tumor of this size that turned out to be benign, one of this size, one of this size, and so on (all blue x's). And sadly, we also saw a few malignant tumors, one of that size, one of that size, and so on (all red x's). So, in this example, I have five examples of benign tumors shown down here, and five examples of malignant tumors shown with a vertical axis value of one. And let's say we have a friend who tragically has a breast tumor, and let's say her breast tumor size is maybe somewhere around this value. The machine learning question is: can you estimate the probability, the chance, that the tumor is malignant versus benign? To introduce a bit more terminology, this is an example of a classification problem. The term classification refers to the fact that here we're trying to predict a discrete valued output: zero or one, malignant or benign. And it turns out that in classification problems, sometimes you can have more than two possible values for the output. As a concrete example, maybe there are three types of breast cancers, and so you may try to predict the discrete values of zero, one, two, or three, with zero being benign, a benign tumor, so no cancer.
And one may mean type one cancer, if, say, you have three types of cancer, whatever type one means. And two may mean a second type of cancer, and three may mean a third type of cancer. But this would also be a classification problem, because there is again a discrete set of output values corresponding to, you know, no cancer, or cancer type one, or cancer type two, or cancer type three. In classification problems there is another way to plot this data. Let me show what I mean. Let me use a slightly different set of symbols to plot this data. So, if tumor size is going to be the attribute that I'm going to use to predict malignancy or benignness, I can also draw my data like this. I'm going to use different symbols to denote my benign and malignant, or my negative and positive, examples. So instead of drawing crosses, I'm now going to draw O's for the benign tumors, like so. And I'm going to keep using X's to denote my malignant tumors. Okay? I hope this figure makes sense. All I did was take my data set on top and map it down to this real line, like so, and start to use different symbols, circles and crosses, to denote malignant versus benign examples. Now, in this example we used only one feature or one attribute, namely the tumor size, in order to predict whether the tumor is malignant or benign. In other machine learning problems, we may have more than one feature, more than one attribute. Here's an example.
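One common way to turn the single tumor-size feature into the "chance that the tumor is malignant" asked for above is logistic regression. The lecture names no particular algorithm at this point, so the sketch below is my own assumption, and the ten data points are invented to mimic the five benign and five malignant examples on the plot.

```python
import numpy as np

# Hypothetical tumor sizes (cm); label 0 = benign, 1 = malignant.
# The numbers are made up to mimic the lecture's five-and-five examples.
sizes  = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0])
labels = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit P(malignant | size) = sigmoid(w * size + b) by plain gradient
# descent on the logistic (log) loss.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5000):
    p = sigmoid(w * sizes + b)     # current predicted probabilities
    err = p - labels               # gradient of the log loss w.r.t. z
    w -= lr * np.mean(err * sizes)
    b -= lr * np.mean(err)

# Estimated chance that a new tumor of a given size is malignant.
prob = sigmoid(w * 3.4 + b)
print(f"P(malignant | size = 3.4 cm) = {prob:.2f}")
```

The output is a probability rather than a hard zero-or-one label, which is exactly the "what is the chance" framing used in the lecture; thresholding it at 0.5 recovers the discrete classification.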

Let's say that instead of just knowing the tumor size, we know both the age of the patients and the tumor size. In that case, maybe our data set will look like this, where I may have a set of patients with those ages and those tumor sizes whose tumors turn out to be benign, and a different set of patients, who look a little different, whose tumors turn out to be malignant, as denoted by the crosses. So, let's say you have a friend who tragically has a tumor, and maybe their tumor size and age fall around there. So, given a data set like this, what the learning algorithm might do is fit a straight line through the data to try to separate out the malignant tumors from the benign ones. And so, the learning algorithm may decide to draw a straight line like that to separate out the two classes of tumors. And with this, if your friend's tumor falls over there, then hopefully your learning algorithm will say that it falls on the benign side and is therefore more likely to be benign than malignant. In this example, we had two features, namely the age of the patient and the size of the tumor. In other machine learning problems, we will often have more features. My friends that work on this problem actually use other features like these: the clump thickness of the breast tumor, the uniformity of cell size of the tumor, the uniformity of cell shape of the tumor, and so on, and other features as well. And it turns out one of the most interesting learning algorithms that we'll see in this class is a learning algorithm that can deal with not just two or three or five features, but an infinite number of features. On this slide, I've listed a total of five different features, right, two on the axes and three more up here. But it turns out that for some learning problems, what you really want is not to use, like, three or five features.
But instead, you want to use an infinite number of features, an infinite number of attributes, so that your learning algorithm has lots of attributes or features or cues with which to make those predictions. So how do you deal with an infinite number of features? How do you even store an infinite number of things on a computer when your computer is going to run out of memory? It turns out that when we talk about an algorithm called the Support Vector Machine, there will be a neat mathematical trick that allows a computer to deal with an infinite number of features. Imagine that I didn't just write down two features here and three features on the right, but instead wrote down an infinitely long list; I just kept writing more and more features, like an infinitely long list of features. It turns out we'll be able to come up with an algorithm that can deal with that. So, just to recap: in this class we'll talk about supervised learning, and the idea is that, in supervised learning, for every example in our data set, we are told what the "correct answer" is that we would have quite liked the algorithm to have predicted on that example, such as the price of the house, or whether a tumor is malignant or benign. We also talked about the regression problem, and by regression, that means that our goal is to predict a continuous valued output. And we talked about the classification problem, where the goal is to predict a discrete valued output.
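The "neat mathematical trick" mentioned above is what's usually called the kernel trick: for the Gaussian (RBF) kernel, the inner product of two infinite-dimensional feature vectors collapses to a single exponential that a computer can evaluate directly. This is a minimal sketch of that idea, not the full Support Vector Machine; the two-feature patient vectors and the `gamma` value are made up for illustration.

```python
import numpy as np

# The Gaussian (RBF) kernel corresponds to an inner product in an
# infinite-dimensional feature space, yet it costs only a handful of
# arithmetic operations to evaluate.
def rbf_kernel(x, z, gamma=0.05):
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-gamma * np.dot(diff, diff))

# Three hypothetical patients described by (age, tumor size in cm).
a = [45.0, 2.0]
b = [47.0, 2.5]
c = [70.0, 6.0]

# Similar inputs yield a kernel value near 1, dissimilar inputs near 0,
# without ever materializing the infinite feature vectors.
print(rbf_kernel(a, b))  # a and b are close together
print(rbf_kernel(a, c))  # a and c are far apart
```

An SVM never needs the infinite feature list itself; its learning and prediction rules can be written entirely in terms of kernel evaluations like these between pairs of training examples.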

Just a quick wrap-up question: suppose you're running a company, and you want to develop learning algorithms to address each of two problems. In the first problem, you have a large inventory of identical items. So, imagine that you have thousands of copies of some identical item to sell, and you want to predict how many of these items you will sell within the next three months. In the second problem, problem two, you have lots of users, and you want to write software to examine each of your individual customers' accounts and, for each account, decide whether or not the account has been hacked or compromised. So, for each of these problems, should they be treated as a classification problem, or as a regression problem? When the video pauses, please use your mouse to select whichever of these four options on the left you think is the correct answer. So, hopefully, you got that this is the answer: for problem one, I would treat this as a regression problem, because if I have, you know, thousands of items, I would probably just treat the number of items I sell as a real value, as a continuous value. And for the second problem, I would treat that as a classification problem, because I might set the value I want to predict to zero, to denote that the account has not been hacked, and set the value to one to denote an account that has been hacked into. So, just like with breast cancer, where zero is benign and one is malignant, I might set this to be zero or one depending on whether it's been hacked, and have an algorithm try to predict each of these two discrete values. And because there's a small number of discrete values, I would therefore treat it as a classification problem.

So, that's it for supervised learning, and in the next video (article) I'll talk about unsupervised learning which is the other major category of learning algorithms.

<end>
