Introduction

The problem of searching for patterns in data is a fundamental one and has a long and successful history. For instance, the extensive astronomical observations of Tycho Brahe in the 16th century allowed Johannes Kepler to discover the empirical laws of planetary motion, which in turn provided a springboard for the development of classical mechanics. Similarly, the discovery of regularities in atomic spectra played a key role in the development and verification of quantum physics in the early twentieth century. The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying the data into different categories.

Consider the example of recognizing handwritten digits, illustrated in Figure 1.1. Each digit corresponds to a 28×28 pixel image and so can be represented by a vector x comprising 784 real numbers. The goal is to build a machine that will take such a vector x as input and that will produce the identity of the digit 0, ..., 9 as the output. This is a nontrivial problem due to the wide variability of handwriting. It could be tackled using handcrafted rules or heuristics for distinguishing the digits based on the shapes of the strokes, but in practice such an approach leads to a proliferation of rules and of exceptions to the rules and so on, and invariably gives poor results.
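A minimal sketch of how a 28×28 image becomes the 784-dimensional vector x, assuming NumPy. The image below is a hypothetical hand-drawn stroke, not real data; the point is only the row-by-row flattening.

```python
import numpy as np

# Hypothetical stand-in for one scanned digit: a 28x28 grid of pixel intensities.
# Real images would come from a dataset; here we draw a crude vertical stroke.
image = np.zeros((28, 28), dtype=np.float64)
image[4:24, 13:15] = 1.0  # a rough "1"

# Flatten row by row into the input vector x of 28 * 28 = 784 real numbers.
x = image.reshape(-1)

print(x.shape)  # (784,)
```

The flattening discards the 2-D layout, which is one reason later pre-processing steps (such as centring and scaling) matter.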

Figure 1.1: Examples of handwritten digits.

Far better results can be obtained by adopting a machine learning approach in which a large set of N digits {x_1, ..., x_N}, called a training set, is used to tune the parameters of an adaptive model. The categories of the digits in the training set are known in advance, typically by inspecting them individually and hand-labelling them. We can express the category of a digit using a target vector t, which represents the identity of the corresponding digit. Suitable techniques for representing categories in terms of vectors will be discussed later. Note that there is one such target vector t for each digit image x.
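The text defers the choice of encoding for t. One common choice, assumed here for illustration, is 1-of-K ("one-hot") coding: a vector of length 10 with a single 1 in the position of the digit. The helper names `one_hot` and `decode` are hypothetical.

```python
import numpy as np

NUM_CLASSES = 10  # digits 0, ..., 9

def one_hot(label: int, num_classes: int = NUM_CLASSES) -> np.ndarray:
    """Return a 1-of-K target vector t with a single 1 at position `label`."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} outside 0..{num_classes - 1}")
    t = np.zeros(num_classes)
    t[label] = 1.0
    return t

def decode(t: np.ndarray) -> int:
    """Recover the digit identity as the index of the largest entry."""
    return int(np.argmax(t))

t = one_hot(3)
print(t)          # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
print(decode(t))  # 3
```

Using `argmax` in `decode` means the same function also works on a model output y that is only approximately one-hot.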

The result of running the machine learning algorithm can be expressed as a function y(x) which takes a new digit image x as input and generates an output vector y, encoded in the same way as the target vectors. The precise form of the function y(x) is determined during the training phase, also known as the learning phase, on the basis of the training data. Once the model is trained it can then determine the identity of new digit images, which are said to comprise a test set. The ability to categorize correctly new examples that differ from those used for training is known as generalization. In practical applications, the variability of the input vectors will be such that the training data can comprise only a tiny fraction of all possible input vectors, and so generalization is a central goal in pattern recognition.
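To make the training/test distinction concrete, here is a minimal sketch on hypothetical 2-D toy data (Gaussian blobs, not digits). The model is a nearest-class-mean classifier, chosen only because it is short; it is not a method the text prescribes. The learned `y(x)` returns a 1-of-K vector, matching the target encoding, and accuracy is measured only on examples never seen during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_per_class, centers, noise=0.3):
    """Hypothetical toy data: Gaussian blobs around each class centre."""
    xs, labels = [], []
    for k, c in enumerate(centers):
        xs.append(c + noise * rng.standard_normal((n_per_class, len(c))))
        labels += [k] * n_per_class
    return np.vstack(xs), np.array(labels)

centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X_train, y_train = make_data(50, centers)
X_test, y_test = make_data(20, centers)  # unseen examples

K = len(centers)
T_train = np.eye(K)[y_train]  # 1-of-K target vectors, one per training input

# Training phase: estimate one mean vector per class from the training set only.
means = (T_train.T @ X_train) / T_train.sum(axis=0)[:, None]

def y(x):
    """Learned function y(x): output vector coded like the target vectors."""
    dists = np.sum((means - x) ** 2, axis=1)
    out = np.zeros(K)
    out[np.argmin(dists)] = 1.0
    return out

# Generalization: evaluate on the held-out test set.
predictions = np.array([np.argmax(y(x)) for x in X_test])
accuracy = np.mean(predictions == y_test)
print(f"test accuracy: {accuracy:.2f}")
```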

For most practical applications, the original input variables are typically pre-processed to transform them into some new space of variables where, it is hoped, the pattern recognition problem will be easier to solve. For instance, in the digit recognition problem, the images of the digits are typically translated and scaled so that each digit is contained within a box of a fixed size. This greatly reduces the variability within each digit class, because the location and scale of all the digits are now the same, which makes it much easier for a subsequent pattern recognition algorithm to distinguish between the different classes. This pre-processing stage is sometimes also called feature extraction. Note that new test data must be pre-processed using the same steps as the training data.
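A minimal sketch of the translate-and-scale step, assuming NumPy and nearest-neighbour resampling. The function name `normalize_digit` and its defaults (a 20×20 box centred on a 28×28 canvas) are illustrative assumptions; this version also stretches the crop to fill the box and so does not preserve aspect ratio.

```python
import numpy as np

def normalize_digit(image, box=20, canvas=28, threshold=0.0):
    """Translate and scale a digit so its bounding box fills a fixed box.

    Crops to the pixels above `threshold`, resizes the crop to box x box with
    nearest-neighbour sampling, and centres it on a canvas x canvas grid.
    """
    mask = image > threshold
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return np.zeros((canvas, canvas))  # a blank input stays blank
    crop = image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

    # Nearest-neighbour resize: map each output pixel back to a source pixel.
    r_idx = np.arange(box) * crop.shape[0] // box
    c_idx = np.arange(box) * crop.shape[1] // box
    resized = crop[np.ix_(r_idx, c_idx)]

    out = np.zeros((canvas, canvas))
    offset = (canvas - box) // 2
    out[offset:offset + box, offset:offset + box] = resized
    return out
```

Two copies of the same shape drawn at different positions and sizes map to the same normalized image. In keeping with the note above, the same function with the same `box` and `canvas` values would be applied to both training and test images.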

Pre-processing might also be performed in order to speed up computation. For example, if the goal is real-time face detection in a high-resolution video stream, the computer must handle huge numbers of pixels per second, and presenting these directly to a complex pattern recognition algorithm may be computationally infeasible. Instead, the aim is to find useful features that are fast to compute, and yet that also preserve useful discriminatory information enabling faces to be distinguished from non-faces. These features are then used as the inputs to the pattern recognition algorithm. For instance, the average value of the image intensity over a rectangular subregion can be evaluated extremely efficiently (Viola and Jones, 2004), and a set of such features can prove very effective in fast face detection. Because the number of such features is smaller than the number of pixels, this kind of pre-processing represents a form of dimensionality reduction. Care must be taken during pre-processing because often information is discarded, and if this information is important to the solution of the problem then the overall accuracy of the system can suffer.
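The efficiency of rectangular-average features comes from a summed-area table, which Viola and Jones call an "integral image": after one pass over the image, the sum over any rectangle needs only four lookups. A minimal NumPy sketch (function names are illustrative):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended.

    S[r, c] holds the sum of img[:r, :c], so any rectangle sum needs 4 lookups.
    """
    S = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    S[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return S

def rect_mean(S, top, left, height, width):
    """Mean intensity of img[top:top+height, left:left+width] in O(1) time."""
    bottom, right = top + height, left + width
    total = S[bottom, right] - S[top, right] - S[bottom, left] + S[top, left]
    return total / (height * width)
```

The table costs one pass over the pixels; after that, every rectangle feature costs the same constant work regardless of its size, which is what makes evaluating many such features per frame feasible.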

Applications in which the training data comprises examples of the input vectors along with their corresponding target vectors are known as supervised learning problems. Cases such as the digit recognition example, in which the aim is to assign each input vector to one of a finite number of discrete categories, are called classification problems. If the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem would be the prediction of the yield in a chemical manufacturing process in which the inputs consist of the concentrations of reactants, the temperature, and the pressure.
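To contrast with classification, a regression model outputs a real number. The sketch below fits a linear model by least squares to entirely made-up "process" data with the three inputs named above; the coefficients, ranges, and noise level are assumptions for illustration, and a linear model is simply the easiest choice, not a claim about real chemistry.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: each row is (reactant concentration, temperature, pressure).
N = 200
X = rng.uniform([0.1, 300.0, 1.0], [1.0, 400.0, 5.0], size=(N, 3))
true_w = np.array([2.0, 0.01, 0.5])  # made-up coefficients
true_b = -1.0
t = X @ true_w + true_b + 0.05 * rng.standard_normal(N)  # continuous targets

# Fit y(x) = w^T x + b by least squares.
A = np.hstack([X, np.ones((N, 1))])  # extra column of ones for the bias b
coef, *_ = np.linalg.lstsq(A, t, rcond=None)
w, b = coef[:3], coef[3]

def y(x):
    """Predicted yield: a real number, not a class label."""
    return x @ w + b

print(y(np.array([0.5, 350.0, 2.0])))
```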

In other pattern recognition problems, the training data consists of a set of input vectors x without any corresponding target values. The goal in such unsupervised learning problems may be to discover groups of similar examples within the data, where it is called clustering, or to determine the distribution of data within the input space, known as density estimation, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization.
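Clustering can be illustrated with K-means, one standard algorithm (used here as an example; the paragraph does not name a specific method). It uses only the inputs x, with no targets: it alternates between assigning each point to its nearest centre and moving each centre to the mean of its assigned points. The function name and defaults are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-means: assign points to the nearest centre, then move each
    centre to the mean of its assigned points, until the centres settle."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Squared distance from every point to every centre, shape (N, k).
        d = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels
```

On two well-separated groups of points, the returned `labels` put each group in its own cluster, even though no labels were supplied.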

Finally, the technique of reinforcement learning (Sutton and Barto, 1998) is concerned with the problem of finding suitable actions to take in a given situation in order to maximize a reward. Here the learning algorithm is not given examples of optimal outputs, in contrast to supervised learning, but must instead discover them by a process of trial and error. Typically there is a sequence of states and actions in which the learning algorithm is interacting with its environment. In many cases, the current action not only affects the immediate reward but also has an impact on the reward at all subsequent time steps. For example, by using appropriate reinforcement learning techniques a neural network can learn to play the game of backgammon to a high standard (Tesauro, 1994). Here the network must learn to take a board position as input, along with the result of a dice throw, and produce a strong move as the output. This is done by having the network play against a copy of itself for perhaps a million games. A major challenge is that a game of backgammon can involve dozens of moves, and yet it is only at the end of the game that the reward, in the form of victory, is achieved. The reward must then be attributed appropriately to all of the moves that led to it, even though some moves will have been good ones and others less so. This is an example of a credit assignment problem. A general feature of reinforcement learning is the trade-off between exploration, in which the system tries out new kinds of actions to see how effective they are, and exploitation, in which the system makes use of actions that are known to yield a high reward. Too strong a focus on either exploration or exploitation will yield poor results. Reinforcement learning continues to be an active area of machine learning research. However, a detailed treatment lies beyond the scope of this book.
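Two ideas from this paragraph fit in a short sketch. The first function is an epsilon-greedy multi-armed bandit, a textbook illustration of the exploration/exploitation trade-off. The second computes discounted returns, one simple way of crediting earlier moves with a reward that arrives only at the end. Both are generic examples chosen here; neither is the method Tesauro used for backgammon, and the arm means, `epsilon`, and `gamma` values are assumptions.

```python
import numpy as np

def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Multi-armed bandit with epsilon-greedy action selection.

    With probability epsilon the agent explores (random arm); otherwise it
    exploits the arm with the highest estimated reward so far.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    estimates = np.zeros(k)  # running mean reward per arm
    counts = np.zeros(k)
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = int(rng.integers(k))       # explore
        else:
            a = int(np.argmax(estimates))  # exploit
        r = rng.normal(true_means[a], 1.0)  # noisy reward from the chosen arm
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]  # incremental mean
        total += r
    return estimates, counts, total / steps


def discounted_returns(rewards, gamma=0.99):
    """Credit each step with the discounted sum of all later rewards.

    If the only reward arrives at the end of a game, every earlier move
    still receives a (discounted) share of it.
    """
    G = 0.0
    out = np.zeros(len(rewards))
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        out[t] = G
    return out
```

Setting `epsilon=0` gives a purely greedy agent that can lock onto whichever arm happened to look good first, while `epsilon=1` never exploits what it has learned; these are the two failure modes the text warns about.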

Although each of these tasks needs its own tools and techniques, many of the key ideas that underpin them are common to all such problems. One of the main goals of this chapter is to introduce, in a relatively informal way, several of the most important of these concepts and to illustrate them using simple examples. Later in the book we shall see these same ideas re-emerge in the context of more sophisticated models that are applicable to real-world pattern recognition applications. This chapter also provides a self-contained introduction to three important tools that will be used throughout the book, namely probability theory, decision theory, and information theory. Although these might sound like daunting topics, they are in fact straightforward, and a clear understanding of them is essential if machine learning techniques are to be used to best effect in practical applications.
