


Online machine learning is a model of induction that learns one instance at a time. The goal in online learning is to predict labels for instances. For example, the instances could describe the current conditions of the stock market, and an online algorithm predicts tomorrow's value of a particular stock. The key defining characteristic of online learning is that soon after the prediction is made, the true label of the instance is discovered. This information can then be used to refine the prediction hypothesis used by the algorithm. The goal of the algorithm is to make predictions that are close to the true labels.


More formally, an online algorithm proceeds in a sequence of trials. Each trial can be decomposed into three steps. First, the algorithm receives an instance. Second, the algorithm predicts the label of the instance. Third, the algorithm receives the true label of the instance.[1] The third step is the most crucial, as the algorithm can use this label feedback to update its hypothesis for future trials. The goal of the algorithm is to minimize some performance criterion. For example, in stock market prediction the algorithm may attempt to minimize the sum of the squared distances between the predicted and true values of a stock. Another popular performance criterion, for classification problems, is the number of mistakes. In addition to applications of a sequential nature, online learning algorithms are also relevant in applications with amounts of data so large that traditional learning approaches, which use the entire data set in aggregate, are computationally infeasible.

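Put more generally, an online learning algorithm runs as a series of trials, each of which breaks down into the following steps: first, the algorithm receives an instance; next, it predicts the instance's label; third, it receives the instance's true label (the prediction turns out right or wrong, and the algorithm is adjusted according to the outcome). The third step matters most, because the algorithm uses the label feedback to update its parameters, and with them the hypothesis it uses to predict in future trials. The goal of the algorithm is to minimize some performance criterion (?). For example, in the stock market the algorithm tries to minimize the deviation between the predicted and the actual stock value (the whole model is dynamic). Online learning can also handle cases where the data set is too large for the available computing power to process the entire training set at once. (Doesn't this feel like how people learn: one example at a time, correcting mistakes as they go, rather than simply memorizing rules and staying wrong?)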
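The three-step trial protocol described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `RunningMeanPredictor` learner, the `run_online` helper, and the toy stream are hypothetical names and data chosen for the example, and squared loss is used as the performance criterion.

```python
from typing import Iterable, Tuple


class RunningMeanPredictor:
    """Toy online learner (an assumption for illustration):
    it predicts the mean of all labels seen so far."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def predict(self, instance: float) -> float:
        # This toy hypothesis ignores the instance and returns the running mean.
        return self.mean

    def update(self, instance: float, true_label: float) -> None:
        # Incremental mean update: only the current state is stored.
        self.count += 1
        self.mean += (true_label - self.mean) / self.count


def run_online(learner, stream: Iterable[Tuple[float, float]]) -> float:
    """Run the trial protocol and return the cumulative squared loss."""
    total_loss = 0.0
    for instance, true_label in stream:          # step 1: receive an instance
        prediction = learner.predict(instance)   # step 2: predict its label
        total_loss += (prediction - true_label) ** 2
        learner.update(instance, true_label)     # step 3: true label arrives, update
    return total_loss


# Hypothetical stream of (instance, true label) pairs.
stream = [(0.0, 2.0), (1.0, 4.0), (2.0, 6.0)]
learner = RunningMeanPredictor()
cumulative_loss = run_online(learner, stream)
```

Note that the prediction in each trial is made before the label is used, so the cumulative loss measures how well the learner does on data it has not yet seen.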

Because online learning algorithms continually receive label feedback, they are able to adapt and learn in difficult situations. Many online algorithms can give strong guarantees on performance even when the instances are not generated by a distribution. As long as a reasonably good classifier exists, the online algorithm will learn to predict correct labels. This good classifier must come from a previously determined set that depends on the algorithm. For example, two popular online algorithms, perceptron and winnow, can perform well when a hyperplane exists that splits the data into two categories. These algorithms can even be modified to do provably well when the hyperplane is allowed to change infrequently during the online learning trials.
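As a rough illustration of the perceptron case, the sketch below runs the classic mistake-driven perceptron update on a small data set that a hyperplane can split. The function names, the toy data, and the convention of appending a constant bias feature are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def perceptron_online(
    data: List[Tuple[List[float], int]], passes: int = 1
) -> Tuple[List[float], int]:
    """Online perceptron with labels in {-1, +1}.

    Returns the final weight vector and the total number of mistakes."""
    w = [0.0] * len(data[0][0])
    mistakes = 0
    for _ in range(passes):
        for x_t, y_t in data:
            y_hat = 1 if dot(w, x_t) >= 0 else -1   # predict the label
            if y_hat != y_t:                        # true label reveals a mistake
                # Move the hyperplane toward classifying x_t correctly.
                w = [w_i + y_t * x_i for w_i, x_i in zip(w, x_t)]
                mistakes += 1
    return w, mistakes


# Hypothetical toy data; the last feature is a constant 1.0 acting as a bias term.
data = [
    ([2.0, 1.0, 1.0], 1),
    ([1.0, 3.0, 1.0], 1),
    ([-1.0, -2.0, 1.0], -1),
    ([-2.0, -1.0, 1.0], -1),
]
w, mistakes = perceptron_online(data, passes=5)
predictions = [1 if dot(w, x) >= 0 else -1 for x, _ in data]
```

The weights change only on trials where the prediction was wrong, which is why the performance of such algorithms is naturally measured by the number of mistakes.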


Unfortunately, the main difficulty of online learning also stems from this requirement for continual label feedback. For many problems it is not possible to guarantee that accurate label feedback will be available in the near future. For example, when designing a system that learns to do optical character recognition (OCR), typically some expert must label previous instances to help train the algorithm. In actual use of the OCR application, the expert is no longer available, and there is no inexpensive outside source of accurate labels. Fortunately, there is a large class of problems where label feedback is always available. For any problem that consists of predicting the future, an online learning algorithm just needs to wait for the label to become available. This is true in our previous example of stock market prediction and many other problems.


A prototypical online supervised learning algorithm

In the setting of supervised learning, or learning from examples, we are interested in learning a function $f : X \to Y$, where $X$ is thought of as a space of inputs and $Y$ as a space of outputs, that predicts well on instances that are drawn from a joint probability distribution $p(x, y)$ on $X \times Y$. In this setting, we are given a loss function $V : Y \times Y \to \mathbb{R}$, such that $V(f(x), y)$ measures the difference between the predicted value $f(x)$ and the true value $y$. The ideal goal is to select a function $f \in \mathcal{H}$, where $\mathcal{H}$ is a space of functions called a hypothesis space, so as to minimize the expected risk:

$$I[f] = \mathbb{E}[V(f(x), y)] = \int V(f(x), y) \, dp(x, y)$$

In reality, the learner never knows the true distribution $p(x, y)$ over instances. Instead, the learner usually has access to a training set of examples $(x_1, y_1), \ldots, (x_n, y_n)$ that are assumed to have been drawn i.i.d. from the true distribution. A common paradigm in this situation is to estimate a function $\hat{f}$ through empirical risk minimization or regularized empirical risk minimization (usually Tikhonov regularization). The choice of loss function here gives rise to several well-known learning algorithms such as regularized least squares and support vector machines.
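For concreteness, the two estimation paradigms just mentioned can be written out as below. Here $V$ is the loss, $\mathcal{H}$ the hypothesis space, and $(x_i, y_i)$, $i = 1, \ldots, n$, the training examples; the regularization weight $\lambda > 0$ and the norm $\|\cdot\|_{\mathcal{H}}$ on the hypothesis space are notation assumed for this sketch.

```latex
% Empirical risk minimization
\hat{f} = \arg\min_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i)

% Tikhonov-regularized empirical risk minimization
\hat{f} = \arg\min_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i)
          + \lambda \, \|f\|_{\mathcal{H}}^{2}
```

With the square loss $V(f(x), y) = (f(x) - y)^2$ the regularized problem is regularized least squares; with the hinge loss $V(f(x), y) = \max\{0, 1 - y f(x)\}$ it is the support vector machine.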

The above paradigm is not well suited to the online learning setting, though, as it requires complete a priori knowledge of the entire training set. In the pure online learning approach, the learning algorithm should update a sequence of functions $f_1, f_2, \ldots$ in such a way that the function $f_{t+1}$ depends only on the previous function $f_t$ and the next data point $(x_t, y_t)$. This approach has low memory requirements in the sense that it only requires storage of a representation of the current function $f_t$ and the next data point $(x_t, y_t)$. A related approach that has larger memory requirements allows $f_{t+1}$ to depend on $f_t$ and all previous data points $(x_1, y_1), \ldots, (x_t, y_t)$. We focus solely on the former approach here, and we consider both the case where the data is coming from an infinite stream $(x_1, y_1), (x_2, y_2), \ldots$ and the case where the data is coming from a finite training set $(x_1, y_1), \ldots, (x_n, y_n)$, in which case the online learning algorithm may make multiple passes through the data.
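One way to realize the pure online approach is stochastic gradient descent on the square loss with a linear model. The text above does not prescribe a particular update rule, so this choice is an assumption made for illustration, as are the names `sgd_step` and `fit_online`, the step size, and the toy data. Each new weight vector depends only on the previous one and the current data point, and the finite training set is swept several times.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def sgd_step(w: List[float], x_t: List[float], y_t: float, step_size: float) -> List[float]:
    """One online update: the new weights depend only on w and (x_t, y_t).

    Uses the gradient of 0.5 * (w . x_t - y_t)^2 with respect to w."""
    residual = dot(w, x_t) - y_t
    return [w_i - step_size * residual * x_i for w_i, x_i in zip(w, x_t)]


def fit_online(
    data: List[Tuple[List[float], float]], step_size: float = 0.05, passes: int = 1
) -> List[float]:
    """Sweep a finite training set `passes` times, one point at a time."""
    w = [0.0] * len(data[0][0])
    for _ in range(passes):
        for x_t, y_t in data:
            w = sgd_step(w, x_t, y_t, step_size)
    return w


# Hypothetical noiseless data from y = 2 * x + 1; the second feature is a constant bias.
data = [([1.0, 1.0], 3.0), ([2.0, 1.0], 5.0), ([0.0, 1.0], 1.0), ([3.0, 1.0], 7.0)]
w = fit_online(data, step_size=0.05, passes=500)
```

Only the current weight vector and the current data point are held in memory at any time, which is the low-memory property described above. On an infinite stream the same `sgd_step` would simply be applied to each point as it arrives.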




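Machine learning: the model we need has to be learned from known data:

1. Collect and learn from the existing data;

2. Based on the model or rules, make a decision and output a result;

3. Based on the true outcome, train and update the rules or model.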


Perceptron

PA: Passive-Aggressive Perceptron

Voted Perceptron

CW: Confidence-Weighted linear classification

Weighted Majority algorithm

AROW: Adaptive Regularization of Weight Vectors

NHERD: Normal Herd (normal herding)





Averaged Perceptron:



Passive Aggressive Perceptron:



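The step size $\tau_t$ can be computed in three ways, where $\ell_t$ is the loss on example $x_t$ and $C$ is the aggressiveness parameter:

a. $\tau_t = \ell_t / \|x_t\|^2$

b. $\tau_t = \min\{C, \; \ell_t / \|x_t\|^2\}$

c. $\tau_t = \ell_t / (\|x_t\|^2 + 1/(2C))$

These correspond to the three variants PA, PA-I, and PA-II, respectively.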
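The three step-size rules listed above can be sketched as follows. The sketch assumes the usual binary Passive-Aggressive setting: labels in {-1, +1}, hinge loss with margin 1, and the update w + step * y * x. The function names and the toy example are hypothetical.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(w: Sequence[float], x_t: Sequence[float], y_t: int) -> float:
    """Hinge loss with margin 1 (assumed loss for the binary PA setting)."""
    return max(0.0, 1.0 - y_t * dot(w, x_t))


def pa_step_size(loss: float, sq_norm: float, variant: str = "PA", C: float = 1.0) -> float:
    """Step size for the three variants; sq_norm is ||x_t||^2."""
    if variant == "PA":
        return loss / sq_norm
    if variant == "PA-I":
        return min(C, loss / sq_norm)
    if variant == "PA-II":
        return loss / (sq_norm + 1.0 / (2.0 * C))
    raise ValueError(f"unknown variant: {variant}")


def pa_update(w: List[float], x_t: List[float], y_t: int,
              variant: str = "PA", C: float = 1.0) -> List[float]:
    loss = hinge_loss(w, x_t, y_t)
    sq_norm = dot(x_t, x_t)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)                      # passive: margin already satisfied
    tau = pa_step_size(loss, sq_norm, variant, C)
    # Aggressive: move w just far enough (or a capped/damped amount) toward x_t.
    return [w_i + tau * y_t * x_i for w_i, x_i in zip(w, x_t)]


# Hypothetical single example.
x_t, y_t = [3.0, 4.0], 1
w_pa = pa_update([0.0, 0.0], x_t, y_t, variant="PA")
w_pa1 = pa_update([0.0, 0.0], x_t, y_t, variant="PA-I", C=0.01)
w_pa2 = pa_update([0.0, 0.0], x_t, y_t, variant="PA-II", C=1.0)
```

The plain PA step removes the loss on the current example entirely, while PA-I caps the step at C and PA-II damps it, which makes the last two less sensitive to mislabeled examples.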

Voted Perceptron:




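Confidence-Weighted (CW): (linear classifier)


The weight vector w follows a Gaussian distribution N(μ, Σ), where μ is the mean and Σ is the covariance matrix, and the class of an instance x is predicted from the value of w·x.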



AROW: Adaptive Regularization of Weight Vectors

Its properties: large-margin training, confidence weighting, and the ability to handle non-separable (noisy) data.

Compared with SOP (second-order perceptron), PA, and CW, it performs better in the presence of noise.
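A minimal sketch of a single AROW update is given below, following the update rule as it is commonly stated for the algorithm: a Gaussian over the weights with mean `mu` and full covariance `sigma`, updated only when the margin is below 1. The regularization parameter `r`, the function names, and the toy example are assumptions made for illustration, not code taken from any particular library.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def arow_update(mu: Vector, sigma: Matrix, x_t: Vector, y_t: int,
                r: float = 1.0) -> Tuple[Vector, Matrix]:
    """One AROW step for a label y_t in {-1, +1}; returns new (mu, sigma)."""
    margin = y_t * dot(mu, x_t)
    if margin >= 1.0:
        return mu, sigma                          # large margin: no update
    sigma_x = [dot(row, x_t) for row in sigma]    # Sigma x_t (Sigma is symmetric)
    confidence = dot(x_t, sigma_x)                # x_t^T Sigma x_t
    beta = 1.0 / (confidence + r)
    alpha = (1.0 - margin) * beta                 # hinge loss scaled by beta
    n = len(mu)
    # Mean moves along Sigma x_t; uncertain directions move more.
    new_mu = [mu[i] + alpha * y_t * sigma_x[i] for i in range(n)]
    # Covariance shrinks along the direction of x_t.
    new_sigma = [[sigma[i][j] - beta * sigma_x[i] * sigma_x[j] for j in range(n)]
                 for i in range(n)]
    return new_mu, new_sigma


def predict(mu: Vector, x: Vector) -> int:
    return 1 if dot(mu, x) >= 0 else -1


# Hypothetical starting point: zero mean, identity covariance.
mu0 = [0.0, 0.0]
sigma0 = [[1.0, 0.0], [0.0, 1.0]]
mu1, sigma1 = arow_update(mu0, sigma0, [1.0, 0.0], 1, r=1.0)
```

Because the covariance only shrinks gradually, controlled by `r`, a single noisy example cannot force the mean all the way to a zero-loss solution, which is one intuition for the noise robustness mentioned above.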

Normal herding (NHERD): (linear classifier)


Weighted Majority:








The Perceptron, PA, CW, AROW, and NHERD algorithms above are all provided by Jubatus, a distributed online machine learning framework.







