Theano3.3-练习之逻辑回归

是官网上theano的逻辑回归的练习（http://deeplearning.net/tutorial/logreg.html#logreg）的讲解。

Classifying MNIST digits using Logistic Regression

note：这部分假设你已经熟悉了这几个theano概念：: shared
variables , basic
arithmetic ops , T.grad , floatX.。如果你想要在GPU上运行这个代码，同样可以读读GPU.

note：这部分的代码下载地址 here.

在这部分中，我们会介绍如何使用theano来执行最基础的分类器：逻辑回归。我们先快速的简单介绍下这个模型，这算是一个复习或者说是为了加深理解，并且用来说明如何将数学表达式映射到theano图（graphs）上。在最底层的机器学习传统基础上，本教程来解决MNIST数字分类的最令人激动人心的问题。

一、模型

逻辑回归模型是个概率、线性分类器。它是通过权重矩阵和一个偏置向量来参数化的。通过将输入向量映射到一个超平面集合上来达到分类的目的，每一个超平面都对应着不同的类别。从输入到一个超平面之间的距离反映出该输入是对应的类别成员的概率的大小。

数学上来说，一个输入向量是一个类别的成员的概率是随机变量
的一个值，可以写成如下形式：

模型的预测结果表示的类别是它的概率是最大的时候，也就是：

在theano中的代码如下：

 # initialize with 0 the weights W as a matrix of shape (n_in, n_out)将权重矩阵W的值初始化为0，floatX类别

        self.W = theano.shared(

            value=numpy.zeros(

                (n_in, n_out),

                dtype=theano.config.floatX

            ),

            name='W',

            borrow=True

        )

        # initialize the baises b as a vector of n_out 0s#初始化偏置向量为0，floatX类型

        self.b = theano.shared(

            value=numpy.zeros(

                (n_out,),

                dtype=theano.config.floatX

            ),

            name='b',

            borrow=True

        )

        # symbolic expression for computing the matrix of class-membership

        # probabilities

        # Where:

        # W is a matrix where column-k represent the separation hyper plain for

        # class-k

        # x is a matrix where row-j  represents input training sample-j

        # b is a vector where element-k represent the free parameter of hyper

        # plain-k

        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)#直接调用softmax计算输入、权重、偏置的结果

        # symbolic description of how to compute prediction as class whose

        # probability is maximal

        self.y_pred = T.argmax(self.p_y_given_x, axis=1)#读取得到结果的最大值，即softmax得到的最大概率

因为模型的参数在训练中必须保持一个稳定的状态，我们需要为分配共享变量。这样不但是将他们声明为符号式的theano变量，而且同样对它们进行了初始化。点和softmax操作随后用来计算向量。结果p_y_given_x是一个向量类型的符号变量。为了得到实际模型的预测结果，可以使用T.argmax操作，这个用来返回在p_y_given_x中的最大值（即最大概率的那个类）。现在，该模型还没有任何用处，因为它的参数仍然在初始化的状态，下面的部分将会介绍如何来学习优化这些参数

note：theano的操作的完整列表：list
of ops

二、定义一个损失函数

学习一个最优化模型的参数，需要涉及到最小化一个损失函数。在多类逻辑回归中，通常是使用负log似然作为损失函数。也就是等于在基于由参数化模型后，最大化数据集的似然函数。让我们首先定义下似然函数和损失函数：

然而整本书都在说最小化的主题，到目前为止梯度下降算是最简单的方法用来最小化任意非线性函数了。该教程将会使用minibatch
sgd的方法。更详细的见Stochastic
Gradient Descent 。

下面的theano代码是在给定minibatch的基础上定义的损失（符号）函数：

        # y.shape[0] is (symbolically) the number of rows in y, i.e.,

        # number of examples (call it n) in the minibatch

        # T.arange(y.shape[0]) is a symbolic vector which will contain

        # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of

        # Log-Probabilities (call it LP) with one row per example and

        # one column per class LP[T.arange(y.shape[0]),y] is a vector

        # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,

        # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is

        # the mean (across minibatch examples) of the elements in v,

        # i.e., the mean log-likelihood across the minibatch.

        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

note：即使损失函数在形式上是定义成基于数据集的独立误差项的和。不过在实际操作中，在代码中是通过使用均值（T.mean）来实现的。这使得我们对于学习率的选择能更少的依赖于minibatch
size。

三、创建一个逻辑回归类

我们现在有所有我们需要定义一个LogisticRegression类的工具。可以用一个类来封装逻辑回归的基本操作。这里的代码很类似于到目前为止看到的代码，所以可以自圆其说：

class LogisticRegression(object):

    """Multi-class Logistic Regression Class

    The logistic regression is fully described by a weight matrix :math:`W`

    and bias vector :math:`b`. Classification is done by projecting data

    points onto a set of hyperplanes, the distance to which is used to

    determine a class membership probability.

    """

    def __init__(self, input, n_in, n_out):

        """ Initialize the parameters of the logistic regression

        :type input: theano.tensor.TensorType

        :param input: symbolic variable that describes the input of the

                      architecture (one minibatch)

        :type n_in: int

        :param n_in: number of input units, the dimension of the space in

                     which the datapoints lie

        :type n_out: int

        :param n_out: number of output units, the dimension of the space in

                      which the labels lie

        """

        # start-snippet-1

        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)

        self.W = theano.shared(

            value=numpy.zeros(

                (n_in, n_out),

                dtype=theano.config.floatX

            ),

            name='W',

            borrow=True

        )

        # initialize the baises b as a vector of n_out 0s

        self.b = theano.shared(

            value=numpy.zeros(

                (n_out,),

                dtype=theano.config.floatX

            ),

            name='b',

            borrow=True

        )

        # symbolic expression for computing the matrix of class-membership

        # probabilities

        # Where:

        # W is a matrix where column-k represent the separation hyper plain for

        # class-k

        # x is a matrix where row-j  represents input training sample-j

        # b is a vector where element-k represent the free parameter of hyper

        # plain-k

        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute prediction as class whose

        # probability is maximal

        self.y_pred = T.argmax(self.p_y_given_x, axis=1)

        # end-snippet-1

        # parameters of the model

        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):

        """Return the mean of the negative log-likelihood of the prediction

        of this model under a given target distribution.

        .. math::

            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =

            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}

                \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\

            \ell (\theta=\{W,b\}, \mathcal{D})

        :type y: theano.tensor.TensorType

        :param y: corresponds to a vector that gives for each example the

                  correct label

        Note: we use the mean instead of the sum so that

              the learning rate is less dependent on the batch size

        """

        # start-snippet-2

        # y.shape[0] is (symbolically) the number of rows in y, i.e.,

        # number of examples (call it n) in the minibatch

        # T.arange(y.shape[0]) is a symbolic vector which will contain

        # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of

        # Log-Probabilities (call it LP) with one row per example and

        # one column per class LP[T.arange(y.shape[0]),y] is a vector

        # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,

        # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is

        # the mean (across minibatch examples) of the elements in v,

        # i.e., the mean log-likelihood across the minibatch.

        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

        # end-snippet-2

    def errors(self, y):

        """Return a float representing the number of errors in the minibatch

        over the total number of examples of the minibatch ; zero one

        loss over the size of the minibatch

        :type y: theano.tensor.TensorType

        :param y: corresponds to a vector that gives for each example the

                  correct label

        """

        # check if y has same dimension of y_pred

        if y.ndim != self.y_pred.ndim:

            raise TypeError(

                'y should have the same shape as self.y_pred',

                ('y', y.type, 'y_pred', self.y_pred.type)

            )

        # check if y is of the correct datatype

        if y.dtype.startswith('int'):

            # the T.neq operator returns a vector of 0s and 1s, where 1

            # represents a mistake in prediction

            return T.mean(T.neq(self.y_pred, y))

        else:

            raise NotImplementedError()

可以通过下面这样实例化这个类：

    # generate symbolic variables for input (x and y represent a

    # minibatch)

    x = T.matrix('x')  # data, presented as rasterized images

    y = T.ivector('y')  # labels, presented as 1D vector of [int] labels

    # construct the logistic regression class

    # Each MNIST image has size 28*28

    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)

我们先对训练输入分配符号变量和对应的类别标签分配符号变量。注意到x
和y 都是在LogisticRegression对象范围的外部定义的。因为该类需要输入来建立它的图（graph），所以需要将它作为参数传递给__init__函数。当你想要将这些类的实例连接起来形成一个深度网络的时候是很有用的。每一层的输出可以传递给上一层的输入（该教程不建立一个多层网络，不过该骂却可以在未来教程中重用）。最后，我们定义一个（符号）cost变量来最小化，使用实例方法classifier.negative_log_likehood：

    # the cost we minimize during training is the negative log likelihood of

    # the model in symbolic format

    cost = classifier.negative_log_likelihood(y)

注意到因为符号变量classifier是定义在以x项为初始化的基础上的，所以x
是一个对于cost的定义是隐式符号输入。

四、模型的学习

为了在大多数编程语言（c/c++、Matlab、Python）中执行MSGD，需要手动计算关于参数的损失函数梯度的表达式：在这种情况下 ,
和，对于复杂模型来说可能需要有相当的技巧，比如表达式可以变得相当复杂，特别是还要考虑数值稳定的问题的时候。不过通过theano，这个工作就变得相当简单了，它会自动进行分化（differentiation），然后使用某个数学变换来提升数值稳定性。为了在theano中得到和的梯度，简单的按照下面的操作：

    g_W = T.grad(cost=cost, wrt=classifier.W)

    g_b = T.grad(cost=cost, wrt=classifier.b)

g_w和g_b都是符号变量，用来作为计算图中的一部分。函数train_model是用来执行一步梯度的，可以被定义成如下形式：

    # specify how to update the parameters of the model as a list of

    # (variable, update expression) pairs.

    updates = [(classifier.W, classifier.W - learning_rate * g_W),

               (classifier.b, classifier.b - learning_rate * g_b)]

    # compiling a Theano function `train_model` that returns the cost, but in

    # the same time updates the parameter of the model based on the rules

    # defined in `updates`

    train_model = theano.function(

        inputs=[index],

        outputs=cost,

        updates=updates,

        givens={

            x: train_set_x[index * batch_size: (index + 1) * batch_size],

            y: train_set_y[index * batch_size: (index + 1) * batch_size]

        }

    )

updates是一个对列表。在每个对中，第一个元素是符号变量用来在每一步中进行更新，第二个元素是符号函数用来计算新的值。相似的，givens是一个字典，它的键是符号变量，它的值是在每一步中指定的替换值。函数train_model可以被如下规则定义：

输入是minibatch的索引 index 和
batch size (这不是输入，因为它是固定的) 用来定义和对应的标签
返回值是由索引index定义的x和y 的cost/loss函数值
在每次的函数调用上，首先会通过索引index切片的训练集来替换x 和y .然后，通过这个新的minibatch来估算对应的cost函数值，然后使用updates列表中定义的操作.

每次train_model(index)被调用的时候，它都会计算一个minibatch的cost然后返回，同时执行一步MSGD操作。整个学习算法会在数据集的所有样本上进行循环，考虑到一次只有一个minibatch中的所有样本，需要重复的调用train_model函数。

五、测试该模型

正如Learning
a Classifier中说的，当测试这个模型，感兴趣于其中有多少误分类的样本（不止是在似然函数中）。LogisticRegression类会有一个额外的实例方法，用来建立符号图，从而对每个minibatch中误分类样本进行检索。

代码如下:

   def errors(self, y):

        """Return a float representing the number of errors in the minibatch

        over the total number of examples of the minibatch ; zero one

        loss over the size of the minibatch

        :type y: theano.tensor.TensorType

        :param y: corresponds to a vector that gives for each example the

                  correct label

        """

        # check if y has same dimension of y_pred

        if y.ndim != self.y_pred.ndim:

            raise TypeError(

                'y should have the same shape as self.y_pred',

                ('y', y.type, 'y_pred', self.y_pred.type)

            )

        # check if y is of the correct datatype

        if y.dtype.startswith('int'):

            # the T.neq operator returns a vector of 0s and 1s, where 1

            # represents a mistake in prediction

            return T.mean(T.neq(self.y_pred, y))

        else:

            raise NotImplementedError()

然后我们创建一个函数test_model和一个函数validate_model。正如你将会看到的，validate_model是我们早期停止方法中的关键（ Early-Stopping）。这些函数会将minibatch
index作为输入然后计算由模型误分类的序列号。这两个函数之间唯一的区别在于test_model是从测试集中提取minibatches，而validate_model是从验证集中提取的：

   # compiling a Theano function that computes the mistakes that are made by

    # the model on a minibatch

    test_model = theano.function(

        inputs=[index],

        outputs=classifier.errors(y),

        givens={

            x: test_set_x[index * batch_size: (index + 1) * batch_size],

            y: test_set_y[index * batch_size: (index + 1) * batch_size]

        }

    )

    validate_model = theano.function(

        inputs=[index],

        outputs=classifier.errors(y),

        givens={

            x: valid_set_x[index * batch_size: (index + 1) * batch_size],

            y: valid_set_y[index * batch_size: (index + 1) * batch_size]

        }

    )

六、合并上面所有的

完成的成品代码如下：

"""

This tutorial introduces logistic regression using Theano and stochastic

gradient descent.

Logistic regression is a probabilistic, linear classifier. It is parametrized

by a weight matrix :math:`W` and a bias vector :math:`b`. Classification is

done by projecting data points onto a set of hyperplanes, the distance to

which is used to determine a class membership probability.

Mathematically, this can be written as:

.. math::

  P(Y=i|x, W,b) &= softmax_i(W x + b) \\

                &= \frac {e^{W_i x + b_i}} {\sum_j e^{W_j x + b_j}}

The output of the model or prediction is then done by taking the argmax of

the vector whose i'th element is P(Y=i|x).

.. math::

  y_{pred} = argmax_i P(Y=i|x,W,b)

This tutorial presents a stochastic gradient descent optimization method

suitable for large datasets.

References:

    - textbooks: "Pattern Recognition and Machine Learning" -

                 Christopher M. Bishop, section 4.3.2

"""

__docformat__ = 'restructedtext en'

import cPickle

import gzip

import os

import sys

import time

import numpy

import theano

import theano.tensor as T

class LogisticRegression(object):

    """Multi-class Logistic Regression Class

    The logistic regression is fully described by a weight matrix :math:`W`

    and bias vector :math:`b`. Classification is done by projecting data

    points onto a set of hyperplanes, the distance to which is used to

    determine a class membership probability.

    """

    def __init__(self, input, n_in, n_out):

        """ Initialize the parameters of the logistic regression

        :type input: theano.tensor.TensorType

        :param input: symbolic variable that describes the input of the

                      architecture (one minibatch)

        :type n_in: int

        :param n_in: number of input units, the dimension of the space in

                     which the datapoints lie

        :type n_out: int

        :param n_out: number of output units, the dimension of the space in

                      which the labels lie

        """

        # start-snippet-1

        # initialize with 0 the weights W as a matrix of shape (n_in, n_out)

        self.W = theano.shared(

            value=numpy.zeros(

                (n_in, n_out),

                dtype=theano.config.floatX

            ),

            name='W',

            borrow=True

        )

        # initialize the baises b as a vector of n_out 0s

        self.b = theano.shared(

            value=numpy.zeros(

                (n_out,),

                dtype=theano.config.floatX

            ),

            name='b',

            borrow=True

        )

        # symbolic expression for computing the matrix of class-membership

        # probabilities

        # Where:

        # W is a matrix where column-k represent the separation hyper plain for

        # class-k

        # x is a matrix where row-j  represents input training sample-j

        # b is a vector where element-k represent the free parameter of hyper

        # plain-k

        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)

        # symbolic description of how to compute prediction as class whose

        # probability is maximal

        self.y_pred = T.argmax(self.p_y_given_x, axis=1)

        # end-snippet-1

        # parameters of the model

        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):

        """Return the mean of the negative log-likelihood of the prediction

        of this model under a given target distribution.

        .. math::

            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =

            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|}

                \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\

            \ell (\theta=\{W,b\}, \mathcal{D})

        :type y: theano.tensor.TensorType

        :param y: corresponds to a vector that gives for each example the

                  correct label

        Note: we use the mean instead of the sum so that

              the learning rate is less dependent on the batch size

        """

        # start-snippet-2

        # y.shape[0] is (symbolically) the number of rows in y, i.e.,

        # number of examples (call it n) in the minibatch

        # T.arange(y.shape[0]) is a symbolic vector which will contain

        # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of

        # Log-Probabilities (call it LP) with one row per example and

        # one column per class LP[T.arange(y.shape[0]),y] is a vector

        # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,

        # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is

        # the mean (across minibatch examples) of the elements in v,

        # i.e., the mean log-likelihood across the minibatch.

        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

        # end-snippet-2

    def errors(self, y):

        """Return a float representing the number of errors in the minibatch

        over the total number of examples of the minibatch ; zero one

        loss over the size of the minibatch

        :type y: theano.tensor.TensorType

        :param y: corresponds to a vector that gives for each example the

                  correct label

        """

        # check if y has same dimension of y_pred

        if y.ndim != self.y_pred.ndim:

            raise TypeError(

                'y should have the same shape as self.y_pred',

                ('y', y.type, 'y_pred', self.y_pred.type)

            )

        # check if y is of the correct datatype

        if y.dtype.startswith('int'):

            # the T.neq operator returns a vector of 0s and 1s, where 1

            # represents a mistake in prediction

            return T.mean(T.neq(self.y_pred, y))

        else:

            raise NotImplementedError()

def load_data(dataset):

    ''' Loads the dataset

    :type dataset: string

    :param dataset: the path to the dataset (here MNIST)

    '''

    #############

    # LOAD DATA #

    #############

    # Download the MNIST dataset if it is not present

    data_dir, data_file = os.path.split(dataset)

    if data_dir == "" and not os.path.isfile(dataset):

        # Check if dataset is in the data directory.

        new_path = os.path.join(

            os.path.split(__file__)[0],

            "..",

            "data",

            dataset

        )

        if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':

            dataset = new_path

    if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':

        import urllib

        origin = (

            'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'

        )

        print 'Downloading data from %s' % origin

        urllib.urlretrieve(origin, dataset)

    print '... loading data'

    # Load the dataset

    f = gzip.open(dataset, 'rb')

    train_set, valid_set, test_set = cPickle.load(f)

    f.close()

    #train_set, valid_set, test_set format: tuple(input, target)

    #input is an numpy.ndarray of 2 dimensions (a matrix)

    #witch row's correspond to an example. target is a

    #numpy.ndarray of 1 dimensions (vector)) that have the same length as

    #the number of rows in the input. It should give the target

    #target to the example with the same index in the input.

    def shared_dataset(data_xy, borrow=True):

        """ Function that loads the dataset into shared variables

        The reason we store our dataset in shared variables is to allow

        Theano to copy it into the GPU memory (when code is run on GPU).

        Since copying data into the GPU is slow, copying a minibatch everytime

        is needed (the default behaviour if the data is not in a shared

        variable) would lead to a large decrease in performance.

        """

        data_x, data_y = data_xy

        shared_x = theano.shared(numpy.asarray(data_x,

                                               dtype=theano.config.floatX),

                                 borrow=borrow)

        shared_y = theano.shared(numpy.asarray(data_y,

                                               dtype=theano.config.floatX),

                                 borrow=borrow)

        # When storing data on the GPU it has to be stored as floats

        # therefore we will store the labels as ``floatX`` as well

        # (``shared_y`` does exactly that). But during our computations

        # we need them as ints (we use labels as index, and if they are

        # floats it doesn't make sense) therefore instead of returning

        # ``shared_y`` we will have to cast it to int. This little hack

        # lets ous get around this issue

        return shared_x, T.cast(shared_y, 'int32')

    test_set_x, test_set_y = shared_dataset(test_set)

    valid_set_x, valid_set_y = shared_dataset(valid_set)

    train_set_x, train_set_y = shared_dataset(train_set)

    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),

            (test_set_x, test_set_y)]

    return rval

def sgd_optimization_mnist(learning_rate=0.13, n_epochs=1000,

                           dataset='mnist.pkl.gz',

                           batch_size=600):

    """

    Demonstrate stochastic gradient descent optimization of a log-linear

    model

    This is demonstrated on MNIST.

    :type learning_rate: float

    :param learning_rate: learning rate used (factor for the stochastic

                          gradient)

    :type n_epochs: int

    :param n_epochs: maximal number of epochs to run the optimizer

    :type dataset: string

    :param dataset: the path of the MNIST dataset file from

                 http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz

    """

    datasets = load_data(dataset)

    train_set_x, train_set_y = datasets[0]

    valid_set_x, valid_set_y = datasets[1]

    test_set_x, test_set_y = datasets[2]

    # compute number of minibatches for training, validation and testing

    n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size

    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size

    n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size

    ######################

    # BUILD ACTUAL MODEL #

    ######################

    print '... building the model'

    # allocate symbolic variables for the data

    index = T.lscalar()  # index to a [mini]batch

    # generate symbolic variables for input (x and y represent a

    # minibatch)

    x = T.matrix('x')  # data, presented as rasterized images

    y = T.ivector('y')  # labels, presented as 1D vector of [int] labels

    # construct the logistic regression class

    # Each MNIST image has size 28*28

    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)

    # the cost we minimize during training is the negative log likelihood of

    # the model in symbolic format

    cost = classifier.negative_log_likelihood(y)

    # compiling a Theano function that computes the mistakes that are made by

    # the model on a minibatch

    test_model = theano.function(

        inputs=[index],

        outputs=classifier.errors(y),

        givens={

            x: test_set_x[index * batch_size: (index + 1) * batch_size],

            y: test_set_y[index * batch_size: (index + 1) * batch_size]

        }

    )

    validate_model = theano.function(

        inputs=[index],

        outputs=classifier.errors(y),

        givens={

            x: valid_set_x[index * batch_size: (index + 1) * batch_size],

            y: valid_set_y[index * batch_size: (index + 1) * batch_size]

        }

    )

    # compute the gradient of cost with respect to theta = (W,b)

    g_W = T.grad(cost=cost, wrt=classifier.W)

    g_b = T.grad(cost=cost, wrt=classifier.b)

    # start-snippet-3

    # specify how to update the parameters of the model as a list of

    # (variable, update expression) pairs.

    updates = [(classifier.W, classifier.W - learning_rate * g_W),

               (classifier.b, classifier.b - learning_rate * g_b)]

    # compiling a Theano function `train_model` that returns the cost, but in

    # the same time updates the parameter of the model based on the rules

    # defined in `updates`

    train_model = theano.function(

        inputs=[index],

        outputs=cost,

        updates=updates,

        givens={

            x: train_set_x[index * batch_size: (index + 1) * batch_size],

            y: train_set_y[index * batch_size: (index + 1) * batch_size]

        }

    )

    # end-snippet-3

    ###############

    # TRAIN MODEL #

    ###############

    print '... training the model'

    # early-stopping parameters

    patience = 5000  # look as this many examples regardless

    patience_increase = 2  # wait this much longer when a new best is

                                  # found

    improvement_threshold = 0.995  # a relative improvement of this much is

                                  # considered significant

    validation_frequency = min(n_train_batches, patience / 2)

                                  # go through this many

                                  # minibatche before checking the network

                                  # on the validation set; in this case we

                                  # check every epoch

    best_validation_loss = numpy.inf

    test_score = 0.

    start_time = time.clock()

    done_looping = False

    epoch = 0

    while (epoch < n_epochs) and (not done_looping):

        epoch = epoch + 1

        for minibatch_index in xrange(n_train_batches):

            minibatch_avg_cost = train_model(minibatch_index)

            # iteration number

            iter = (epoch - 1) * n_train_batches + minibatch_index

            if (iter + 1) % validation_frequency == 0:

                # compute zero-one loss on validation set

                validation_losses = [validate_model(i)

                                     for i in xrange(n_valid_batches)]

                this_validation_loss = numpy.mean(validation_losses)

                print(

                    'epoch %i, minibatch %i/%i, validation error %f %%' %

                    (

                        epoch,

                        minibatch_index + 1,

                        n_train_batches,

                        this_validation_loss * 100.

                    )

                )

                # if we got the best validation score until now

                if this_validation_loss < best_validation_loss:

                    #improve patience if loss improvement is good enough

                    if this_validation_loss < best_validation_loss *  \

                       improvement_threshold:

                        patience = max(patience, iter * patience_increase)

                    best_validation_loss = this_validation_loss

                    # test it on the test set

                    test_losses = [test_model(i)

                                   for i in xrange(n_test_batches)]

                    test_score = numpy.mean(test_losses)

                    print(

                        (

                            '     epoch %i, minibatch %i/%i, test error of'

                            ' best model %f %%'

                        ) %

                        (

                            epoch,

                            minibatch_index + 1,

                            n_train_batches,

                            test_score * 100.

                        )

                    )

            if patience <= iter:

                done_looping = True

                break

    end_time = time.clock()

    print(

        (

            'Optimization complete with best validation score of %f %%,'

            'with test performance %f %%'

        )

        % (best_validation_loss * 100., test_score * 100.)

    )

    print 'The code run for %d epochs, with %f epochs/sec' % (

        epoch, 1. * epoch / (end_time - start_time))

    print >> sys.stderr, ('The code for file ' +

                          os.path.split(__file__)[1] +

                          ' ran for %.1fs' % ((end_time - start_time)))

if __name__ == '__main__':

    sgd_optimization_mnist()

用户可以学着使用SGD逻辑回归来分类MNIST数字，从dl教程文件夹中找到如下的代码执行：

python code/logistic_sgd.py

输出会如下：

...

epoch 72, minibatch 83/83, validation error 7.510417 %

     epoch 72, minibatch 83/83, test error of best model 7.510417 %

epoch 73, minibatch 83/83, validation error 7.500000 %

     epoch 73, minibatch 83/83, test error of best model 7.489583 %

Optimization complete with best validation score of 7.500000 %,with test performance 7.489583 %

The code run for 74 epochs, with 1.936983 epochs/sec

在intel（R）Core(TM)2 Duo CPU E8400 @ 3.00Ghz下，代码差不多速度为1.936 epochs/sec，在执行了75个epochs之后，达到的测试错误率为7.489%。在GPU上执行差不多是10.0epochs/sec。这个例子中我们的batch size为600.

脚注：

[1] 对于更小的数据集和更简单的模型，更多老练的下降方法将会更有效。例子代码logistic_cg.py显示如何使用Scipy的共轭梯度
来在Theano上解决逻辑回归的问题。

下面是在win7_64bit+cuda6.5+anaconda_2.1.0+theano_0.7.0下跑的logistic_cg.py的结果：

参考资料：

[1] 官网：http://deeplearning.net/tutorial/logreg.html#logreg

[2] Classifying MNIST digits using Logistic Regression ：http://blog.sina.com.cn/s/blog_6caa9fa10101m33n.html

[3] DeepLearning tutorial（1）Softmax回归原理简介+代码详解：http://blog.csdn.net/u012162613/article/details/43157801

[4]

Theano3.3-练习之逻辑回归的更多相关文章

逻辑回归 Logistic Regression
逻辑回归(Logistic Regression)是广义线性回归的一种.逻辑回归是用来做分类任务的常用算法.分类任务的目标是找一个函数,把观测值匹配到相关的类和标签上.比如一个人有没有病,又因为噪声的 ...
用R做逻辑回归之汽车贷款违约模型
数据说明本数据是一份汽车贷款违约数据 application_id 申请者ID account_number 账户号 bad_ind 是否违约 vehicle_year ...
逻辑回归（LR）总结复习
摘要: 1.算法概述 2.算法推导 3.算法特性及优缺点 4.注意事项 5.实现和具体例子 6.适用场合内容: 1.算法概述最基本的LR分类器适合于对两分类(类0,类1)目标进行分类:这个模型以样 ...
scikit-learn 逻辑回归类库使用小结
之前在逻辑回归原理小结这篇文章中,对逻辑回归的原理做了小结.这里接着对scikit-learn中逻辑回归类库的我的使用经验做一个总结.重点讲述调参中要注意的事项. 1. 概述在scikit-lear ...
逻辑回归LR
逻辑回归算法相信很多人都很熟悉,也算是我比较熟悉的算法之一了,毕业论文当时的项目就是用的这个算法.这个算法可能不想随机森林.SVM.神经网络.GBDT等分类算法那么复杂那么高深的样子,可是绝对不能小看 ...
逻辑回归（Logistic Regression）
转载请注明出自BYRans博客:http://www.cnblogs.com/BYRans/ 本文主要讲解分类问题中的逻辑回归.逻辑回归是一个二分类问题. 二分类问题二分类问题是指预测的y值只有两个 ...
逻辑回归算法的原理及实现(LR)
Logistic回归虽然名字叫"回归" ,但却是一种分类学习方法.使用场景大概有两个:第一用来预测,第二寻找因变量的影响因素.逻辑回归(Logistic Regression, L ...
感知器、逻辑回归和SVM的求解
这篇文章将介绍感知器.逻辑回归的求解和SVM的部分求解,包含部分的证明.本文章涉及的一些基础知识,已经在<梯度下降.牛顿法和拉格朗日对偶性>中指出,而这里要解决的问题,来自<从感知器 ...
stanford coursera 机器学习编程作业 exercise 3（逻辑回归实现多分类问题）
本作业使用逻辑回归(logistic regression)和神经网络(neural networks)识别手写的阿拉伯数字(0-9) 关于逻辑回归的一个编程练习,可参考:http://www.cnb ...

随机推荐

Java虚拟机内存管理原理基础入门
Jdk:Java程序设计语言.Java虚拟机.Java API类库. Jdk是用于支持Java程序开发的最小环境. Jre:Java API类库中的Java SE API子集.Java虚拟机. Jre ...
《java JDK7 学习笔记》课后练习题3
1.如果有以下的程序代码:int number;System.out.println(number);以下描述何者正确?A.执行时显示0B.执行时显示随机数字C.执行时出现错误D.编译失败 2.如果有 ...
说完Pivot 今天说下Unpivot 的处理方式
上次说到,既然有Pivot 的行转列,那么肯定也有Unpivot 的列转行 .其实unpivot 处理的情况也是差不多,也是分3步走. 首先也是先演示一下unpivot 的用法 ),Mon TIME, ...
linux下mysql开启远程访问权限及防火墙开放3306端口
默认mysql的用户是没有远程访问的权限的,因此当程序跟数据库不在同一台服务器上时,我们需要开启mysql的远程访问权限. 主流的有两种方法,改表法和授权法. 相对而言,改表法比较容易一点,个人也是比 ...
python抓取网页中图片并保存到本地
#-*-coding:utf-8-*- import os import uuid import urllib2 import cookielib '''获取文件后缀名''' def get_file ...
java设计模式之装饰模式
发现设计模式的学习越来越让自己学习的东西太少了,应该多接触一些东西,多出去走一走. 装饰模式概念: 动态地给一个对象添加一些额外的职责,就增加功能来说,装饰模式比生成子类更为灵活(大话设计模式) 在不 ...
第9章用内核对象进行线程同步（3）_信号量(semaphore）、互斥对象(mutex)
9.5 信号量内核对象(Semaphore) (1)信号量的组成 ①计数器:该内核对象被使用的次数 ②最大资源数量:标识信号量可以控制的最大资源数量(带符号的32位) ③当前资源数量:标识当前可用资源 ...
第16章调色板管理器_16.4 一个DIB位图库的实现（1）
16.4.1自定义的 DIBSTRUCT结构体字段含义 PBYTE *ppRow ①指向位图视觉上最上面的一行像素.(不管是自下而上,还是自上而下) ②放在第一个字段,为的是后面定义宏时可方便访问 ...
java 24 - 8 GUI之创建四则运算计算器（未校验版）
这个是用NetBeans软件制作的,因为这个软件制作GUI任务比较方便通过拖拽控件生成的窗体:(红色的名称是更改后的控件名称) 拉拽好布局后,要进行的步骤: A:更改想要进行操作的控件的名称(右键控 ...
javascript的几个小技巧
1.在循环中缓存array.length 这个技巧很简单,这个在处理一个很大的数组循环时,对性能影响将是非常大的.基本上,大家都会写一个这样的同步迭代的数组. for(var i=0;i<arr ...

Theano3.3-练习之逻辑回归

Theano3.3-练习之逻辑回归的更多相关文章

随机推荐

热门专题