A simple implementation of the BP algorithm on MNIST

Data: http://yann.lecun.com/exdb/mnist/

References: blog, blog2, blog3, tensorflow

Derivation: http://www.cnblogs.com/yueshangzuo/p/8025157.html
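In the notation of the code below (input x_i, hidden activation y_j, output z_k, one-hot target t_k, and learning rate \eta, i.e. study_rate), the derivation reduces to the following update rules for sigmoid units under a squared-error loss:

\[ \delta_k = (z_k - t_k)\, z_k (1 - z_k), \qquad w_{jk} \leftarrow w_{jk} - \eta\, y_j \delta_k \]

\[ \delta_j = \Big( \sum_k \delta_k w_{jk} \Big)\, y_j (1 - y_j), \qquad w_{ij} \leftarrow w_{ij} - \eta\, x_i \delta_j \]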

Basic implementation

import struct
import random
import numpy as np
from math import sqrt


class Data:
    def __init__(self):
        print('parameter initializing...')
        self.num_train = 50000
        self.num_confirm = 10000  # held-out validation set
        self.num_test = 10000
        self.node_in = 28 * 28
        self.node_out = 10
        # need to adjust
        # epoch:8  hide_node:39  accuracy:0.9613
        # epoch:8  hide_node:44  accuracy:0.9612
        # epoch:8  hide_node:48  accuracy:0.9624
        # epoch:9  hide_node:48  accuracy:0.9648
        # epoch:10 hide_node:200 accuracy:0.9724
        self.epoch = 15
        self.node_hide = 30
        self.study_rate = 0.05
        self.error_limit = 1e-2

    def read_train_image(self, filename):
        print('reading train-image data...')
        binfile = open(filename, 'rb')
        buffer = binfile.read()
        binfile.close()
        index = 0
        # '>IIII': big-endian, four unsigned ints (magic, count, rows, columns)
        magic, num, rows, columns = struct.unpack_from('>IIII', buffer, index)
        index += struct.calcsize('>IIII')
        for i in range(self.num_train):
            im = struct.unpack_from('784B', buffer, index)  # 28*28=784, 'B': unsigned char
            index += struct.calcsize('784B')
            im = np.array(im)
            im = im.reshape(1, 784) / 255.0  # flatten 28*28 to 1*784, normalize to [0,1]
            self.train_imag_list[i, :] = im
        j = 0
        for i in range(self.num_train, self.num_train + self.num_confirm):
            im = struct.unpack_from('784B', buffer, index)
            index += struct.calcsize('784B')
            im = np.array(im)
            im = im.reshape(1, 784) / 255.0
            self.confirm_imag_list[j, :] = im
            j = j + 1

    def read_train_label(self, filename):
        print('reading train-label data...')
        binfile = open(filename, 'rb')
        buffer = binfile.read()
        binfile.close()
        index = 0
        magic, num = struct.unpack_from('>II', buffer, index)
        index += struct.calcsize('>II')
        for i in range(self.num_train):
            lb = struct.unpack_from('B', buffer, index)
            index += struct.calcsize('B')
            lb = int(lb[0])
            self.train_label_list[i, :] = lb
        j = 0
        for i in range(self.num_train, self.num_train + self.num_confirm):
            lb = struct.unpack_from('B', buffer, index)
            index += struct.calcsize('B')
            lb = int(lb[0])
            self.confirm_label_list[j, :] = lb
            j = j + 1

    def read_test_image(self, filename):
        print('reading test-image data...')
        binfile = open(filename, 'rb')
        buffer = binfile.read()
        binfile.close()
        index = 0
        magic, num, rows, columns = struct.unpack_from('>IIII', buffer, index)
        index += struct.calcsize('>IIII')
        for i in range(self.num_test):
            im = struct.unpack_from('784B', buffer, index)
            index += struct.calcsize('784B')
            im = np.array(im)
            im = im.reshape(1, 784) / 255.0
            self.test_imag_list[i, :] = im

    def read_test_label(self, filename):
        print('reading test-label data...')
        binfile = open(filename, 'rb')
        buffer = binfile.read()
        binfile.close()
        index = 0
        magic, num = struct.unpack_from('>II', buffer, index)
        index += struct.calcsize('>II')
        for i in range(self.num_test):
            lb = struct.unpack_from('B', buffer, index)
            index += struct.calcsize('B')
            lb = int(lb[0])
            self.test_label_list[i, :] = lb

    def init_network(self):
        print('network initializing...')
        self.train_imag_list = np.zeros((self.num_train, self.node_in))
        self.train_label_list = np.zeros((self.num_train, 1))
        self.confirm_imag_list = np.zeros((self.num_confirm, self.node_in))
        self.confirm_label_list = np.zeros((self.num_confirm, 1))
        self.test_imag_list = np.zeros((self.num_test, self.node_in))
        self.test_label_list = np.zeros((self.num_test, 1))

        self.read_train_image('train-images.idx3-ubyte')
        self.read_train_label('train-labels.idx1-ubyte')
        self.read_test_image('t10k-images.idx3-ubyte')
        self.read_test_label('t10k-labels.idx1-ubyte')

        # weights drawn uniformly from [-1/sqrt(fan_in), 1/sqrt(fan_in)]
        self.wjk = (np.random.rand(self.node_hide, self.node_out) - 0.5) * 2 / sqrt(self.node_hide)
        self.wj0 = (np.random.rand(self.node_out) - 0.5) * 2 / sqrt(self.node_hide)
        self.wij = (np.random.rand(self.node_in, self.node_hide) - 0.5) * 2 / sqrt(self.node_in)
        self.wi0 = (np.random.rand(self.node_hide) - 0.5) * 2 / sqrt(self.node_in)

    def sigmoid(self, x):
        return 1.0 / (1.0 + np.exp(-x))

    def calc_yjzk(self, sample_i, imag_list):
        # forward pass: input -> hidden activations yj -> output activations zk
        self.netj = np.dot(imag_list[sample_i], self.wij) + self.wi0
        self.yj = self.sigmoid(self.netj)
        self.netk = np.dot(self.yj, self.wjk) + self.wj0
        self.zk = self.sigmoid(self.netk)

    def calc_error(self):
        # squared error over the validation set
        ans = 0.0
        for sample_i in range(self.num_confirm):
            self.calc_yjzk(sample_i, self.confirm_imag_list)
            label_tmp = np.zeros(self.node_out)
            label_tmp[int(self.confirm_label_list[sample_i])] = 1
            ans = ans + sum(np.square(label_tmp - self.zk) / 2.0)
        return ans

    def training(self):
        print('training model...')
        for epoch_i in range(self.epoch):
            for circle in range(self.num_train):
                sample_i = np.random.randint(0, self.num_train)
                # print('debug epoch:%d sample:%d' % (epoch_i, sample_i))
                # error_before = self.calc_error()
                self.calc_yjzk(sample_i, self.train_imag_list)
                # update weights hidden -> output
                tmp_label = np.zeros(self.node_out)
                tmp_label[int(self.train_label_list[sample_i])] = 1
                delta_k = (self.zk - tmp_label) * self.zk * (1 - self.zk)
                self.yj.shape = (self.node_hide, 1)
                delta_k.shape = (1, self.node_out)
                self.wjk = self.wjk - self.study_rate * np.dot(self.yj, delta_k)
                # update weights input -> hidden
                self.yj = self.yj.T
                delta_j = np.dot(delta_k, self.wjk.T) * self.yj * (1 - self.yj)
                tmp_imag = self.train_imag_list[sample_i]
                tmp_imag.shape = (self.node_in, 1)
                self.wij = self.wij - self.study_rate * np.dot(tmp_imag, delta_j)
                # note: the bias terms wj0 and wi0 are initialized but never updated here
                # optional early stopping on the validation error:
                # error_delta = error_before - self.calc_error()
                # if np.abs(error_delta) < self.error_limit:
                #     print('debug break', error_delta)
                #     break
            # print('error %d %.2f' % (epoch_i, self.calc_error()))

    def testing(self):
        print('testing...')
        num_right = 0.0
        for sample_i in range(self.num_test):
            self.calc_yjzk(sample_i, self.test_imag_list)
            ans = self.zk.argmax()
            if ans == int(self.test_label_list[sample_i]):
                num_right = num_right + 1
        self.accuracy = num_right / self.num_test
        print('accuracy: %.4f%%' % (self.accuracy * 100))


def main():
    data = Data()
    data.init_network()
    data.training()
    data.testing()


if __name__ == '__main__':
    main()

Notes

  1. Mind the encoding format of the data; it is described at the bottom of the dataset page, which also lists the error rates of classic machine-learning models on MNIST for reference (a header-check sketch follows this list).
  2. Initialize the weights sensibly; reasonable values for the number of epochs, the learning rate, and the number of hidden nodes can be taken from the empirical results below.
  3. Normalize the data (this keeps the sigmoid function from overflowing).
  4. Make sure row/column dimensions match in every matrix multiplication.
  5. Pick a sensible number of epochs (with a small learning rate you can afford more epochs; with a large learning rate use fewer).
  6. Once a suitable number of epochs is confirmed, the validation set can be dropped and the model trained on the full sample.
  7. More hidden nodes generally help, though only up to a point (see the hide_node:1000 result below).
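As a quick way to apply note 1, here is a minimal sketch (assuming the same file names as the code above) that validates the IDX headers before training; the dataset page documents the big-endian magic numbers 2051 for image files and 2049 for label files:

import struct

def check_idx_header(filename, expected_magic):
    # the first two big-endian unsigned ints are the magic number and the item count
    with open(filename, 'rb') as f:
        magic, num = struct.unpack('>II', f.read(8))
    assert magic == expected_magic, 'unexpected magic %d in %s' % (magic, filename)
    return num

print(check_idx_header('train-images.idx3-ubyte', 2051))  # should print 60000
print(check_idx_header('train-labels.idx1-ubyte', 2049))  # should print 60000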

Parameter-tuning script

# assumes the basic implementation above is saved as ann.py
import ann

f = open('best_parameter', 'a+')
for e in range(10, 40):
    for node in range(10, 50):
        data = ann.Data()
        data.node_hide = node
        data.epoch = e
        data.init_network()
        data.training()
        data.testing()
        ans = 'circling to get best parameter----->epoch:%d hide_node:%d accuracy:%.4f\n' % (e, node, data.accuracy)
        print(ans)
        f.write(ans)
f.close()

Looping over the number of epochs and hidden nodes measures their effect on accuracy. The rough pattern: with a learning rate of 0.05, 10 to 15 epochs and 30 or more hidden nodes work well.

Some experimental results:

circling to get best parameter----->epoch:14 hide_node:43 accuracy:0.9656
circling to get best parameter----->epoch:14 hide_node:44 accuracy:0.9651
circling to get best parameter----->epoch:14 hide_node:45 accuracy:0.9638
circling to get best parameter----->epoch:14 hide_node:46 accuracy:0.9641
circling to get best parameter----->epoch:14 hide_node:47 accuracy:0.9649
circling to get best parameter----->epoch:14 hide_node:48 accuracy:0.9651
circling to get best parameter----->epoch:14 hide_node:49 accuracy:0.9671
circling to get best parameter----->epoch:15 hide_node:46 accuracy:0.9661
circling to get best parameter----->epoch:15 hide_node:47 accuracy:0.9660
circling to get best parameter----->epoch:15 hide_node:48 accuracy:0.9650
circling to get best parameter----->epoch:15 hide_node:49 accuracy:0.9655
circling to get best parameter----->epoch:10 hide_node:100 accuracy:0.9685
circling to get best parameter----->epoch:10 hide_node:200 accuracy:0.9724
circling to get best parameter----->epoch:10 hide_node:300 accuracy:0.9718
circling to get best parameter----->epoch:10 hide_node:1000 accuracy:0.9568

TensorFlow implementation

import argparse

import tensorflow as tf
# Import data
from tensorflow.examples.tutorials.mnist import input_data

FLAGS = None


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')


def add_layer(inputs, in_size, out_size, activation_function=None):
    # add a fully connected layer
    Weights = weight_variable([in_size, out_size])
    biases = bias_variable([out_size])
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs


def main(_):
    mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)

    # reshape the input to have batch size, width, height, channel size
    x = tf.placeholder(tf.float32, [None, 784])
    x_image = tf.reshape(x, [-1, 28, 28, 1])

    # 5*5 patch size, input channel is 1, output channel is 32
    W_conv1 = weight_variable([5, 5, 1, 32])
    # bias, same size as the output channel
    b_conv1 = bias_variable([32])
    # the first convolutional layer with a max pooling layer
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)  # after pooling: shape [-1, 14, 14, 32]

    # the weights and bias for the second layer; we will get 64 channels
    W_conv2 = weight_variable([5, 5, 32, 64])
    b_conv2 = bias_variable([64])
    # the second convolutional layer with a max pooling layer
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)  # after pooling: shape [-1, 7, 7, 64]

    # add a fully connected layer with 1024 neurons and relu activation
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = add_layer(h_pool2_flat, 7 * 7 * 64, 1024, tf.nn.relu)

    # dropout on the fully connected layer to avoid overfitting
    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    # finally, the output layer
    y_conv = add_layer(h_fc1_drop, 1024, 10, None)

    # loss function and training step
    y_ = tf.placeholder(tf.float32, [None, 10])
    cross_entropy = tf.reduce_sum(
        tf.nn.softmax_cross_entropy_with_logits(logits=y_conv, labels=y_))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # start training; test the model every 100 steps
    sess = tf.InteractiveSession()
    sess.run(tf.global_variables_initializer())  # tf.initialize_all_variables is deprecated
    for i in range(10000):
        batch = mnist.train.next_batch(100)
        if i % 100 == 0:
            train_accuracy = accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0})
            test_accuracy = accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0})
            print("step %d, training accuracy %g, test accuracy %g" % (i, train_accuracy, test_accuracy))
        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # modify the dir path to your own dataset
    parser.add_argument('--data_dir', type=str, default='/tmp/mnist',
                        help='Directory for storing data')
    FLAGS = parser.parse_args()
    tf.app.run()

Running this requires Python 3.x and a working TensorFlow setup. Note that it uses the TensorFlow 1.x API: tf.placeholder, sessions, and tensorflow.examples.tutorials are not available in TensorFlow 2.
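If only TensorFlow 2.x is installed, the usual compatibility shim below runs 1.x-style graph code; the tensorflow.examples.tutorials.mnist module was removed from the TensorFlow 2 packages, though, so the data-loading import would still need to be replaced (for example with tf.keras.datasets.mnist):

# drop-in replacement for the 'import tensorflow as tf' line above (TF 2.x only)
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()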

Results

step 0, training accuracy 0.06, test accuracy 0.0892
step 100, training accuracy 0.86, test accuracy 0.8692
step 200, training accuracy 0.97, test accuracy 0.9207
step 300, training accuracy 0.92, test accuracy 0.9403
step 400, training accuracy 0.95, test accuracy 0.9485
step 500, training accuracy 0.91, test accuracy 0.9522
step 600, training accuracy 0.97, test accuracy 0.9565
step 700, training accuracy 0.97, test accuracy 0.9622
step 800, training accuracy 0.96, test accuracy 0.9638
step 900, training accuracy 0.98, test accuracy 0.9687
step 1000, training accuracy 0.97, test accuracy 0.9703

Feel free to get in touch about any environment-configuration problems; corrections are welcome.
