Training a model on MNIST with TensorFlow and recognizing your own handwritten digits

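I have been working through Professor Hung-yi Lee's machine learning video course lately. When I reached the part on neural networks, pure theory felt too dry, so I decided to build a few simple demos myself; after all, practice is the sole criterion for testing truth! Most TensorFlow or neural network demo tutorials online, however, only re-run the official example programs, and there is little material on turning those programs into something you can actually use. Take the MNIST dataset, the well-worn "hello world" of machine learning: there are plenty of handwritten digit recognition tutorials, but they all use the official dataset, so even when validation succeeds it is not very satisfying. Enough preamble: below I show how to use TensorFlow and MNIST to recognize digits you wrote yourself (for example the digit 5 that I wrote, shown as a figure in the original post).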

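This post also draws on a few blog posts by more experienced authors. I hope it helps others who, like me, are just getting started. Comments and corrections are welcome.

The code, the official dataset, and my own dataset are at: https://github.com/tgpcai/digit_recognition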

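(1) A short introduction to the MNIST dataset

(2) Training a model on the MNIST dataset

(3) Writing your own digit and preprocessing it with MATLAB

(4) Feeding the image to the network for recognition

(5) Pitfalls I ran into, and a summary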

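(1) A short introduction to the MNIST dataset

Since we want to build our own data, we first need to understand the format, size, and other properties of the official dataset. MNIST is a large dataset of handwritten digits that is widely used in machine learning for recognition tasks. It has 60000 training images and 10000 test images. Each sample is a 28*28-pixel grayscale image of a handwritten digit (each pixel is a float between 0 and 1, and the darker the ink, the closer the value is to 1). If you search for MNIST online, you will find images that look like this:

The figure above shows four MNIST images. These are not png or jpg files in the usual sense. A png or jpg photo carries a lot of information that interferes with recognition (such as color channels, an arbitrary size, and background noise), so we must preprocess our own images when building our own dataset.

Key points: 28*28 pixels, grayscale.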
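To make the "28*28, grayscale, values in 0-1" format concrete, here is a minimal pure-Python sketch of how a 28*28 grid of 0-255 grey levels maps onto the 784-value vector that the network below takes as input. The names `flatten_image` and `pixel_index` are my own illustrations, and the row-by-row ordering is an assumption that matches the `[-1, 28, 28, 1]` reshape used later in this post.

```python
SIDE = 28  # MNIST images are 28 x 28 pixels


def flatten_image(rows):
    """Flatten a 28x28 grid of 0-255 grey levels into 784 floats in [0, 1]."""
    if len(rows) != SIDE or any(len(row) != SIDE for row in rows):
        raise ValueError("expected a 28x28 grid")
    # Row-major order: the first 28 values are the top row, and so on.
    return [value / 255.0 for row in rows for value in row]


def pixel_index(row, col):
    """Position of pixel (row, col) inside the flattened 784-vector."""
    return row * SIDE + col


# A blank image with a single fully inked pixel at row 3, column 5.
grid = [[0] * SIDE for _ in range(SIDE)]
grid[3][5] = 255
vector = flatten_image(grid)
print(len(vector), vector[pixel_index(3, 5)])
```

Any image you write yourself has to end up in exactly this shape and value range before the trained network can do anything sensible with it.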

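(2) Training a model on the MNIST dataset and saving it

The model used in this demo is a CNN (convolutional neural network), a kind of model widely used in image recognition, natural language processing, and other areas. I cover CNNs in detail in my other posts; feel free to discuss them with me. Here is the code: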

 # Written for the TensorFlow 1.x API (tf.placeholder, tf.Session).
 import tensorflow as tf
 from tensorflow.examples.tutorials.mnist import input_data


 # Helper that initializes weights
 def weight_variables(shape):
     w = tf.Variable(tf.truncated_normal(shape, stddev=0.1))
     return w


 # Helper that initializes biases
 def bias_variables(shape):
     b = tf.Variable(tf.constant(0.1, shape=shape))
     return b


 def model():
     # 1. Placeholders for the data: x [None, 784], y_true [None, 10]
     with tf.variable_scope("data"):
         x = tf.placeholder(tf.float32, [None, 784])
         y_true = tf.placeholder(tf.float32, [None, 10])

     # 2. Conv layer 1: 5*5*1 kernels, 32 filters, stride 1, then ReLU and pooling
     with tf.variable_scope("conv1"):
         # Randomly initialize the weights and biases
         w_conv1 = weight_variables([5, 5, 1, 32])
         b_conv1 = bias_variables([32])
         # Reshape x: [None, 784] -> [None, 28, 28, 1]
         # (None is not allowed in tf.reshape; use -1 for the unknown dimension)
         x_reshape = tf.reshape(x, [-1, 28, 28, 1])
         # [None, 28, 28, 1] -> [None, 28, 28, 32]
         x_relu1 = tf.nn.relu(tf.nn.conv2d(x_reshape, w_conv1, strides=[1, 1, 1, 1], padding="SAME") + b_conv1)
         # 2*2 max pooling, stride 2: [None, 28, 28, 32] -> [None, 14, 14, 32]
         x_pool1 = tf.nn.max_pool(x_relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

     # 3. Conv layer 2: 5*5*32 kernels, 64 filters, stride 1, then ReLU and pooling
     with tf.variable_scope("conv2"):
         # Randomly initialize the weights and biases
         w_conv2 = weight_variables([5, 5, 32, 64])
         b_conv2 = bias_variables([64])
         # Convolution and activation: [None, 14, 14, 32] -> [None, 14, 14, 64]
         x_relu2 = tf.nn.relu(tf.nn.conv2d(x_pool1, w_conv2, strides=[1, 1, 1, 1], padding="SAME") + b_conv2)
         # 2*2 max pooling, stride 2: [None, 14, 14, 64] -> [None, 7, 7, 64]
         x_pool2 = tf.nn.max_pool(x_relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

     # 4. Fully connected layers:
     #    [None, 7, 7, 64] -> [None, 7*7*64] -> [None, 1024] -> [None, 10]
     with tf.variable_scope("fc"):
         # Randomly initialize the weights and biases
         w_fc = weight_variables([7 * 7 * 64, 1024])
         b_fc = bias_variables([1024])
         # Flatten: [None, 7, 7, 64] -> [None, 7*7*64]
         x_fc_reshape = tf.reshape(x_pool2, [-1, 7 * 7 * 64])
         h_fc1 = tf.nn.relu(tf.matmul(x_fc_reshape, w_fc) + b_fc)
         # Dropout before the output layer to reduce overfitting
         keep_prob = tf.placeholder("float")
         h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
         w_fc1 = weight_variables([1024, 10])
         b_fc1 = bias_variables([10])
         # Matrix product gives 10 class probabilities per sample: [None, 10]
         y_predict = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc1) + b_fc1)

     return x, y_true, y_predict, keep_prob


 def conv_fc():
     # Load the data. MNIST_data is the folder where I keep the official dataset;
     # with this relative path it must sit in the same directory as this Python file.
     mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

     # Build the model and get its output
     x, y_true, y_predict, keep_prob = model()

     # 3. Cross-entropy loss
     with tf.variable_scope("soft_cross"):
         # Summed over the whole batch (tf.reduce_sum, not a per-sample mean)
         loss = -tf.reduce_sum(y_true * tf.log(y_predict))

     # 4. Minimize the loss. In deep learning, or with more complex networks,
     #    the learning rate usually cannot be too high.
     with tf.variable_scope("optimizer"):
         train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

     # 5. Accuracy
     with tf.variable_scope("acc"):
         # One boolean per sample: True if the prediction is correct
         equal_list = tf.equal(tf.argmax(y_true, 1), tf.argmax(y_predict, 1))
         # Cast to 1.0 / 0.0 and average to get the accuracy
         accuracy = tf.reduce_mean(tf.cast(equal_list, tf.float32))

     init_op = tf.global_variables_initializer()
     saver = tf.train.Saver()

     # Run in a session
     with tf.Session() as sess:
         sess.run(init_op)
         for i in range(3000):
             mnist_x, mnist_y = mnist.train.next_batch(50)
             if i % 100 == 0:
                 # Evaluate accuracy on the current batch; no dropout at this stage
                 train_accuracy = accuracy.eval(feed_dict={x: mnist_x, y_true: mnist_y, keep_prob: 1.0})
                 print("step %d, training accuracy %g" % (i, train_accuracy))
             # Train with 50% dropout
             train_op.run(feed_dict={x: mnist_x, y_true: mnist_y, keep_prob: 0.5})
         # Save the model wherever you like
         saver.save(sess, "D:/Dict/model/fc_model.ckpt")
     return None


 if __name__ == "__main__":
     conv_fc()
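The shape comments in the listing above are easy to lose track of, so here is a small pure-Python sketch that recomputes them. It relies on the rule that a TensorFlow layer with padding="SAME" produces an output of size ceil(input / stride); the function names `same_padding_output` and `trace_shapes` are hypothetical helpers of mine, not part of TensorFlow.

```python
import math


def same_padding_output(size, stride):
    """Spatial size after a conv or pool layer that uses padding="SAME"."""
    return math.ceil(size / stride)


def trace_shapes(side=28):
    """Follow (layer, height/width, channels) through the network above."""
    layers = [  # (name, stride, output channels), copied from the listing
        ("conv1", 1, 32),
        ("pool1", 2, 32),
        ("conv2", 1, 64),
        ("pool2", 2, 64),
    ]
    trace = [("input", side, 1)]
    for name, stride, channels in layers:
        side = same_padding_output(side, stride)
        trace.append((name, side, channels))
    return trace


trace = trace_shapes()
for name, side, channels in trace:
    print("%-6s %2d x %2d x %2d" % (name, side, side, channels))

# The first fully connected layer needs this many input weights per unit.
_, last_side, last_channels = trace[-1]
flat_size = last_side * last_side * last_channels
print("flattened:", flat_size)
```

The flattened size is 7 * 7 * 64 = 3136, which is why the first fully connected weight matrix in the listing has shape [7 * 7 * 64, 1024].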

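After training, four files appear in the directory where you saved the model: a `checkpoint` file plus files ending in .data, .index, and .meta.

The .data file stores the parameter values such as weights and biases, the .index file records where each variable sits inside the .data file, and the .meta file stores the structure of the TensorFlow graph. Here is the result on my computer (a screenshot in the original post):

I only ran 6000 iterations for that result (the listing above is set to 3000). According to the official TensorFlow documentation, running about 9000 iterations reaches an accuracy of around 0.992.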
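One detail of the training listing is worth a closer look: the loss is computed as -sum(y_true * log(y_predict)) directly on the softmax output, and log(0) is minus infinity if a predicted probability ever underflows to 0. The sketch below shows the same formula in pure Python with two common safeguards, subtracting the maximum logit inside softmax and clipping probabilities before the log. The `eps` value is an assumption of mine, not something taken from the original code. In TensorFlow 1.x, `tf.nn.softmax_cross_entropy_with_logits` fuses these steps for the same reason.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_predict, eps=1e-10):
    """-sum(y_true * log(y_predict)), with probabilities clipped away from 0."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_predict))


# One sample with three classes; the true class is class 0 (one-hot label).
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy([1.0, 0.0, 0.0], probs)
print(probs, loss)

# Extreme logits neither overflow in softmax nor produce an infinite loss.
extreme = softmax([1000.0, 0.0])
print(extreme, cross_entropy([0.0, 1.0], extreme))
```

With a one-hot label only the term for the true class survives, so the loss is simply the negative log of the probability assigned to the correct digit.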

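(3) Writing your own digit and preprocessing it with MATLAB

First, let's look at the result of preprocessing (the original post shows three images here, joined by arrows).

The process has three steps: shrink the image to 28*28 pixels, convert it to grayscale, and finally binarize it. The MATLAB code is as follows: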

clear all; close all; clc;
% Resize the image to 28*28 pixels
I = imread('5.jpg');          % the photo of your own handwritten digit
J = imresize(I, [28, 28]);
imshow(I);
figure;
imshow(J);
imwrite(J, 'new5.jpg');       % write the 28*28 handwritten digit image

Next comes the grayscale conversion and binarization:

clear all; close all; clc;
% Read an input image
A = imread('new5.jpg');
% Convert the image to a single-channel grayscale image
% (rgb2gray expects an RGB input, so skip it if the file is already grayscale)
if size(A, 3) == 3
    A_gray = rgb2gray(A);
else
    A_gray = A;
end
figure, imhist(A_gray), title('hist of A_gray');
% Convert image to double, i.e. values in [0,1]
A_gray = im2double(A_gray);
% Generate threshold value using Otsu's algorithm
otsu_level = graythresh(A_gray);
% Threshold image using Otsu's threshold and manually defined
% threshold values
B_otsu_thresh = im2bw(A_gray, otsu_level);
B_thresh_50 = im2bw(A_gray, 50/255);
B_thresh_100 = im2bw(A_gray, 100/255);
B_thresh_150 = im2bw(A_gray, 150/255);
B_thresh_200 = im2bw(A_gray, 200/255);
% Display original and thresholded binary images side-by-side
figure, subplot(2, 3, 1), imshow(A_gray), title('Original image');
subplot(2, 3, 2), imshow(B_otsu_thresh), title('Binary image using Otsu threshold value');
subplot(2, 3, 3), imshow(B_thresh_50), title('Binary image using threshold value = 50');
subplot(2, 3, 4), imshow(B_thresh_100), title('Binary image using threshold value = 100');
subplot(2, 3, 5), imshow(B_thresh_150), title('Binary image using threshold value = 150');
subplot(2, 3, 6), imshow(B_thresh_200), title('Binary image using threshold value = 200');
imwrite(B_otsu_thresh, 'newnew5.jpg');  % name and path of the final image for your dataset

That completes the preprocessing of your own handwritten image!

There are many ways to preprocess; here is another one, using OpenCV:
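The MATLAB script leaves the choice of threshold to `graythresh`, which applies Otsu's method and returns a level normalized to [0, 1]. For readers who want to see what that step does, here is a minimal pure-Python sketch of Otsu's method on 0-255 grey levels. It is my own illustration of the algorithm, not MATLAB's implementation, and `otsu_threshold` and `binarize` are hypothetical names.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0  # pixels with value <= level form the "background"
    for level in range(256):
        weight_bg += hist[level]
        if weight_bg == 0:
            continue  # no pixels at or below this level yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarize(pixels, level):
    """1 for pixels brighter than the threshold, 0 otherwise."""
    return [1 if p > level else 0 for p in pixels]


# Toy image: half dark ink (grey level 10), half bright paper (grey level 200).
pixels = [10] * 50 + [200] * 50
level = otsu_threshold(pixels)
binary = binarize(pixels, level)
print(level, sum(binary))
```

The fixed thresholds 50, 100, 150, and 200 in the MATLAB script are there for comparison; Otsu's method picks the level from the histogram itself, which helps when lighting differs between photos.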

import cv2

# Shared between main() and the mouse callback
img = None
point1 = None
point2 = None


def on_mouse(event, x, y, flags, param):
    global img, point1, point2
    img2 = img.copy()
    if event == cv2.EVENT_LBUTTONDOWN:  # left button pressed
        point1 = (x, y)
        cv2.circle(img2, point1, 10, (0, 255, 0), 5)
        cv2.imshow('image', img2)
    elif event == cv2.EVENT_MOUSEMOVE and (flags & cv2.EVENT_FLAG_LBUTTON):  # dragging with the left button held
        cv2.rectangle(img2, point1, (x, y), (255, 0, 0), 5)  # image, corner, opposite corner, color, thickness
        cv2.imshow('image', img2)
    elif event == cv2.EVENT_LBUTTONUP:  # left button released
        point2 = (x, y)
        cv2.rectangle(img2, point1, point2, (0, 0, 255), 5)
        cv2.imshow('image', img2)
        min_x = min(point1[0], point2[0])
        min_y = min(point1[1], point2[1])
        width = abs(point1[0] - point2[0])
        height = abs(point1[1] - point2[1])
        if width == 0 or height == 0:
            return  # a click without a drag selects nothing to crop
        cut_img = img[min_y:min_y + height, min_x:min_x + width]
        resize_img = cv2.resize(cut_img, (28, 28))  # resize the crop to 28*28
        ret, thresh_img = cv2.threshold(resize_img, 127, 255, cv2.THRESH_BINARY)  # binarize
        cv2.imshow('result', thresh_img)
        cv2.imwrite('new5.jpg', thresh_img)  # where the preprocessed image is saved


def main():
    global img
    img = cv2.imread('5.jpg')  # path of the handwritten digit image
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # convert to single-channel grayscale
    cv2.namedWindow('image')
    cv2.setMouseCallback('image', on_mouse)  # register the mouse callback
    cv2.imshow('image', img)
    cv2.waitKey(0)


if __name__ == '__main__':
    main()

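Either of the two methods above works. Some people even generate their own dataset with Photoshop; if you are interested, search for it yourself.

(4) Feeding the image to the network for recognition

Once the image is preprocessed, it can be fed to the network for recognition: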
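The cropping arithmetic in the OpenCV mouse callback is the part most likely to go wrong: the two corners can arrive in any order, a drag can end outside the image, and a plain click yields an empty selection that cannot be resized. Here is a minimal sketch of that logic in isolation, in pure Python so it can be checked without a window. `crop_box` is a hypothetical helper of mine; the clamping to the image bounds is an addition that the script above does not perform.

```python
def crop_box(point1, point2, image_width, image_height):
    """Return (x, y, width, height) of the rectangle spanned by two corners.

    The corners may be given in any order. The box is clamped to the image,
    and None is returned when the selection has zero area.
    """
    x1, y1 = point1
    x2, y2 = point2
    left = max(0, min(x1, x2))
    top = max(0, min(y1, y2))
    right = min(image_width, max(x1, x2))
    bottom = min(image_height, max(y1, y2))
    width, height = right - left, bottom - top
    if width <= 0 or height <= 0:
        return None
    return left, top, width, height


# Drag from lower-left to upper-right: corners arrive "out of order".
print(crop_box((120, 80), (40, 200), 300, 300))
# A click without a drag selects nothing.
print(crop_box((5, 5), (5, 50), 100, 100))
# A drag that leaves the image is clamped to its borders.
print(crop_box((-10, 10), (50, 400), 100, 100))
```

With NumPy-style slicing the resulting box would be used as `img[y:y + height, x:x + width]`, the same indexing order as in the script above.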

 # Written for the TensorFlow 1.x API. The network must match the training script.
 from PIL import Image
 import tensorflow as tf
 import matplotlib.pyplot as plt


 def imageprepare():
     # Path of the preprocessed 28*28 image; force single-channel grayscale
     im = Image.open('C:/Users/tgp/Desktop/newnew5.jpg').convert('L')
     plt.imshow(im, cmap='gray')  # in a plain script, call plt.show() to display it
     data = list(im.getdata())
     # Invert and scale to [0, 1]: dark ink on white paper becomes values near 1, as in MNIST
     result = [(255 - x) * 1.0 / 255.0 for x in data]
     return result


 # Helper that initializes weights
 def weight_variables(shape):
     w = tf.Variable(tf.truncated_normal(shape, stddev=0.1))
     return w


 # Helper that initializes biases (the values are overwritten by saver.restore)
 def bias_variables(shape):
     b = tf.Variable(tf.constant(0.0, shape=shape))
     return b


 def model():
     tf.reset_default_graph()

     # 1. Placeholder for the data: x [None, 784] (no labels are needed for inference)
     with tf.variable_scope("data"):
         x = tf.placeholder(tf.float32, [None, 784])

     # 2. Conv layer 1: 5*5*1 kernels, 32 filters, stride 1, then ReLU and pooling
     with tf.variable_scope("conv1"):
         w_conv1 = weight_variables([5, 5, 1, 32])
         b_conv1 = bias_variables([32])
         # Reshape x: [None, 784] -> [None, 28, 28, 1]
         # (None is not allowed in tf.reshape; use -1 for the unknown dimension)
         x_reshape = tf.reshape(x, [-1, 28, 28, 1])
         # [None, 28, 28, 1] -> [None, 28, 28, 32]
         x_relu1 = tf.nn.relu(tf.nn.conv2d(x_reshape, w_conv1, strides=[1, 1, 1, 1], padding="SAME") + b_conv1)
         # 2*2 max pooling, stride 2: [None, 28, 28, 32] -> [None, 14, 14, 32]
         x_pool1 = tf.nn.max_pool(x_relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

     # 3. Conv layer 2: 5*5*32 kernels, 64 filters, stride 1, then ReLU and pooling
     with tf.variable_scope("conv2"):
         w_conv2 = weight_variables([5, 5, 32, 64])
         b_conv2 = bias_variables([64])
         # Convolution and activation: [None, 14, 14, 32] -> [None, 14, 14, 64]
         x_relu2 = tf.nn.relu(tf.nn.conv2d(x_pool1, w_conv2, strides=[1, 1, 1, 1], padding="SAME") + b_conv2)
         # 2*2 max pooling, stride 2: [None, 14, 14, 64] -> [None, 7, 7, 64]
         x_pool2 = tf.nn.max_pool(x_relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

     # 4. Fully connected layers:
     #    [None, 7, 7, 64] -> [None, 7*7*64] -> [None, 1024] -> [None, 10]
     with tf.variable_scope("fc"):
         w_fc = weight_variables([7 * 7 * 64, 1024])
         b_fc = bias_variables([1024])
         # Flatten: [None, 7, 7, 64] -> [None, 7*7*64]
         x_fc_reshape = tf.reshape(x_pool2, [-1, 7 * 7 * 64])
         h_fc1 = tf.nn.relu(tf.matmul(x_fc_reshape, w_fc) + b_fc)
         keep_prob = tf.placeholder("float")
         h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
         w_fc1 = weight_variables([1024, 10])
         b_fc1 = bias_variables([10])
         # Matrix product gives 10 class probabilities per sample: [None, 10]
         y_predict = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc1) + b_fc1)

     return x, y_predict, keep_prob


 def conv_fc():
     # Load the preprocessed image
     result = imageprepare()
     # Build the model and get its output
     x, y_predict, keep_prob = model()
     init_op = tf.global_variables_initializer()
     saver = tf.train.Saver()
     # Run in a session
     with tf.Session() as sess:
         sess.run(init_op)
         print(result)
         # Restore the trained parameters; this overwrites the initial values
         saver.restore(sess, "D:/Dict/model/fc_model.ckpt")
         prediction = tf.argmax(y_predict, 1)
         # Dropout is disabled for inference (keep_prob = 1.0)
         predint = prediction.eval(feed_dict={x: [result], keep_prob: 1.0}, session=sess)
         print(predint)
         print("recognize result: %d" % predint[0])
     return None


 if __name__ == "__main__":
     conv_fc()

The script prints the recognized digit (the original post shows a screenshot of the output here).
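If the recognized digit is wrong, the input vector is the first thing I would check, because the network only ever saw MNIST-style inputs: 784 values in [0, 1] with a dark background and bright strokes. The sketch below is a small pure-Python sanity check along those lines. `invert` mirrors the `(255 - x) / 255` step in `imageprepare`, while `check_mnist_input` and its "mean above 0.5" brightness rule are my own rough heuristic, not something from the original code.

```python
def invert(pixels):
    """Map 0-255 grey levels (dark ink on white paper) to MNIST-style floats."""
    return [(255 - p) / 255.0 for p in pixels]


def check_mnist_input(vector):
    """Return a list of problems found in a flattened input image."""
    problems = []
    if len(vector) != 784:
        problems.append("expected 784 values, got %d" % len(vector))
    if any(v < 0.0 or v > 1.0 for v in vector):
        problems.append("values must lie in [0, 1]")
    # Heuristic (assumption): in MNIST most pixels are background, close to 0.
    if vector and sum(vector) / len(vector) > 0.5:
        problems.append("background looks bright; invert the image")
    return problems


# A white sheet (255) with 100 dark ink pixels (0), as a scanner would see it.
scan = [255] * 684 + [0] * 100
print(check_mnist_input(invert(scan)))               # inverted as in imageprepare
print(check_mnist_input([p / 255.0 for p in scan]))  # scaled but not inverted
```

A wrong image size shows up here as a length other than 784, which is easier to read than the shape error TensorFlow would raise when the vector is fed to the placeholder.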

(5) Pitfalls I ran into, and a summary

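When I first wrote the training code, I thought guarding against overfitting was unnecessary, so my model had nothing of the kind. The direct result: the model looked very good during training, but when I gave it my own handwritten digits it often confused '4' and '9'. I then added dropout between the fully connected layer and the output layer, and the model trained this way performed much better. So measures against overfitting really are necessary when training a model.
On choosing the function that randomly initializes weights and biases: a model whose weights were initialized with tf.truncated_normal() performed better than one initialized with tf.random_normal(). I looked this up and found the following difference. As the name says, the output of tf.truncated_normal is truncated, and the cutoff is 2 times stddev from the mean. With mean 0 and stddev 1, tf.truncated_normal can never produce a value outside [-2, 2] (with stddev=0.1 as in this post, outside [-0.2, 0.2]), whereas tf.random_normal will produce values such as 2.2 or 2.4 if the shape is large enough. In other words, tf.random_normal can produce some initial weights of larger magnitude than tf.truncated_normal does. The explanation I found is that this is very harmful for a neural network, because it makes vanishing gradients much more likely.
When randomly initializing weights and biases, the standard deviation must not be set too large. If it is, the accuracy stays very low throughout training, and vanishing gradients become likely.
Save the model with a name ending in .ckpt if you can. At first I did not use the .ckpt suffix and ran into a lot of trouble; once I added it, every problem disappeared. (This may be superstition, and it might work without the suffix, but adding it certainly does no harm.)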
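The difference between the two initializers in the second point above can be sketched in a few lines of pure Python. `truncated_normal` below redraws any sample that lands more than 2 standard deviations from the mean, which is the behavior TensorFlow documents for tf.truncated_normal. The function itself, the sample count, and the seed are my own illustration, not TensorFlow code.

```python
import random


def truncated_normal(mean=0.0, stddev=1.0, rng=random):
    """Draw from a normal distribution, redrawing values beyond 2 stddev."""
    while True:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            return value


rng = random.Random(0)  # fixed seed so the comparison is repeatable
stddev = 0.1            # the stddev used for the weights in this post

truncated = [truncated_normal(stddev=stddev, rng=rng) for _ in range(10000)]
plain = [rng.gauss(0.0, stddev) for _ in range(10000)]

print("largest |w| with truncation:   ", max(abs(w) for w in truncated))
print("largest |w| without truncation:", max(abs(w) for w in plain))
```

With stddev=0.1 the truncated samples all stay within [-0.2, 0.2], while the plain normal samples include a few percent of values beyond that range, which are the outliers the second point above blames for poor training.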

That is the whole process of this exercise. Discussion is welcome.

Reference: https://www.cnblogs.com/lizheng114/p/7498328.html