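A convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to units within a limited surrounding region (their receptive field), and it performs very well on large-scale image processing. CNNs are very similar to ordinary neural networks: both are made up of neurons with learnable weights and biases. Each neuron receives some inputs and computes a dot product, the network's output is a score for each class, and many of the computational tricks used in ordinary neural networks still apply here.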


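A CNN is typically built from the following kinds of layers:

  • Convolutional layer: each convolutional layer in a CNN consists of a number of convolutional units, and the parameters of every unit are optimized through the backpropagation algorithm. The purpose of the convolution operation is to extract different features from the input. The first convolutional layer may only extract low-level features such as edges, lines, and corners; deeper layers iteratively build more complex features out of these low-level ones.
  • Rectified Linear Units layer (ReLU layer): the activation function of this layer is the rectified linear unit (ReLU), f(x)=max(0,x).
  • Pooling layer: the features that come out of a convolutional layer usually have very large dimensions. Pooling splits the feature map into regions and takes the maximum or the average of each one, producing new features with smaller dimensions.
  • Dropout: when training ConvNets, we usually drop a random subset of the units (their activations are set to zero) at each training step, which helps prevent overfitting to some extent.
  • Fully-connected layer: combines all the local features into global features, which are used to compute the final score for each class.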
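
To make the first three layer types concrete, here is a minimal NumPy sketch of a convolution, a ReLU, and a 2x2 max pooling applied to a tiny single-channel image. It is only an illustration: the function names, the 6x6 image, and the hand-made edge kernel are all made up for this example, whereas in a real CNN the kernel values are learned.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive single-channel 2-D convolution with no padding and stride 1.

    As in deep learning libraries, this is technically a cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """f(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

def max_pool_2x2(x):
    """Take the maximum of each non-overlapping 2x2 region."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A 6x6 image: dark left half, bright right half (a vertical edge in the middle).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hypothetical hand-made kernel that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = relu(conv2d_valid(image, kernel))  # shape (4, 4)
pooled = max_pool_2x2(features)               # shape (2, 2)
print(features)
print(pooled)
```

The feature map is non-zero only in the columns where the kernel overlaps the edge, and pooling keeps that response while halving each spatial dimension.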

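Now for the code. Today I will use a ConvNet to carry out a very, very simple image classification task: classifying the images in the CIFAR-10 dataset. The dataset contains airplanes, cats, dogs, and other objects.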

First, we get the dataset. The code below downloads it, or you can download it directly from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

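    from urllib.request import urlretrieve
    from os.path import isfile, isdir
    from tqdm import tqdm
    import tarfile

    cifar10_dataset_folder_path = 'cifar-10-batches-py'
    # Assumption: the original listing does not show where tar_gz_path is
    # defined; here the archive is saved in the working directory.
    tar_gz_path = 'cifar-10-python.tar.gz'

    class DLProgress(tqdm):
        """tqdm progress bar that can be used as a urlretrieve reporthook."""
        last_block = 0

        def hook(self, block_num=1, block_size=1, total_size=None):
            self.total = total_size
            self.update((block_num - self.last_block) * block_size)
            self.last_block = block_num

    # Download the archive unless it is already present.
    if not isfile(tar_gz_path):
        with DLProgress(unit='B', unit_scale=True, miniters=1, desc='CIFAR-10 Dataset') as pbar:
            urlretrieve(
                'https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz',
                tar_gz_path,
                pbar.hook)

    # Extract the archive unless the dataset folder already exists.
    if not isdir(cifar10_dataset_folder_path):
        with tarfile.open(tar_gz_path) as tar:
            tar.extractall()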


    def normalize(x):
        """
        Normalize a list of sample image data in the range of 0 to 1
        : x: List of image data. The image shape is (32, 32, 3)
        : return: Numpy array of normalized data
        """
        a = 0
        b = 1
        grayscale_min = 0
        grayscale_max = 255
        return a + (((x - grayscale_min) * (b - a)) / (grayscale_max - grayscale_min))
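
As a quick sanity check of the min-max scaling above, here is a small NumPy sketch. With a = 0, b = 1 and pixel values in 0..255, the formula reduces to x / 255. The helper name `min_max_scale` and the sample pixels are made up for this example.

```python
import numpy as np

def min_max_scale(x, lo=0.0, hi=255.0):
    """Scale values from the range [lo, hi] to [0, 1]."""
    return (np.asarray(x, dtype=np.float32) - lo) / (hi - lo)

# Three sample pixel intensities stored as 8-bit integers.
pixels = np.array([[0, 51, 255]], dtype=np.uint8)
scaled = min_max_scale(pixels)
print(scaled)
```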


    import numpy as np

    def one_hot_encode(x):
        """
        One hot encode a list of sample labels. Return a one-hot encoded vector for each label.
        : x: List of sample Labels
        : return: Numpy array of one-hot encoded labels
        """
        d = {0: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
             1: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
             2: [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
             3: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
             4: [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
             5: [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
             6: [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
             7: [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
             8: [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
             9: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]}

        # Look up the one-hot vector of every label
        map_list = []
        for item in x:
            map_list.append(d[item])
        target = np.array(map_list)
        return target
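
The lookup table above can also be expressed more compactly by indexing an identity matrix. This is an alternative sketch, not the code used in this post; the name `one_hot_eye` is made up for the example.

```python
import numpy as np

def one_hot_eye(labels, n_classes=10):
    """Row i of the identity matrix is the one-hot vector for class i."""
    return np.eye(n_classes)[np.asarray(labels)]

encoded = one_hot_eye([0, 3, 9])
print(encoded.shape)
print(encoded.argmax(axis=1))
```

Taking the argmax of each row recovers the original labels, which is also how the accuracy is computed later from the one-hot targets.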


    # Note: this post uses the TensorFlow 1.x API (tf.placeholder, tf.Session).
    import tensorflow as tf

    def neural_net_image_input(image_shape):
        """
        Return a Tensor for a batch of image input
        : image_shape: Shape of the images
        : return: Tensor for image input.
        """
        x = tf.placeholder(tf.float32, [None, image_shape[0], image_shape[1], image_shape[2]], 'x')
        return x

    def neural_net_label_input(n_classes):
        """
        Return a Tensor for a batch of label input
        : n_classes: Number of classes
        : return: Tensor for label input.
        """
        y = tf.placeholder(tf.float32, [None, n_classes], 'y')
        return y

    def neural_net_keep_prob_input():
        """
        Return a Tensor for keep probability
        : return: Tensor for keep probability.
        """
        keep_prob = tf.placeholder(tf.float32, None, 'keep_prob')
        return keep_prob
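
The `keep_prob` placeholder holds the probability of keeping a unit under dropout. As a rough illustration of what that number means, here is a NumPy sketch of "inverted" dropout, which zeroes units at random and rescales the survivors by 1/keep_prob so the expected activation stays the same. The function name and the toy activations are made up for this example.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Keep each unit with probability keep_prob and rescale the kept ones."""
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
acts = np.ones((1000, 100))

dropped = dropout(acts, keep_prob=0.5, rng=rng)
print(np.unique(dropped))  # only zeros and rescaled values remain
print(dropped.mean())      # stays close to the original mean of 1.0
```

With keep_prob set to 1.0 nothing is dropped, which is the setting normally used when evaluating a model.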

Next, we build the core piece of a ConvNet: a convolutional layer followed by a pooling layer (here we use max pooling).

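    def conv2d_maxpool(x_tensor, conv_num_outputs, conv_ksize, conv_strides, pool_ksize, pool_strides):
        """
        Apply convolution then max pooling to x_tensor
        :param x_tensor: TensorFlow Tensor
        :param conv_num_outputs: Number of outputs for the convolutional layer
        :param conv_ksize: kernel size 2-D Tuple for the convolutional layer
        :param conv_strides: Stride 2-D Tuple for convolution
        :param pool_ksize: kernel size 2-D Tuple for pool
        :param pool_strides: Stride 2-D Tuple for pool
        : return: A tensor that represents convolution and max pooling of x_tensor
        """
        # Weights and bias. The filter shape is [height, width, in_channels, out_channels].
        # The end of this call is cut off in the original listing: the channel
        # dimensions follow from the conv2d call below, and stddev=0.1 is an
        # assumption that matches the other layers in this post.
        in_channels = x_tensor.get_shape().as_list()[-1]
        weight = tf.Variable(tf.truncated_normal(
            [conv_ksize[0], conv_ksize[1], in_channels, conv_num_outputs], stddev=0.1))
        bias = tf.Variable(tf.zeros(conv_num_outputs))
        # Apply convolution
        conv_layer = tf.nn.conv2d(x_tensor, weight, strides=[1, conv_strides[0], conv_strides[1], 1], padding='SAME')
        # Add bias
        conv_layer = tf.nn.bias_add(conv_layer, bias)
        # Apply ReLU
        conv_layer = tf.nn.relu(conv_layer)
        # Max pooling. The arguments are cut off in the original listing: ksize and
        # strides mirror the conv2d stride layout, and padding='SAME' is an assumption.
        return tf.nn.max_pool(conv_layer,
                              ksize=[1, pool_ksize[0], pool_ksize[1], 1],
                              strides=[1, pool_strides[0], pool_strides[1], 1],
                              padding='SAME')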
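
It helps to know how the spatial size shrinks through such a block. With 'SAME' padding, TensorFlow produces an output of size ceil(input / stride) along each spatial dimension, independent of the kernel size. The sketch below applies that rule to a 32x32 input passed through three blocks that each use a convolution stride of 2 and a pooling stride of 2, which are the strides passed to `conv2d_maxpool` in `conv_net` further down. It assumes 'SAME' padding for the pooling step as well, which the listing above does not show explicitly; the helper names are made up.

```python
import math

def same_out_size(size, stride):
    """Output size of a 'SAME'-padded convolution or pooling step."""
    return math.ceil(size / stride)

def conv_pool_out_size(size, conv_stride, pool_stride):
    """Spatial size after one convolution followed by one pooling step."""
    return same_out_size(same_out_size(size, conv_stride), pool_stride)

size = 32  # CIFAR-10 images are 32x32
sizes = []
for _ in range(3):
    size = conv_pool_out_size(size, conv_stride=2, pool_stride=2)
    sizes.append(size)

print(sizes)
```

Under these assumptions the feature map is down to 1x1 after the third block, so the flattened feature vector has as many entries as the last block has filters.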

Implement a flatten layer that changes x_tensor from a 4-D tensor to a 2-D tensor. The output should have the shape (Batch Size, Flattened Image Size).

    def flatten(x_tensor):
        """
        Flatten x_tensor to (Batch Size, Flattened Image Size)
        : x_tensor: A tensor of size (Batch Size, ...), where ... are the image dimensions.
        : return: A tensor of size (Batch Size, Flattened Image Size).
        """
        # Get the shape of tensor
        shape = x_tensor.get_shape().as_list()
        # Compute the dim for image
        dim = np.prod(shape[1:])
        # reshape the tensor
        return tf.reshape(x_tensor, [-1, dim])
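
The same reshape can be tried out in plain NumPy. This sketch uses a made-up batch of 4 feature maps of shape 2x2x3 and mirrors the `tf.reshape(x_tensor, [-1, dim])` call above: the batch dimension is preserved and everything else is merged into one axis.

```python
import numpy as np

# Hypothetical batch: 4 samples, each a 2x2 feature map with 3 channels.
batch = np.arange(4 * 2 * 2 * 3).reshape(4, 2, 2, 3)

# Keep the batch axis, merge height, width and channels into one axis.
flat = batch.reshape(batch.shape[0], -1)

print(batch.shape, '->', flat.shape)
```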

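As the last step of the network, we need a fully connected layer plus an output layer, which produces a 1×10 result for each image: one score (logit) per class. Applying softmax to these scores gives the probabilities of the 10 classes.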

    def fully_conn(x_tensor, num_outputs):
        """
        Apply a fully connected layer to x_tensor using weight and bias
        : x_tensor: A 2-D tensor where the first dimension is batch size.
        : num_outputs: The number of output that the new tensor should be.
        : return: A 2-D tensor where the second dimension is num_outputs.
        """
        weight = tf.Variable(tf.truncated_normal([x_tensor.get_shape().as_list()[-1], num_outputs], stddev=0.1))
        bias = tf.Variable(tf.zeros([num_outputs]))

        fc = tf.reshape(x_tensor, [-1, weight.get_shape().as_list()[0]])
        fc = tf.add(tf.matmul(fc, weight), bias)
        fc = tf.nn.relu(fc)
        return fc

    def output(x_tensor, num_outputs):
        """
        Apply a output layer to x_tensor using weight and bias
        : x_tensor: A 2-D tensor where the first dimension is batch size.
        : num_outputs: The number of output that the new tensor should be.
        : return: A 2-D tensor where the second dimension is num_outputs.
        """
        weight_out = tf.Variable(tf.truncated_normal([x_tensor.get_shape().as_list()[-1], num_outputs], stddev=0.1))
        bias_out = tf.Variable(tf.zeros([num_outputs]))

        # No activation here: the output layer returns raw logits
        out = tf.reshape(x_tensor, [-1, weight_out.get_shape().as_list()[0]])
        out = tf.add(tf.matmul(out, weight_out), bias_out)
        return out
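
Both layers above boil down to a matrix multiplication plus a bias. The NumPy sketch below shows that computation for an output layer and how softmax would turn the resulting logits into class probabilities. The random inputs and the input width of 256 are placeholders chosen for the example, not values taken from a trained model.

```python
import numpy as np

def dense(x, weight, bias):
    """Fully connected layer without activation: x @ W + b."""
    return x @ weight + bias

def softmax(logits):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 256))              # a batch of 2 flattened feature vectors
w = rng.normal(scale=0.1, size=(256, 10))  # 10 output classes
b = np.zeros(10)

logits = dense(x, w, b)
probs = softmax(logits)
print(probs.shape)
print(probs.sum(axis=1))
```

Softmax does not change which class has the highest score, so the predicted class can be read from the logits directly.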


    def conv_net(x, keep_prob):
        """
        Create a convolutional neural network model
        : x: Placeholder tensor that holds image data.
        : keep_prob: Placeholder tensor that hold dropout keep probability.
        : return: Tensor that represents logits
        """
        # Three conv + max-pool blocks with 32, 128 and 256 filters
        conv1 = conv2d_maxpool(x, 32, (5, 5), (2, 2), (4, 4), (2, 2))
        conv2 = conv2d_maxpool(conv1, 128, (5, 5), (2, 2), (2, 2), (2, 2))
        conv3 = conv2d_maxpool(conv2, 256, (5, 5), (2, 2), (2, 2), (2, 2))
        # flatten(x_tensor)
        flatten_layer = flatten(conv3)
        # fully_conn(x_tensor, num_outputs)
        # Note: keep_prob is accepted but not applied anywhere in this listing,
        # so the network as written performs no dropout.
        fc = fully_conn(flatten_layer, 1024)
        # output(x_tensor, num_outputs), with num_outputs set to the number of classes
        output_layer = output(fc, 10)
        return output_layer

    ##############################
    ## Build the Neural Network ##
    ##############################

    # Remove previous weights, bias, inputs, etc..
    tf.reset_default_graph()

    # Inputs
    x = neural_net_image_input((32, 32, 3))
    y = neural_net_label_input(10)
    keep_prob = neural_net_keep_prob_input()

    # Model
    logits = conv_net(x, keep_prob)

    # Name logits Tensor, so that it can be loaded from disk after training
    logits = tf.identity(logits, name='logits')

    # Loss and Optimizer
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    # Accuracy
    correct_pred = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32), name='accuracy')
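
To see what the loss and accuracy nodes compute, here is a small NumPy sketch of softmax cross-entropy and argmax accuracy on three hand-written samples with 3 classes. The helper names and the numbers are made up for illustration; the graph above does the same thing on 10 classes.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Per-sample cross-entropy between softmax(logits) and one-hot labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_probs).sum(axis=1)

def accuracy(logits, one_hot_labels):
    """Fraction of samples whose highest logit matches the label."""
    return np.mean(logits.argmax(axis=1) == one_hot_labels.argmax(axis=1))

logits = np.array([[2.0, 0.0, 0.0],   # confident and correct
                   [0.0, 0.0, 3.0],   # confident but wrong
                   [1.0, 1.0, 1.0]])  # no preference at all
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=float)

losses = softmax_cross_entropy(logits, labels)
cost = losses.mean()  # corresponds to tf.reduce_mean over the batch
acc = accuracy(logits, labels)
print(losses, cost, acc)
```

A sample with identical logits for every class gets a loss of ln(number of classes), which is the loss level of a model that has learned nothing yet.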



    # Training hyperparameters
    epochs = 30
    batch_size = 256
    keep_probability = 0.5


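    def print_stats(session, feature_batch, label_batch, cost, accuracy):
        """
        Print information about loss and validation accuracy
        : session: Current TensorFlow session
        : feature_batch: Batch of Numpy image data
        : label_batch: Batch of Numpy label data
        : cost: TensorFlow cost function
        : accuracy: TensorFlow accuracy function
        """
        # The feed_dict contents are missing from the original listing. The values
        # below are assumptions: keep_prob is 1.0 for evaluation, and valid_features /
        # valid_labels are a validation split that must be loaded elsewhere.
        loss = session.run(cost, feed_dict={
            x: feature_batch,
            y: label_batch,
            keep_prob: 1.0})
        valid_acc = session.run(accuracy, feed_dict={
            x: valid_features,
            y: valid_labels,
            keep_prob: 1.0})
        print('Loss: {:>10.4f} Validation Accuracy: {:.6f}'.format(loss, valid_acc))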


    save_model_path = './image_classification'

    # helper.load_preprocess_training_batch and train_neural_network are not
    # defined in this post; they must be provided by the surrounding project code.
    with tf.Session() as sess:
        # Initializing the variables
        sess.run(tf.global_variables_initializer())

        # Training cycle
        for epoch in range(epochs):
            # Loop over all batches
            n_batches = 5
            for batch_i in range(1, n_batches + 1):
                for batch_features, batch_labels in helper.load_preprocess_training_batch(batch_i, batch_size):
                    train_neural_network(sess, optimizer, keep_probability, batch_features, batch_labels)
                print('Epoch {:>2}, CIFAR-10 Batch {}: '.format(epoch + 1, batch_i), end='')
                print_stats(sess, batch_features, batch_labels, cost, accuracy)

        # Save Model
        saver = tf.train.Saver()
        save_path = saver.save(sess, save_model_path)
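
The inner loop consumes (features, labels) pairs of at most `batch_size` samples. The loader used in the training loop is not shown in this post, so the following is only a hypothetical stand-in that illustrates the idea of slicing a dataset into mini-batches; the name `iterate_minibatches` and the toy data are made up.

```python
def iterate_minibatches(features, labels, batch_size):
    """Yield consecutive (features, labels) slices of at most batch_size items."""
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        yield features[start:end], labels[start:end]

# Toy data: 10 samples, so a batch size of 4 gives batches of 4, 4 and 2.
features = list(range(10))
labels = [f % 2 for f in features]

batches = list(iterate_minibatches(features, labels, batch_size=4))
for batch_features, batch_labels in batches:
    print(batch_features, batch_labels)
```

The last batch is simply shorter when the dataset size is not a multiple of the batch size, which is why the placeholders earlier use None for the batch dimension.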


Epoch 29, CIFAR-10 Batch 4:  Loss:     0.0139 Validation Accuracy: 0.625600
Epoch 29, CIFAR-10 Batch 5: Loss: 0.0090 Validation Accuracy: 0.631000
Epoch 30, CIFAR-10 Batch 1: Loss: 0.0138 Validation Accuracy: 0.638800
Epoch 30, CIFAR-10 Batch 2: Loss: 0.0192 Validation Accuracy: 0.627400
Epoch 30, CIFAR-10 Batch 3: Loss: 0.0055 Validation Accuracy: 0.633400
Epoch 30, CIFAR-10 Batch 4: Loss: 0.0114 Validation Accuracy: 0.641800
Epoch 30, CIFAR-10 Batch 5: Loss: 0.0050 Validation Accuracy: 0.647400

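Not bad: the validation accuracy is well above 50% (roughly 63% to 65% in the final epochs shown above), whereas random guessing over 10 classes would only reach 10%.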





