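Over the past few days I have been building a neural network in TensorFlow to serve as a binary classifier. The basic recipe for a typical neural network is:
- Define the layers of the network and initialize the parameters of each layer
- Then iterate:
  - Forward propagation
  - Compute cost
  - Backward propagation
  - Update parameters
- Use the trained parameters to make predictions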
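The steps above can be sketched in plain NumPy. This is only a minimal illustration, assuming a single sigmoid layer (logistic regression) on made-up 2-D data; the function names, learning rate, and data are hypothetical and are not taken from the original network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary_classifier(X, y, lr=0.5, steps=500, seed=0):
    """Train a one-layer sigmoid network; returns (W, b, costs)."""
    rng = np.random.default_rng(seed)
    # Step 1: define the layer and initialize its parameters.
    W = rng.normal(scale=0.01, size=(X.shape[1], 1))
    b = np.zeros((1,))
    costs = []
    eps = 1e-12  # guards against log(0)
    for _ in range(steps):
        # Forward propagation.
        a = sigmoid(X @ W + b)
        # Compute cost (binary cross-entropy).
        cost = -np.mean(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps))
        costs.append(cost)
        # Backward propagation (gradient of the mean cost).
        dz = (a - y) / X.shape[0]
        dW = X.T @ dz
        db = dz.sum(axis=0)
        # Update parameters.
        W -= lr * dW
        b -= lr * db
    return W, b, costs

def predict(X, W, b):
    # Use the trained parameters to make 0/1 predictions.
    return (sigmoid(X @ W + b) >= 0.5).astype(int)

# Hypothetical toy data: two Gaussian blobs, one per class.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.vstack([np.zeros((100, 1)), np.ones((100, 1))])

W, b, costs = train_binary_classifier(X, y)
accuracy = float((predict(X, W, b) == y).mean())
```

A real network has more layers and usually lets the framework handle backpropagation, but the loop has the same four stages.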
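During training I used tf.nn.softmax_cross_entropy_with_logits to compute the cost, and the cost stayed at 0 the whole time. We know that softmax is normally used for a multiclass classifier, that is, when there are more than two output classes. For a binary classifier with a single output, the right choice is clearly the sigmoid function, i.e. tf.nn.sigmoid_cross_entropy_with_logits.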
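So why does the cost stay at 0 once softmax is used in a binary classifier? Let's first look at the softmax formula:

$$ s(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} $$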
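- The output of a binary classifier is one-dimensional (a single 0/1 value). With only one element, the denominator has a single term, so s(z) = e^z / e^z = 1 no matter what z is.
- With the output fixed at 1, the cross-entropy formula gives:
  - if the true label is 0: -0*log(1) = 0
  - if the true label is 1: -1*log(1) = 0
- Either way the cost is 0, which is why it never changes during training.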
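The argument above is easy to check numerically. The sketch below uses NumPy rather than TensorFlow, and the helper names and sample logits are made up for illustration. The sigmoid version uses the numerically stable form `max(x, 0) - x*z + log(1 + exp(-|x|))`.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_cross_entropy(labels, logits):
    return -(labels * np.log(softmax(logits))).sum(axis=-1)

def sigmoid_cross_entropy(labels, logits):
    # Stable form: max(x, 0) - x*z + log(1 + exp(-|x|))
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

# One output unit per example, as in a binary classifier.
logits = np.array([[-3.0], [0.5], [4.0]])
labels = np.array([[0.0], [1.0], [1.0]])

probs = softmax(logits)                               # every entry is 1
softmax_cost = softmax_cross_entropy(labels, logits)  # every entry is 0
sigmoid_cost = sigmoid_cross_entropy(labels, logits)  # strictly positive
```

With a single logit per example the softmax cost carries no information, while the sigmoid cost still depends on how far each logit is from its label.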
TensorFlow function: tf.nn.softmax_cross_entropy_with_logits explained
tf.nn.softmax_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None, dim=-1, name=None)
Computes softmax cross entropy between logits and labels.
Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both.
NOTE: While the classes are mutually exclusive, their probabilities need not be. All that is required is that each row of labels is a valid probability distribution. If they are not, the computation of the gradient will be incorrect.
If using exclusive labels (wherein one and only one class is true at a time), see sparse_softmax_cross_entropy_with_logits.
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
logits and labels must have the same shape [batch_size, num_classes] and the same dtype (either float16, float32, or float64).
Note that to avoid confusion, it is required to pass only named arguments to this function.
Args:
- _sentinel: Used to prevent positional parameters. Internal, do not use.
- labels: Each row labels[i] must be a valid probability distribution.
- logits: Unscaled log probabilities.
- dim: The class dimension. Defaulted to -1 which is the last dimension.
- name: A name for the operation (optional).
The function needs at least two arguments: labels and logits.
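To make the documentation concrete, here is a NumPy stand-in for the computation it describes. It is a sketch under assumptions, not TensorFlow's implementation: `log_softmax` and `softmax_cross_entropy_np` are hypothetical helper names and the sample values are invented. It also illustrates the WARNING: feeding already-softmaxed values in as logits gives a different, incorrect loss.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row maximum so that exp() cannot overflow.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def softmax_cross_entropy_np(labels, logits):
    """Per-example cross-entropy; both inputs are [batch_size, num_classes]."""
    return -(labels * log_softmax(logits)).sum(axis=-1)

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 0.0, -1000.0]])  # extreme values stay finite
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])         # each row is a distribution

loss = softmax_cross_entropy_np(labels, logits)

# Mistake described in the WARNING: applying softmax before the loss.
probs = np.exp(log_softmax(logits))
wrong = softmax_cross_entropy_np(labels, probs)
```

Here `loss` has one value per row, and `wrong` differs from it because softmax has effectively been applied twice.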

    # coding=utf-8
    # TensorFlow 1.x script: a small CNN on MNIST trained with softmax cross-entropy.
    import numpy as np
    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)


    def compute_accuracy(v_xs, v_ys):
        # keep_prob is the dropout keep probability, i.e. the fraction of ReLU
        # outputs that are kept; 1 disables dropout at evaluation time.
        y_pre = sess.run(prediction, feed_dict={xs: v_xs, keep_prob: 1})
        # Compare classes in NumPy so no new ops are added to the graph per call.
        return np.mean(np.argmax(y_pre, 1) == np.argmax(v_ys, 1))


    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.1)  # stddev: standard deviation
        return tf.Variable(initial)


    def bias_variable(shape):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)


    def conv2d(x, W):  # x: pixel values, W: weights
        # strides = [1, x_movement, y_movement, 1]; strides[0] and strides[3] must be 1
        # padding='SAME' with stride 1 keeps the spatial size unchanged
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


    def max_pool_2x2(x):
        # ksize[1] and ksize[2] are the pooling window; stride 2 halves the size
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


    # Define placeholders for the network inputs.
    # The original divided xs by 255, but feeding xs overrides that result and
    # read_data_sets already scales pixels to [0, 1], so the division is dropped.
    xs = tf.placeholder(tf.float32, [None, 784])
    ys = tf.placeholder(tf.float32, [None, 10])
    keep_prob = tf.placeholder(tf.float32)
    # -1 leaves the batch dimension open; the final 1 is the input channel
    # count (grayscale images).
    x_image = tf.reshape(xs, [-1, 28, 28, 1])

    ## conv1 layer ##
    W_conv1 = weight_variable([5, 5, 1, 32])  # 5x5 patch, 1 input channel, 32 feature maps
    b_conv1 = bias_variable([32])
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)  # 28x28x32 (padding='SAME')
    h_pool1 = max_pool_2x2(h_conv1)                           # 14x14x32

    ## conv2 layer ##
    W_conv2 = weight_variable([5, 5, 32, 64])  # 5x5 patch, 32 input channels, 64 feature maps
    b_conv2 = bias_variable([64])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)  # 14x14x64 (padding='SAME')
    h_pool2 = max_pool_2x2(h_conv2)                           # 7x7x64

    ## fc1 layer ##
    W_fc1 = weight_variable([7 * 7 * 64, 1024])
    b_fc1 = bias_variable([1024])
    # [n_samples, 7, 7, 64] -> [n_samples, 7*7*64]
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)  # dropout against overfitting

    ## fc2 layer ##
    W_fc2 = weight_variable([1024, 10])
    b_fc2 = bias_variable([10])
    # Do not apply softmax here; the loss below expects unscaled logits.
    # prediction = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
    prediction = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

    # The error between prediction and real data.
    # Manual version, for a softmax-ed prediction:
    # cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys * tf.log(prediction), reduction_indices=[1]))
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=ys, logits=prediction))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    for i in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
        if i % 50 == 0:
            # Average the accuracy over 10 test batches of 1000 images each.
            accuracy = 0
            for j in range(10):
                test_batch = mnist.test.next_batch(1000)
                acc_forone = compute_accuracy(test_batch[0], test_batch[1])
                accuracy = acc_forone + accuracy
            print('Test result: batch:%g, accuracy:%f' % (i, accuracy / 10))

Output:

    Test result: batch:0, accuracy:0.090000
    Test result: batch:50, accuracy:0.788600
    Test result: batch:100, accuracy:0.880200
    Test result: batch:150, accuracy:0.904600
    Test result: batch:200, accuracy:0.927500
    Test result: batch:250, accuracy:0.929800
    Test result: batch:300, accuracy:0.939600
    Test result: batch:350, accuracy:0.942100
    Test result: batch:400, accuracy:0.950600
    Test result: batch:450, accuracy:0.950700
    Test result: batch:500, accuracy:0.956700
    Test result: batch:550, accuracy:0.956000
    Test result: batch:600, accuracy:0.957100
    Test result: batch:650, accuracy:0.958400
    Test result: batch:700, accuracy:0.961500
    Test result: batch:750, accuracy:0.963800
    Test result: batch:800, accuracy:0.965000
    Test result: batch:850, accuracy:0.966300
    Test result: batch:900, accuracy:0.967800
    Test result: batch:950, accuracy:0.967700
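The output-size comments in the network code (28x28x32, 14x14x32, 14x14x64, 7x7x64) follow from how padding='SAME' works: the spatial output size is ceil(input_size / stride). The short sketch below re-derives them. The helper name `same_padding_output` is made up for illustration, and the layer list simply restates the strides and channel counts used in the script.

```python
import math

def same_padding_output(size, stride):
    """Spatial output size of a conv or pool layer with padding='SAME'."""
    return math.ceil(size / stride)

size = 28  # MNIST images are 28x28
shapes = []
for name, stride, channels in [("conv1", 1, 32), ("pool1", 2, 32),
                               ("conv2", 1, 64), ("pool2", 2, 64)]:
    size = same_padding_output(size, stride)
    shapes.append((name, size, size, channels))

# Number of inputs to the first fully connected layer after flattening.
flat_features = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
```

The final value is the 7*7*64 that appears in the shape of W_fc1.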
