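When the training set is small, reading the files directly works fine. When the training set is very large, reading files directly uses too much memory, and a more efficient method is needed: reading from tfrecords files, which are a binary file format. TensorFlow has built-in functions for writing and reading them, so they are convenient to use.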

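For reasons I have not worked out, training on data read from tfrecords converged faster and more smoothly. The upper two figures show the accuracy and loss when using tfrecords; the lower ones show the accuracy and loss when reading the files directly.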
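To make "binary file" concrete, here is a minimal pure-Python sketch of the record framing described in the TFRecord format documentation: each record is a little-endian 64-bit length, a masked CRC-32C of that length, the payload bytes, and a masked CRC-32C of the payload. The helper names (crc32c, masked_crc, write_record, read_records) are hypothetical and are not part of TensorFlow. It is only an illustration and is not meant to replace tf.python_io.TFRecordWriter.

```python
import io
import struct


def _make_crc32c_table():
    # Lookup table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data):
    """Plain CRC-32C of a bytes object."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    """CRC-32C rotated right by 15 bits plus a constant, as TFRecord stores it."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def write_record(stream, payload):
    """Append one framed record (length, crc, payload, crc) to a binary stream."""
    header = struct.pack("<Q", len(payload))
    stream.write(header)
    stream.write(struct.pack("<I", masked_crc(header)))
    stream.write(payload)
    stream.write(struct.pack("<I", masked_crc(payload)))


def read_records(stream):
    """Yield the payload of every record, verifying both checksums."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        (length,) = struct.unpack("<Q", header)
        (length_crc,) = struct.unpack("<I", stream.read(4))
        if length_crc != masked_crc(header):
            raise ValueError("corrupt length field")
        payload = stream.read(length)
        (payload_crc,) = struct.unpack("<I", stream.read(4))
        if payload_crc != masked_crc(payload):
            raise ValueError("corrupt payload")
        yield payload


buf = io.BytesIO()
write_record(buf, b"hello")
write_record(buf, b"")
buf.seek(0)
records = list(read_records(buf))
```

In a real tfrecords file each payload would be a serialized tf.train.Example rather than an arbitrary byte string.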

1 Generate the list file that records the samples

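 import os

 root_dir = os.getcwd()

 def getTrianList():
     # Each sub-folder of dataSet is one class; the folder name is used as the label.
     data_dir = os.path.join(root_dir, "dataSet")
     with open("train.txt", "w") as f:
         for file in os.listdir(data_dir):
             for picFile in os.listdir(os.path.join(data_dir, file)):
                 # One sample per line: "<relative path> <label>", separated by one space.
                 f.write("dataSet/" + file + "/" + picFile + " " + file + "\n")
                 print(picFile)

 if __name__ == "__main__":
     getTrianList()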

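The sample file paths and labels are all recorded in a single txt file; the tfrecords file is generated later by reading this information.

Note: separate the file path and the label with a space, not a tab.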
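The line format can be sketched with two small helpers. This is a standard-library illustration only: format_list_line and parse_list_line are hypothetical names, and the original code parses the file with np.genfromtxt instead. The sketch assumes, as the original does, that the label is an integer.

```python
def format_list_line(rel_path, label):
    """Build one list-file line: '<path> <label>' separated by a single space."""
    return "{} {}\n".format(rel_path, label)


def parse_list_line(line):
    """Split a list-file line on its last space and return (path, int label)."""
    path, label = line.rstrip("\n").rsplit(" ", 1)
    return path, int(label)


line = format_list_line("dataSet/0/cat.jpg", 0)
parsed = parse_list_line(line)

# A tab-separated line has no space to split on, so parsing fails.
try:
    parse_list_line("dataSet/0/cat.jpg\t0\n")
    tab_rejected = False
except ValueError:
    tab_rejected = True
```

Splitting on the last space tolerates a space inside the file name. A parser that splits on every space, such as np.genfromtxt with delimiter=" ", would not, so it is safer to avoid spaces in file names.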

2 Read the txt file into arrays

 import numpy as np

 def load_file(example_list_file):
     # col1: file path stored as bytes (up to 120 characters); col2: integer label
     lines = np.genfromtxt(example_list_file, delimiter=" ",
                           dtype=[('col1', 'S120'), ('col2', 'i8')])
     examples = []
     labels = []
     for example, label in lines:
         examples.append(example)
         labels.append(label)
     # convert to numpy array
     return np.asarray(examples), np.asarray(labels), len(lines)

This code reads the txt file generated in step 1 and stores the file paths and labels in arrays.

3 Read the images

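 import cv2

 def extract_image(filename, height, width):
     print(filename)
     image = cv2.imread(filename)
     # cv2.imread returns None when the file cannot be read
     if image is None:
         raise IOError("cannot read image: " + filename)
     # cv2.resize expects the target size as (width, height)
     image = cv2.resize(image, (width, height))
     # OpenCV loads images in BGR order; convert to RGB
     rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
     return rgb_image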

The image files are read with cv2 (OpenCV).

4 Convert to a tfrecords file

 import os
 import tensorflow as tf  # TensorFlow 1.x API (tf.python_io)

 def trans2tfRecord(trainFile, name, output_dir, height, width):
     # create the output directory if it does not exist yet
     if not os.path.isdir(output_dir):
         os.makedirs(output_dir)
     _examples, _labels, examples_num = load_file(trainFile)
     # write the record file inside output_dir
     filename = os.path.join(output_dir, name + '.tfrecords')
     writer = tf.python_io.TFRecordWriter(filename)
     for i, [example, label] in enumerate(zip(_examples, _labels)):
         print("NO{}".format(i))
         # need to convert the example (bytes) to utf-8
         example = example.decode("UTF-8")
         image = extract_image(example, height, width)
         # flatten the pixel array to raw bytes
         image_raw = image.tobytes()
         record = tf.train.Example(features=tf.train.Features(feature={
             'image_raw': _bytes_feature(image_raw),
             'height': _int64_feature(image.shape[0]),
             'width': _int64_feature(image.shape[1]),
             'depth': _int64_feature(image.shape[2]),
             'label': _int64_feature(int(label))
         }))
         writer.write(record.SerializeToString())
     writer.close()

 def _int64_feature(value):
     return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

 def _bytes_feature(value):
     return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
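The image_raw feature holds only the flattened pixel bytes, which is why the record also stores height, width and depth: without them the reader cannot rebuild the array. The sketch below illustrates the round trip with plain nested lists, assuming row-major order (the default order of NumPy's tobytes). to_raw and from_raw are hypothetical helpers, not TensorFlow or NumPy functions.

```python
def to_raw(pixels):
    """Flatten a height x width x depth nested list of 0..255 ints into bytes."""
    return bytes(value for row in pixels for pixel in row for value in pixel)


def from_raw(raw, height, width, depth):
    """Rebuild the nested list from flat bytes; the shape must match exactly."""
    if len(raw) != height * width * depth:
        raise ValueError("shape does not match the number of bytes")
    values = iter(raw)
    return [[[next(values) for _ in range(depth)]
             for _ in range(width)]
            for _ in range(height)]


# A tiny 2 x 2 RGB image.
image = [[[255, 0, 0], [0, 255, 0]],
         [[0, 0, 255], [10, 20, 30]]]
raw = to_raw(image)
restored = from_raw(raw, 2, 2, 3)
```

The same constraint applies when reading: the shape passed to the reshape step must match the height and width used when the file was written.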

5 Read training data from the tfrecords file

 def read_tfRecord(file_tfRecord):
     # TensorFlow 1.x queue-based input pipeline
     queue = tf.train.string_input_producer([file_tfRecord])
     reader = tf.TFRecordReader()
     _, serialized_example = reader.read(queue)
     features = tf.parse_single_example(
         serialized_example,
         features={
             'image_raw': tf.FixedLenFeature([], tf.string),
             'height': tf.FixedLenFeature([], tf.int64),
             'width': tf.FixedLenFeature([], tf.int64),
             'depth': tf.FixedLenFeature([], tf.int64),
             'label': tf.FixedLenFeature([], tf.int64)
         })
     # decode the raw bytes back into uint8 pixels
     image = tf.decode_raw(features['image_raw'], tf.uint8)
     #height = tf.cast(features['height'], tf.int64)
     #width = tf.cast(features['width'], tf.int64)
     # static shape; must match the size used when writing (32 x 32 RGB here)
     image = tf.reshape(image, [32, 32, 3])
     image = tf.cast(image, tf.float32)
     image = tf.image.per_image_standardization(image)
     label = tf.cast(features['label'], tf.int64)
     print(image, label)
     return image, label
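The tf.image.per_image_standardization call in the function above rescales each image to zero mean and unit variance, dividing by max(stddev, 1/sqrt(N)) so that a constant image does not cause a division by zero. A minimal sketch of that arithmetic on a flat list of pixel values, using only the standard library (the function name standardize is hypothetical):

```python
import math
import statistics


def standardize(pixels):
    """Return (x - mean) / max(stddev, 1/sqrt(N)) for a flat list of pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n
    # population standard deviation over all elements of the image
    stddev = statistics.pstdev(pixels, mu=mean)
    adjusted_stddev = max(stddev, 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted_stddev for p in pixels]


checker = standardize([0, 255, 0, 255])
constant = standardize([7, 7, 7, 7])
```

For the four-pixel checker pattern the mean and the standard deviation are both 127.5, so the values become -1 and 1. For the constant image the result is all zeros.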

This reads image and label from the tfrecords file. During training, call tf.train.batch directly to produce the training batches.

 image_batches,label_batches = tf.train.batch([image, label], batch_size=16, capacity=20)

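The rest is the same as the earlier training steps. Because tf.train.string_input_producer and tf.train.batch are queue-based, the session has to start the queue runners with tf.train.start_queue_runners before fetching a batch; otherwise the fetch blocks.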
