Creating TFRecord-format data with TensorFlow

The tf.Example message

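TensorFlow provides a unified format, .tfrecord, for storing image data. It is built on Google's own protobuf: the image data is serialized into binary data in a custom message format.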

To read data efficiently it can be helpful to serialize your data and store it in a set of files (100-200MB each) that can each be read linearly. This is especially true if the data is being streamed over a network. This can also be useful for caching any data-preprocessing.

The TFRecord format is a simple format for storing a sequence of binary records.
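As a rough illustration of the 100-200MB guideline quoted above, the sketch below estimates how many shard files a dataset would need and generates a file name for each. The helper name shard_paths, the 150MB target size, and the name-00000-of-00003 naming pattern are all assumptions made for this example; TensorFlow does not require any of them.

```python
import math


def shard_paths(prefix, total_bytes, target_shard_bytes=150 * 1024 * 1024):
    """Return one output path per shard so each shard is roughly target_shard_bytes.

    Hypothetical helper: the naming pattern is a common convention, not a requirement.
    """
    # Always produce at least one shard, even for an empty dataset.
    num_shards = max(1, math.ceil(total_bytes / target_shard_bytes))
    return [
        "{}-{:05d}-of-{:05d}.tfrecord".format(prefix, index, num_shards)
        for index in range(num_shards)
    ]


# A 400MB dataset with a 150MB target needs ceil(400 / 150) = 3 shards.
paths = shard_paths("train", 400 * 1024 * 1024)
print(paths)
```

Each path could then be opened with its own tf.io.TFRecordWriter, with examples distributed across the writers.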

The protobuf messages are defined as follows:

https://github.com/tensorflow/tensorflow/blob/r2.0/tensorflow/core/example/feature.proto

message BytesList {
  repeated bytes value = 1;
}

message FloatList {
  repeated float value = 1 [packed = true];
}

message Int64List {
  repeated int64 value = 1 [packed = true];
}

// Containers for non-sequential data.
message Feature {
  // Each feature can be exactly one kind.
  oneof kind {
    BytesList bytes_list = 1;
    FloatList float_list = 2;
    Int64List int64_list = 3;
  }
};

message Features {
  map<string, Feature> feature = 1;
};

message FeatureList {
  repeated Feature feature = 1;
};

message FeatureLists {
  map<string, FeatureList> feature_list = 1;
};

A tf.Example is essentially a map of the form {"string": tf.train.Feature}.

A tf.train.Feature holds one of three basic list types. Between them they cover the following value types:

tf.train.BytesList
- string
- byte
tf.train.FloatList
- float (float32)
- double (float64)
tf.train.Int64List
- bool
- enum
- int32
- uint32
- int64
- uint64
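The mapping above can be checked with a small sketch that builds one feature of each kind, wraps them in a tf.train.Example, and inspects which list type each feature ended up in. The feature keys ("name", "score", "label", "flag") are made up for illustration.

```python
import tensorflow as tf

# One Feature per list type; a bool is stored in an Int64List.
# The keys are hypothetical and chosen only for this example.
features = {
    "name": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"cat"])),
    "score": tf.train.Feature(float_list=tf.train.FloatList(value=[0.5])),
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[3])),
    "flag": tf.train.Feature(int64_list=tf.train.Int64List(value=[True])),
}
example = tf.train.Example(features=tf.train.Features(feature=features))

# Round-trip through the binary protobuf encoding.
serialized = example.SerializeToString()
restored = tf.train.Example.FromString(serialized)

# WhichOneof reports which member of the `oneof kind` field is set.
kinds = {
    key: feature.WhichOneof("kind")
    for key, feature in restored.features.feature.items()
}
print(kinds)
```

Because the values live in a protobuf map, the order in which keys are printed should not be relied on.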

See the official TensorFlow documentation for details.

Converting your own data to TFRecord format

Complete code

from __future__ import absolute_import, division, print_function, unicode_literals

import argparse
import os

import tensorflow as tf

def _bytes_feature(value):
  """Returns a bytes_list from a string / byte."""
  if isinstance(value, type(tf.constant(0))):
    value = value.numpy() # BytesList won't unpack a string from an EagerTensor.
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _float_feature(value):
  """Returns a float_list from a float / double."""
  return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _int64_feature(value):
  """Returns an int64_list from a bool / enum / int / uint."""
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def convert_to_tfexample(img_data, label, height=320, width=320):
    """Convert one PNG-encoded image and its label into a tf.train.Example."""
    image_format = 'png'
    # height and width are not read from the image; they default to 320x320,
    # so pass the real size if your images differ.
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': _bytes_feature(img_data),
        'image/format': _bytes_feature(tf.compat.as_bytes(image_format)),
        'image/class/label': _int64_feature(label),
        'image/height': _int64_feature(height),
        'image/width': _int64_feature(width),
    }))
    return example

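# Example: path = "/home/sc/disk/data/lishui/1"
def read_dataset(path):
    """Collect the PNG bytes and class index for every image under path."""
    imgs = []
    labels = []
    for root, dirs, files in os.walk(path):
        for one_file in files:
            # Join with root rather than path so files in subdirectories resolve.
            one_file = os.path.join(root, one_file)
            if one_file.endswith(".png"):
                # Swap only the extension; str.replace would also change
                # any other 'png' that appears in the path.
                label_file = os.path.splitext(one_file)[0] + ".txt"
                if not os.path.isfile(label_file):
                    continue
                # The class index is the first space-separated field of line 1.
                with open(label_file) as f:
                    class_index = int(f.readline().split(' ')[0])
                labels.append(class_index)
                # In TF 2.x, GFile lives under tf.io.gfile.
                with tf.io.gfile.GFile(one_file, 'rb') as img_file:
                    img = img_file.read()
                imgs.append(img)
    return imgs, labels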

def arg_parse():
    parser = argparse.ArgumentParser()
    # ex: python create_tfrecord.py -d /home/sc/disk/data/lishui/1 -o train.tfrecord
    # required must be a boolean. Both options are mandatory, so no defaults are set.
    parser.add_argument('-d', '--dir', type=str, required=True, help='directory holding the image and label files')
    parser.add_argument('-o', '--output', type=str, required=True, help='output tfrecord file name')
    args = parser.parse_args()
    return args

def main():
    args = arg_parse()
    imgs, labels = read_dataset(args.dir)
    examples = map(convert_to_tfexample, imgs, labels)
    # The context manager closes the writer even if a write fails.
    with tf.io.TFRecordWriter(args.output) as writer:
        for example in examples:
            writer.write(example.SerializeToString())
    print("write done")

if __name__ == '__main__':
    # usage: python create_tfrecord.py -d <data_dir> -o <output_tfrecord_file>
    # ex: python create_tfrecord.py -d /home/sc/disk/data/lishui/1 -o train.tfrecord
    main()

First, we need helper functions that convert byte / string / float / int values (and so on) into a tf.train.Feature.

def _bytes_feature(value):
  """Returns a bytes_list from a string / byte."""
  if isinstance(value, type(tf.constant(0))):
    value = value.numpy() # BytesList won't unpack a string from an EagerTensor.
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _float_feature(value):
  """Returns a float_list from a float / double."""
  return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _int64_feature(value):
  """Returns an int64_list from a bool / enum / int / uint."""
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

Next, given an image matrix and its label, we call the helper functions above to convert a single image and its label into a tf.train.Example message.

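def convert_to_tfexample(img, label):
    """Convert one image matrix into a tf.train.Example."""
    # This simplified variant expects a NumPy array. The complete code above
    # instead stores the PNG-encoded bytes returned by read_dataset.
    # ndarray.tobytes() replaces the deprecated ndarray.tostring().
    img_raw = img.tobytes()
    example = tf.train.Example(features=tf.train.Features(feature={
        'label': _int64_feature(label),
        'img': _bytes_feature(img_raw)}))
    return example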
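One consequence of storing a raw image matrix as bytes is that the shape and dtype are not part of the byte string, so they have to be recorded somewhere else (for instance in separate height and width features) in order to rebuild the array later. The following sketch shows the round trip using NumPy only; the 2x4x3 uint8 array is just a stand-in for a real image.

```python
import numpy as np

# A tiny stand-in for an image: height 2, width 4, 3 channels.
img = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)

# Serializing keeps only the raw values, one byte per uint8 element.
img_raw = img.tobytes()

# Restoring requires the dtype and shape to be known from elsewhere.
restored = np.frombuffer(img_raw, dtype=np.uint8).reshape(2, 4, 3)

print(len(img_raw), np.array_equal(img, restored))
```

Storing the encoded PNG bytes avoids this bookkeeping, because the PNG header already carries the image dimensions.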

In my dataset, each image and its label file sit in the same directory. For example, dir contains the image a.png and the matching label file a.txt.

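def read_dataset(path):
    """Collect the PNG bytes and class index for every image under path."""
    imgs = []
    labels = []
    for root, dirs, files in os.walk(path):
        for one_file in files:
            # Join with root rather than path so files in subdirectories resolve.
            one_file = os.path.join(root, one_file)
            if one_file.endswith(".png"):
                # Swap only the extension; str.replace would also change
                # any other 'png' that appears in the path.
                label_file = os.path.splitext(one_file)[0] + ".txt"
                if not os.path.isfile(label_file):
                    continue
                # The class index is the first space-separated field of line 1.
                with open(label_file) as f:
                    class_index = int(f.readline().split(' ')[0])
                labels.append(class_index)
                # In TF 2.x, GFile lives under tf.io.gfile.
                with tf.io.gfile.GFile(one_file, 'rb') as img_file:
                    img = img_file.read()
                imgs.append(img)
    return imgs, labels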

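This function walks the data directory and reads each image together with its label. If your data is laid out differently, just modify this function; the return value should still be imgs, labels.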
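As an example of adapting the reader to another layout, the sketch below assumes one subdirectory per class, named after the integer class index (for instance data/0/a.png and data/1/b.png). Both this layout and the function name read_dataset_by_folder are hypothetical; only the return value, imgs and labels, matches the function above.

```python
import os


def read_dataset_by_folder(path):
    """Read PNG bytes from path/<class_index>/<name>.png (hypothetical layout)."""
    imgs = []
    labels = []
    # Sort so the output order is deterministic across file systems.
    for class_name in sorted(os.listdir(path)):
        class_dir = os.path.join(path, class_name)
        # Skip stray files and folders whose name is not an integer label.
        if not os.path.isdir(class_dir) or not class_name.isdigit():
            continue
        for file_name in sorted(os.listdir(class_dir)):
            if not file_name.endswith(".png"):
                continue
            with open(os.path.join(class_dir, file_name), "rb") as img_file:
                imgs.append(img_file.read())
            labels.append(int(class_name))
    return imgs, labels
```

Because the return value keeps the same shape, the rest of the pipeline can stay unchanged.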

Finally, we use tf.io.TFRecordWriter to write each tf.train.Example message to the output file.

def main():
    args = arg_parse()
    # Example: args.dir = "/home/sc/disk/data/lishui/1"
    imgs, labels = read_dataset(args.dir)
    examples = map(convert_to_tfexample, imgs, labels)
    # The context manager closes the writer even if a write fails.
    with tf.io.TFRecordWriter(args.output) as writer:
        for example in examples:
            writer.write(example.SerializeToString())
    print("write done")
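To check that a file written this way can be read back, a minimal sketch along the following lines may help. It writes two small records to a temporary file and parses them with tf.data.TFRecordDataset and tf.io.parse_single_example. The payloads are placeholder byte strings rather than real PNG data, the file name demo.tfrecord is arbitrary, and the code assumes TF 2.x eager execution.

```python
import os
import tempfile

import tensorflow as tf

# Write two placeholder records, reusing two feature keys from the complete code.
path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for payload, label in [(b"fake-png-0", 0), (b"fake-png-1", 1)]:
        example = tf.train.Example(features=tf.train.Features(feature={
            "image/encoded": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[payload])),
            "image/class/label": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[label])),
        }))
        writer.write(example.SerializeToString())

# The parsing spec must use the same keys and types as the writer.
feature_spec = {
    "image/encoded": tf.io.FixedLenFeature([], tf.string),
    "image/class/label": tf.io.FixedLenFeature([], tf.int64),
}


def parse(record):
    """Parse one serialized tf.train.Example into a dict of tensors."""
    return tf.io.parse_single_example(record, feature_spec)


dataset = tf.data.TFRecordDataset(path).map(parse)
records = [
    (item["image/encoded"].numpy(), int(item["image/class/label"].numpy()))
    for item in dataset
]
print(records)
```

With real data, the bytes under "image/encoded" would normally be passed to tf.io.decode_png to obtain the pixel tensor.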
