TensorFlow data batching: an example tutorial

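1. Introduction

To split data into batches, use the functions in tf.train or the methods of tf.data.Dataset.

1.1 tf.train

tf.train.slice_input_producer(tensor_list, shuffle=True, seed=None, capacity=32)

tf.train.batch(tensors, batch_size, num_threads=1, capacity=32, allow_smaller_final_batch=False)

Only a subset of each function's parameters is listed here, so pass them by keyword.

Parameters:

shuffle: when True, the data is shuffled

allow_smaller_final_batch: when True, a final batch with fewer than batch_size elements may be returned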

-------------------------------------------------------------------------------------------------------------------------

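1.2 tf.data.Dataset

tf.data.Dataset is a class that provides the following methods:

from_tensor_slices(tensors)

batch(batch_size, drop_remainder=False)

shuffle(buffer_size, seed=None, reshuffle_each_iteration=None)

repeat(count=None)

make_one_shot_iterator() / get_next()

Note: make_one_shot_iterator() creates an iterator over the Dataset, and calling get_next() on that iterator returns its next element.

Parameters:

tensors: a list, dict, tuple, or similar structure

drop_remainder: when False, a final batch smaller than batch_size is kept; when True, it is dropped

buffer_size: the size of the buffer used for shuffling

count: the number of epochs; None repeats the data indefinitely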
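The effect of drop_remainder can be illustrated without TensorFlow. The following is a minimal pure-Python sketch; the helper name batch_items is hypothetical and only imitates the batching behaviour described above. It is not how tf.data is implemented.

```python
def batch_items(items, batch_size, drop_remainder=False):
    """Group items into consecutive lists of batch_size elements."""
    batches = []
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        # A short trailing chunk is kept unless drop_remainder is True.
        if len(chunk) < batch_size and drop_remainder:
            break
        batches.append(chunk)
    return batches


labels = list(range(15))
kept = batch_items(labels, 6)
dropped = batch_items(labels, 6, drop_remainder=True)
print([len(b) for b in kept])     # [6, 6, 3]
print([len(b) for b in dropped])  # [6, 6]
```

With 15 elements and a batch size of 6, keeping the remainder gives math.ceil(15/6) = 3 batches per pass, while dropping it gives 15 // 6 = 2.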

2. Examples

2.1 Using tf.train.slice_input_producer and tf.train.batch

 # Written against the TensorFlow 1.x API (tf.Session and queue runners)
 import tensorflow as tf
 import numpy as np
 import math

 # Generate a sample dataset
 def generate_data():
     num = 15
     labels = np.asarray(range(num))
     images = np.random.random([num, 5, 5, 3])
     return images, labels

 # Print the shapes of the sample data
 images, labels = generate_data()
 print('images.shape={0}, labels.shape={1}'.format(images.shape, labels.shape))

 # Define the number of epochs, the batch size, the dataset size,
 # and the number of iterations needed for one pass over the data
 n_epochs = 3
 batch_size = 6
 train_nums = 15
 iterations = math.ceil(train_nums/batch_size)

 # tf.train.slice_input_producer puts all the data into a queue;
 # tf.train.batch groups the queued data into batches
 input_queue = tf.train.slice_input_producer([images, labels], shuffle=False)
 image_batch, label_batch = tf.train.batch(input_queue, batch_size=batch_size, num_threads=1, capacity=32)
 print('image_batch.shape={0}, label_batch.shape={1}'.format(image_batch.shape, label_batch.shape))

 with tf.Session() as sess:
     tf.global_variables_initializer().run()

     # Start the queue-runner threads
     coord = tf.train.Coordinator()
     threads = tf.train.start_queue_runners(sess, coord)

     # Print each batch of labels
     for epoch in range(n_epochs):
         for iteration in range(iterations):
             cu_image_batch, cu_label_batch = sess.run([image_batch, label_batch])
             print('The {0} epoch, the {1} iteration, current batch is {2}'.format(epoch+1,iteration+1,cu_label_batch))

     # Ask the threads to stop and wait for them to finish
     coord.request_stop()
     coord.join(threads)

 # Output:
 images.shape=(15, 5, 5, 3), labels.shape=(15,)
 image_batch.shape=(6, 5, 5, 3), label_batch.shape=(6,)
 The 1 epoch, the 1 iteration, current batch is [0 1 2 3 4 5]
 The 1 epoch, the 2 iteration, current batch is [ 6  7  8  9 10 11]
 The 1 epoch, the 3 iteration, current batch is [12 13 14  0  1  2]
 The 2 epoch, the 1 iteration, current batch is [3 4 5 6 7 8]
 The 2 epoch, the 2 iteration, current batch is [ 9 10 11 12 13 14]
 The 2 epoch, the 3 iteration, current batch is [0 1 2 3 4 5]
 The 3 epoch, the 1 iteration, current batch is [ 6  7  8  9 10 11]
 The 3 epoch, the 2 iteration, current batch is [12 13 14  0  1  2]
 The 3 epoch, the 3 iteration, current batch is [3 4 5 6 7 8]
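Note that the third batch above is [12 13 14 0 1 2]: the queue keeps cycling through the data, so a batch can straddle the boundary between two passes. A minimal pure-Python sketch of this wrap-around, assuming an endlessly repeating stream stands in for the input queue (the helper name cyclic_batches is hypothetical and is not part of TensorFlow):

```python
from itertools import cycle, islice


def cyclic_batches(items, batch_size, num_batches):
    """Take fixed-size batches from an endlessly repeating stream of items."""
    stream = cycle(items)  # restarts from the first item after the last one
    return [list(islice(stream, batch_size)) for _ in range(num_batches)]


batches = cyclic_batches(list(range(15)), batch_size=6, num_batches=4)
print(batches[2])  # [12, 13, 14, 0, 1, 2]
print(batches[3])  # [3, 4, 5, 6, 7, 8]
```

Because every batch is full, the loop counters for "epoch" and "iteration" in the output no longer line up with real passes over the 15 samples.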

With tf.train.slice_input_producer(shuffle=True), the data comes out in shuffled order:

 images.shape=(15, 5, 5, 3), labels.shape=(15,)
 image_batch.shape=(6, 5, 5, 3), label_batch.shape=(6,)
 The 1 epoch, the 1 iteration, current batch is [ 2  5  8 11  3 10]
 The 1 epoch, the 2 iteration, current batch is [ 9 12  7  1 14 13]
 The 1 epoch, the 3 iteration, current batch is [0 6 4 2 3 6]
 The 2 epoch, the 1 iteration, current batch is [11 10 12 14 13  5]
 The 2 epoch, the 2 iteration, current batch is [8 1 0 9 4 7]
 The 2 epoch, the 3 iteration, current batch is [10 13  1  4 12  3]
 The 3 epoch, the 1 iteration, current batch is [ 2  8  5  9 14  7]
 The 3 epoch, the 2 iteration, current batch is [ 0 11  6  1 14  9]
 The 3 epoch, the 3 iteration, current batch is [11  6 12  7  0 13]

With tf.train.batch(allow_smaller_final_batch=True), a batch with fewer than batch_size elements can be returned. The output reported for this setting is:

 images.shape=(15, 5, 5, 3), labels.shape=(15,)
 The 1 epoch, the 1 iteration, current batch is [0 1 2 3 4 5]
 The 1 epoch, the 2 iteration, current batch is [ 6  7  8  9 10 11]
 The 1 epoch, the 3 iteration, current batch is [12 13 14]
 The 2 epoch, the 1 iteration, current batch is [0 1 2 3 4 5]
 The 2 epoch, the 2 iteration, current batch is [ 6  7  8  9 10 11]
 The 2 epoch, the 3 iteration, current batch is [12 13 14]
 The 3 epoch, the 1 iteration, current batch is [0 1 2 3 4 5]
 The 3 epoch, the 2 iteration, current batch is [ 6  7  8  9 10 11]
 The 3 epoch, the 3 iteration, current batch is [12 13 14]

2.2 Using the tf.data.Dataset class

 # Written against the TensorFlow 1.x API (tf.Session and make_one_shot_iterator)
 import tensorflow as tf
 import numpy as np
 import math

 # Generate a sample dataset
 def generate_data():
     num = 15
     labels = np.asarray(range(num))
     images = np.random.random([num, 5, 5, 3])
     return images, labels

 # Print the shapes of the sample data
 images, labels = generate_data()
 print('images.shape={0}, labels.shape={1}'.format(images.shape, labels.shape))

 # Define the number of epochs, the batch size, the dataset size,
 # and the number of iterations needed for one pass over the data
 n_epochs = 3
 batch_size = 6
 train_nums = 15
 iterations = math.ceil(train_nums/batch_size)

 # Build a Dataset with from_tensor_slices, split it into batches with batch,
 # and repeat the sequence indefinitely with repeat
 dataset = tf.data.Dataset.from_tensor_slices((images, labels))
 dataset = dataset.batch(batch_size).repeat()

 # Create an iterator with make_one_shot_iterator and fetch elements with get_next
 iterator = dataset.make_one_shot_iterator()
 next_iterator = iterator.get_next()

 with tf.Session() as sess:
     for epoch in range(n_epochs):
         for iteration in range(iterations):
             cu_image_batch, cu_label_batch = sess.run(next_iterator)
             print('The {0} epoch, the {1} iteration, current batch is {2}'.format(epoch+1,iteration+1,cu_label_batch))

 # Output:
 images.shape=(15, 5, 5, 3), labels.shape=(15,)
 The 1 epoch, the 1 iteration, current batch is [0 1 2 3 4 5]
 The 1 epoch, the 2 iteration, current batch is [ 6  7  8  9 10 11]
 The 1 epoch, the 3 iteration, current batch is [12 13 14]
 The 2 epoch, the 1 iteration, current batch is [0 1 2 3 4 5]
 The 2 epoch, the 2 iteration, current batch is [ 6  7  8  9 10 11]
 The 2 epoch, the 3 iteration, current batch is [12 13 14]
 The 3 epoch, the 1 iteration, current batch is [0 1 2 3 4 5]
 The 3 epoch, the 2 iteration, current batch is [ 6  7  8  9 10 11]
 The 3 epoch, the 3 iteration, current batch is [12 13 14]
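The short third batch in each epoch follows from calling batch() before repeat(): each pass is batched on its own, so the remainder stays separate. A minimal pure-Python sketch of the two possible orderings, using hypothetical helper names and plain lists in place of a Dataset (an imitation of the behaviour, not the tf.data implementation):

```python
from itertools import chain, islice, repeat


def batched(stream, batch_size):
    """Yield consecutive lists of up to batch_size items from stream."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


labels = list(range(15))
epochs = 2

# batch, then repeat: every pass ends with its own short batch
batch_then_repeat = [b for _ in range(epochs) for b in batched(labels, 6)]

# repeat, then batch: batches run across the boundary between passes
repeated = chain.from_iterable(repeat(labels, epochs))
repeat_then_batch = list(batched(repeated, 6))

print([len(b) for b in batch_then_repeat])  # [6, 6, 3, 6, 6, 3]
print([len(b) for b in repeat_then_batch])  # [6, 6, 6, 6, 6]
```

The first ordering matches the output shown above; the second resembles the wrap-around seen with the queue-based approach in section 2.1.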

To use shuffle(), change the line dataset = dataset.batch(batch_size).repeat() to dataset = dataset.shuffle(100).batch(batch_size).repeat(). The output is then:

 images.shape=(15, 5, 5, 3), labels.shape=(15,)
 The 1 epoch, the 1 iteration, current batch is [ 7  4 10  8  3 11]
 The 1 epoch, the 2 iteration, current batch is [ 0  2 12 13 14  5]
 The 1 epoch, the 3 iteration, current batch is [6 9 1]
 The 2 epoch, the 1 iteration, current batch is [ 6 14  7  9  3  8]
 The 2 epoch, the 2 iteration, current batch is [13  5 12  1 11  2]
 The 2 epoch, the 3 iteration, current batch is [ 0  4 10]
 The 3 epoch, the 1 iteration, current batch is [10  8 13 12  3 14]
 The 3 epoch, the 2 iteration, current batch is [ 6  9  2  5  1 11]
 The 3 epoch, the 3 iteration, current batch is [0 4 7]
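The role of buffer_size can be sketched in plain Python. The TensorFlow documentation describes shuffle() as filling a buffer with buffer_size elements and sampling randomly from it, replacing each sampled element with the next input. The function below is a hypothetical imitation of that description under these assumptions, not TensorFlow's code:

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Shuffle items using a fixed-size buffer of pending elements."""
    rng = random.Random(seed)
    buffer, output = [], []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer
            continue
        # Emit a random buffered element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        output.append(buffer[idx])
        buffer[idx] = item
    # Input exhausted: emit what is left in random order.
    rng.shuffle(buffer)
    output.extend(buffer)
    return output


labels = list(range(15))
full = buffered_shuffle(labels, buffer_size=100, seed=0)   # buffer holds everything
small = buffered_shuffle(labels, buffer_size=3, seed=0)    # only local mixing
unchanged = buffered_shuffle(labels, buffer_size=1)        # no shuffling at all
print(sorted(full) == labels, unchanged == labels)
```

In the example above, shuffle(100) uses a buffer larger than the 15-element dataset, so every element can land in any position. With a buffer smaller than the dataset, an element can only move a limited distance forward, and a buffer of size 1 leaves the order unchanged.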
