Sampler

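The first thing to know is that every sampler inherits from the Sampler base class, shown below.

It defines three methods:

__init__: initialization.
__iter__: produces the index values to iterate over, i.e. it decides which data is read at each step.
__len__: returns the length of the iterator, i.e. how many items one full pass yields.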

class Sampler(object):
    r"""Base class for all Samplers.

    Every Sampler subclass has to provide an __iter__ method, providing a way
    to iterate over indices of dataset elements, and a __len__ method that
    returns the length of the returned iterators.
    """
    # Base class for index iterators

    def __init__(self, data_source):
        pass

    def __iter__(self):
        raise NotImplementedError

    def __len__(self):
        raise NotImplementedError
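
To make this contract concrete, here is a minimal sketch of a custom sampler. EveryOtherSampler is a hypothetical name invented for illustration, not something PyTorch ships, and the sketch assumes a map-style dataset that supports len():

```python
from torch.utils.data import Sampler


class EveryOtherSampler(Sampler):
    """Hypothetical sampler that yields indices 0, 2, 4, ... of the dataset."""

    def __init__(self, data_source):
        # Keep a reference so __iter__ and __len__ can query the dataset size.
        self.data_source = data_source

    def __iter__(self):
        # Yield every second index, one index at a time.
        return iter(range(0, len(self.data_source), 2))

    def __len__(self):
        # Number of indices produced by one full pass (ceiling of n / 2).
        return (len(self.data_source) + 1) // 2


sampler = EveryOtherSampler([10, 20, 30, 40, 50])
indices = list(sampler)  # [0, 2, 4]
```

Only __iter__ and __len__ carry behavior here; __init__ just stores whatever the other two methods need.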

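Sampler subclasses

With the parent class covered, let's look at the samplers PyTorch provides.

SequentialSampler

As the name suggests, this one simply samples the dataset in order.

At initialization it stores the dataset data_source. __iter__ then returns an iterator over a range with the same length as data_source, so each step yields exactly one index.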

class SequentialSampler(Sampler):
    r"""Samples elements sequentially, always in the same order.

    Arguments:
        data_source (Dataset): dataset to sample from
    """
    # Produces indices in sequential order

    def __init__(self, data_source):
        self.data_source = data_source

    def __iter__(self):
        return iter(range(len(self.data_source)))

    def __len__(self):
        return len(self.data_source)

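Usage example:

import torch

a = [1, 5, 78, 9, 68]
b = torch.utils.data.SequentialSampler(a)

# The sampler yields indices into a, not the values themselves.
for x in b:
    print(x)

# Output:
# 0
# 1
# 2
# 3
# 4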

RandomSampler

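Parameters:

data_source: same as above.
num_samples: the number of samples to draw; defaults to the size of the dataset. In the version shown here it may only be set when replacement=True.
replacement: if True, sampling is done with replacement, so the same sample can be drawn more than once and some samples may never be drawn at all. In that case num_samples can be increased to make it more likely that every sample gets drawn.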

class RandomSampler(Sampler):
    r"""Samples elements randomly. If without replacement, then sample from a shuffled dataset.

    If with replacement, then user can specify ``num_samples`` to draw.

    Arguments:
        data_source (Dataset): dataset to sample from
        num_samples (int): number of samples to draw, default=len(dataset)
        replacement (bool): samples are drawn with replacement if ``True``, default=False
    """

    def __init__(self, data_source, replacement=False, num_samples=None):
        self.data_source = data_source
        self.replacement = replacement
        self.num_samples = num_samples

        if self.num_samples is not None and replacement is False:
            raise ValueError("With replacement=False, num_samples should not be specified, "
                             "since a random permute will be performed.")

        if self.num_samples is None:
            self.num_samples = len(self.data_source)

        if not isinstance(self.num_samples, int) or self.num_samples <= 0:
            raise ValueError("num_samples should be a positive integeral "
                             "value, but got num_samples={}".format(self.num_samples))

        if not isinstance(self.replacement, bool):
            raise ValueError("replacement should be a boolean value, but got "
                             "replacement={}".format(self.replacement))

    def __iter__(self):
        n = len(self.data_source)
        if self.replacement:
            return iter(torch.randint(high=n, size=(self.num_samples,), dtype=torch.int64).tolist())
        return iter(torch.randperm(n).tolist())

    def __len__(self):
        return len(self.data_source)
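
The two branches of __iter__ are easier to see side by side. The following is a small sketch, assuming a toy list as the dataset (the data and variable names are made up for illustration):

```python
from torch.utils.data import RandomSampler

data = list(range(100, 110))  # toy dataset with 10 items

# Without replacement: one pass is a random permutation of 0..9,
# so every index appears exactly once.
perm = list(RandomSampler(data))

# With replacement: num_samples independent draws from 0..9,
# so duplicates are allowed and some indices may be missing.
drawn = list(RandomSampler(data, replacement=True, num_samples=25))

print(sorted(perm) == list(range(10)))  # True
print(len(drawn))                       # 25
```

One thing worth noticing in the source quoted above: __len__ returns len(self.data_source) even when replacement=True and num_samples is different, so len(sampler) and the number of indices actually yielded can disagree in that version. Newer PyTorch releases appear to return num_samples instead.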

SubsetRandomSampler

class SubsetRandomSampler(Sampler):
    r"""Samples elements randomly from a given list of indices, without replacement.

    Arguments:
        indices (sequence): a sequence of indices
    """

    def __init__(self, indices):
        self.indices = indices

    def __iter__(self):
        return (self.indices[i] for i in torch.randperm(len(self.indices)))

    def __len__(self):
        return len(self.indices)

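A common use of this sampler is to split a training set into training and validation subsets, for example:

import random
import torch
from torch.utils.data import DataLoader

n_train = len(train_dataset)
split = n_train // 3

# random.shuffle shuffles in place and returns None, so build the list first.
indices = list(range(n_train))
random.shuffle(indices)

# The first third goes to validation, the rest to training.
train_sampler = torch.utils.data.sampler.SubsetRandomSampler(indices[split:])
valid_sampler = torch.utils.data.sampler.SubsetRandomSampler(indices[:split])

train_loader = DataLoader(..., sampler=train_sampler, ...)
valid_loader = DataLoader(..., sampler=valid_sampler, ...)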
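
Since the DataLoader arguments above are elided, here is a self-contained sketch of the same split that can be run as is. It assumes a toy dataset (a plain Python list whose values equal their indices) and a batch size of 4, neither of which comes from the original example:

```python
import random

from torch.utils.data import DataLoader, SubsetRandomSampler

dataset = list(range(12))  # toy stand-in for train_dataset; dataset[i] == i
n_train = len(dataset)
split = n_train // 3

# Shuffle the index list in place, then cut it into two disjoint parts.
indices = list(range(n_train))
random.shuffle(indices)
valid_idx, train_idx = indices[:split], indices[split:]

train_loader = DataLoader(dataset, batch_size=4,
                          sampler=SubsetRandomSampler(train_idx))
valid_loader = DataLoader(dataset, batch_size=4,
                          sampler=SubsetRandomSampler(valid_idx))

# Collect what each loader actually serves during one epoch.
train_seen = [int(x) for batch in train_loader for x in batch]
valid_seen = [int(x) for batch in valid_loader for x in batch]

print(len(train_seen), len(valid_seen))    # 8 4
print(set(train_seen) & set(valid_seen))   # set()
```

Each loader only ever sees its own subset, and the order within the subset is reshuffled every epoch.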

WeightedRandomSampler

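num_samples and replacement play the same role as in RandomSampler above, except that num_samples is required here and replacement defaults to True. The extra weights argument gives the relative sampling probability of each index and does not need to sum to one.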



class WeightedRandomSampler(Sampler):
    r"""Samples elements from [0,..,len(weights)-1] with given probabilities (weights).

    Arguments:
        weights (sequence)   : a sequence of weights, not necessary summing up to one
        num_samples (int): number of samples to draw
        replacement (bool): if ``True``, samples are drawn with replacement.
            If not, they are drawn without replacement, which means that when a
            sample index is drawn for a row, it cannot be drawn again for that row.
    """

    def __init__(self, weights, num_samples, replacement=True):
        # _int_classes is imported from torch._six in this PyTorch version.
        if not isinstance(num_samples, _int_classes) or isinstance(num_samples, bool) or \
                num_samples <= 0:
            raise ValueError("num_samples should be a positive integeral "
                             "value, but got num_samples={}".format(num_samples))

        if not isinstance(replacement, bool):
            raise ValueError("replacement should be a boolean value, but got "
                             "replacement={}".format(replacement))

        self.weights = torch.tensor(weights, dtype=torch.double)
        self.num_samples = num_samples
        self.replacement = replacement

    def __iter__(self):
        return iter(torch.multinomial(self.weights, self.num_samples, self.replacement).tolist())

    def __len__(self):
        return self.num_samples  # total number of samples drawn in one pass
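
A typical reason to reach for this sampler is class imbalance. The sketch below assumes a made-up label list with a 90/10 split and weights each sample by the inverse frequency of its class; that weighting scheme is a common convention, not something the sampler prescribes:

```python
from collections import Counter

from torch.utils.data import WeightedRandomSampler

labels = [0] * 90 + [1] * 10  # imbalanced toy labels, one per sample

# Per-sample weight = 1 / (size of that sample's class), so both classes
# end up with the same total weight. Weights need not sum to one.
class_counts = Counter(labels)
weights = [1.0 / class_counts[y] for y in labels]

sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                replacement=True)

drawn = list(sampler)  # indices into labels, duplicates allowed
drawn_labels = Counter(labels[i] for i in drawn)
print(drawn_labels)    # roughly balanced between class 0 and class 1
```

With replacement=True the minority samples are simply drawn more often, which is why the drawn labels come out close to balanced even though the underlying data is not.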

BatchSampler

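All of the samplers above yield a single index at a time, but training works on batches of data, and that is the job of BatchSampler. It collects the indices produced by one of the samplers above and, once it has gathered a batch's worth, yields them together as one list.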

class BatchSampler(Sampler):
    r"""Wraps another sampler to yield a mini-batch of indices.

    Args:
        sampler (Sampler): Base sampler.
        batch_size (int): Size of mini-batch.
        drop_last (bool): If ``True``, the sampler will drop the last batch if
            its size would be less than ``batch_size``

    Example:
        >>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=False))
        [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
        >>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=True))
        [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    """
    # Batch sampling

    def __init__(self, sampler, batch_size, drop_last):
        if not isinstance(sampler, Sampler):
            raise ValueError("sampler should be an instance of "
                             "torch.utils.data.Sampler, but got sampler={}"
                             .format(sampler))

        if not isinstance(batch_size, _int_classes) or isinstance(batch_size, bool) or \
                batch_size <= 0:
            raise ValueError("batch_size should be a positive integeral value, "
                             "but got batch_size={}".format(batch_size))

        if not isinstance(drop_last, bool):
            raise ValueError("drop_last should be a boolean value, but got "
                             "drop_last={}".format(drop_last))

        self.sampler = sampler
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self):
        batch = []
        for idx in self.sampler:
            batch.append(idx)
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if len(batch) > 0 and not self.drop_last:
            yield batch

    def __len__(self):
        if self.drop_last:
            return len(self.sampler) // self.batch_size
        else:
            return (len(self.sampler) + self.batch_size - 1) // self.batch_size
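
To see how this plugs into data loading, here is a short sketch that passes a BatchSampler to DataLoader through its batch_sampler argument. The toy dataset (a list of multiples of 10) is an assumption chosen so that indices and values are easy to tell apart:

```python
from torch.utils.data import BatchSampler, DataLoader, SequentialSampler

dataset = [i * 10 for i in range(10)]  # toy dataset: 0, 10, ..., 90

batch_sampler = BatchSampler(SequentialSampler(dataset),
                             batch_size=3, drop_last=False)

# The batch sampler itself yields lists of indices.
index_batches = list(batch_sampler)
print(index_batches)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

# DataLoader turns each list of indices into a batch of values.
loader = DataLoader(dataset, batch_sampler=batch_sampler)
value_batches = [batch.tolist() for batch in loader]
print(value_batches)  # [[0, 10, 20], [30, 40, 50], [60, 70, 80], [90]]

# __len__ is ceil(10 / 3) = 4 with drop_last=False, floor(10 / 3) = 3 otherwise.
kept = len(batch_sampler)
dropped = len(BatchSampler(SequentialSampler(dataset),
                           batch_size=3, drop_last=True))
print(kept, dropped)  # 4 3
```

When batch_sampler is given, batching is fully delegated to it, so batch_size, shuffle, sampler and drop_last are left at their defaults on the DataLoader.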

