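『Computer Vision』Mask-RCNN_Training Network Part 1: Datasets and the Dataset Class
GitHub repository: Mask_RCNN
『Computer Vision』Mask-RCNN_Paper Study
『Computer Vision』Mask-RCNN_Project Documentation Translation
『Computer Vision』Mask-RCNN_Inference Network Part 1: Overview
『Computer Vision』Mask-RCNN_Inference Network Part 2: The ResNet101-based FPN Shared Network
『Computer Vision』Mask-RCNN_Inference Network Part 3: RPN Anchor Processing and Proposal Generation
『Computer Vision』Mask-RCNN_Inference Network Part 4: Coupling FPN and ROIAlign
『Computer Vision』Mask-RCNN_Inference Network Part 5: Refining the Detection Results
『Computer Vision』Mask-RCNN_Inference Network Part 6: Mask Generation
『Computer Vision』Mask-RCNN_Inference Network Finale: Running Inference with the detect Method
『Computer Vision』Mask-RCNN_Anchor Generation
『Computer Vision』Mask-RCNN_Training Network Part 1: Datasets and the Dataset Class
『Computer Vision』Mask-RCNN_Training Network Part 2: The train Network Structure & Loss Functions
『Computer Vision』Mask-RCNN_Training Network Part 3: Training the Model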
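The dataset class built in this section comes from the official demo. If you are interested in building and training on your own dataset from scratch, work through this article and the corresponding code file first, then read 『Computer Vision』Mask-RCNN_Keypoint Detection Branch, which walks through turning your own data into a form Mask RCNN can use.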
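Code location
In the notebook train_shapes.ipynb, the author presents a small demo that trains Mask_RCNN on synthetic images. We take it as our example and re-examine Mask_RCNN from the perspective of training data.
For training, the first step is to subclass and override the basic data-reading class for our own dataset: the Dataset class in utils.py. Next, we adjust the network configuration for the dataset through the Config class in config.py so that the network shapes fit the data. Only then do we turn to training itself. Following this flow, this section uses the data generation in train_shapes.ipynb to study how the Dataset class works.
The example program first creates a new subclass of Dataset (the whole class is pasted here; later sections explain it piece by piece):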
import math
import random

import cv2
import numpy as np
from mrcnn import utils  # as imported in train_shapes.ipynb


class ShapesDataset(utils.Dataset):
    """Generates the shapes synthetic dataset. The dataset consists of simple
    shapes (triangles, squares, circles) placed randomly on a blank surface.
    The images are generated on the fly. No file access required.
    """

    def load_shapes(self, count, height, width):
        """Generate the requested number of synthetic images.
        count: number of images to generate.
        height, width: the size of the generated images.
        """
        # Add classes
        self.add_class("shapes", 1, "square")
        self.add_class("shapes", 2, "circle")
        self.add_class("shapes", 3, "triangle")

        # Add images
        # Generate random specifications of images (i.e. color and
        # list of shapes sizes and locations). This is more compact than
        # actual images. Images are generated on the fly in load_image().
        for i in range(count):
            bg_color, shapes = self.random_image(height, width)
            self.add_image("shapes", image_id=i, path=None,
                           width=width, height=height,
                           bg_color=bg_color, shapes=shapes)

    def load_image(self, image_id):
        """Generate an image from the specs of the given image ID.
        Typically this function loads the image from a file, but
        in this case it generates the image on the fly from the
        specs in image_info.
        """
        info = self.image_info[image_id]
        bg_color = np.array(info['bg_color']).reshape([1, 1, 3])
        image = np.ones([info['height'], info['width'], 3], dtype=np.uint8)
        image = image * bg_color.astype(np.uint8)
        for shape, color, dims in info['shapes']:
            image = self.draw_shape(image, shape, dims, color)
        return image

    def image_reference(self, image_id):
        """Return the shapes data of the image."""
        info = self.image_info[image_id]
        if info["source"] == "shapes":
            return info["shapes"]
        else:
            # Any other source is handled by the base class.
            return super().image_reference(image_id)

    def load_mask(self, image_id):
        """Generate instance masks for shapes of the given image ID.
        """
        info = self.image_info[image_id]
        shapes = info['shapes']
        count = len(shapes)
        mask = np.zeros([info['height'], info['width'], count], dtype=np.uint8)
        for i, (shape, _, dims) in enumerate(info['shapes']):
            mask[:, :, i:i+1] = self.draw_shape(mask[:, :, i:i+1].copy(),
                                                shape, dims, 1)
        # Handle occlusions
        occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
        for i in range(count-2, -1, -1):
            mask[:, :, i] = mask[:, :, i] * occlusion
            occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
        # Map class names to class IDs.
        class_ids = np.array([self.class_names.index(s[0]) for s in shapes])
        # The builtin bool replaces the np.bool alias, which recent NumPy lacks.
        return mask.astype(bool), class_ids.astype(np.int32)

    def draw_shape(self, image, shape, dims, color):
        """Draws a shape from the given specs."""
        # Get the center x, y and the size s
        x, y, s = dims
        if shape == 'square':
            cv2.rectangle(image, (x-s, y-s), (x+s, y+s), color, -1)
        elif shape == "circle":
            cv2.circle(image, (x, y), s, color, -1)
        elif shape == "triangle":
            points = np.array([[(x, y-s),
                                (x-s/math.sin(math.radians(60)), y+s),
                                (x+s/math.sin(math.radians(60)), y+s),
                                ]], dtype=np.int32)
            cv2.fillPoly(image, points, color)
        return image

    def random_shape(self, height, width):
        """Generates specifications of a random shape that lies within
        the given height and width boundaries.
        Returns a tuple of three values:
        * The shape name (square, circle, ...)
        * Shape color: a tuple of 3 values, RGB.
        * Shape dimensions: A tuple of values that define the shape size
                            and location. Differs per shape type.
        """
        # Shape
        shape = random.choice(["square", "circle", "triangle"])
        # Color
        color = tuple([random.randint(0, 255) for _ in range(3)])
        # Center x, y
        buffer = 20
        y = random.randint(buffer, height - buffer - 1)
        x = random.randint(buffer, width - buffer - 1)
        # Size
        s = random.randint(buffer, height//4)
        return shape, color, (x, y, s)

    def random_image(self, height, width):
        """Creates random specifications of an image with multiple shapes.
        Returns the background color of the image and a list of shape
        specifications that can be used to draw the image.
        """
        # Pick random background color
        bg_color = np.array([random.randint(0, 255) for _ in range(3)])
        # Generate a few random shapes and record their
        # bounding boxes
        shapes = []
        boxes = []
        N = random.randint(1, 4)
        for _ in range(N):
            shape, color, dims = self.random_shape(height, width)
            shapes.append((shape, color, dims))
            x, y, s = dims
            boxes.append([y-s, x-s, y+s, x+s])
        # Apply non-max suppression with 0.3 threshold to avoid
        # shapes covering each other
        keep_ixs = utils.non_max_suppression(np.array(boxes), np.arange(N), 0.3)
        shapes = [s for i, s in enumerate(shapes) if i in keep_ixs]
        return bg_color, shapes
I. Recording the Raw Data Information
Next, the following calls (IMAGE_SHAPE=[128 128 3], covered when we discuss config) prepare the training and validation datasets. Note that this step only prepares the datasets: no image data has been generated or read yet.
# Training dataset
dataset_train = ShapesDataset()
dataset_train.load_shapes(500, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
dataset_train.prepare()

# Validation dataset
dataset_val = ShapesDataset()
dataset_val.load_shapes(50, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
dataset_val.prepare()
The load_shapes method it calls is as follows:
def load_shapes(self, count, height, width):
    """Generate the requested number of synthetic images.
    count: number of images to generate.
    height, width: the size of the generated images.
    """
    # Add classes
    self.add_class("shapes", 1, "square")
    self.add_class("shapes", 2, "circle")
    self.add_class("shapes", 3, "triangle")

    # Add images
    # Generate random specifications of images (i.e. color and
    # list of shapes sizes and locations). This is more compact than
    # actual images. Images are generated on the fly in load_image().
    for i in range(count):
        bg_color, shapes = self.random_image(height, width)
        self.add_image("shapes", image_id=i, path=None,
                       width=width, height=height,
                       bg_color=bg_color, shapes=shapes)
Two methods inherited from the parent class appear here, self.add_class and self.add_image. Let us look at the Dataset class in utils.py:
class Dataset(object):
    """The base class for dataset classes.
    To use it, create a new class that adds functions specific to the dataset
    you want to use. For example:

    class CatsAndDogsDataset(Dataset):
        def load_cats_and_dogs(self):
            ...
        def load_mask(self, image_id):
            ...
        def image_reference(self, image_id):
            ...

    See COCODataset and ShapesDataset as examples.
    """

    def __init__(self, class_map=None):
        self._image_ids = []
        self.image_info = []
        # Background is always the first class
        self.class_info = [{"source": "", "id": 0, "name": "BG"}]
        self.source_class_ids = {}

    def add_class(self, source, class_id, class_name):
        assert "." not in source, "Source name cannot contain a dot"
        # Does the class exist already?
        for info in self.class_info:
            if info['source'] == source and info["id"] == class_id:
                # source.class_id combination already available, skip
                return
        # Add the class
        self.class_info.append({
            "source": source,
            "id": class_id,
            "name": class_name,
        })

    def add_image(self, source, image_id, path, **kwargs):
        image_info = {
            "id": image_id,
            "source": source,
            "path": path,
        }
        image_info.update(kwargs)
        self.image_info.append(image_info)
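In other words, Dataset holds two lists, self.image_info and self.class_info, whose elements are dictionaries with a fixed set of keys:
"source": the name of the dataset
"id": the ID of the current image/class within that dataset
"path": present only in image_info; the image path, which may be None
"name": present only in class_info; the class description
As the prepare method shows later, the string source.id is used as a key that maps to a newly built internal ID. This also explains why the documentation says Mask_RCNN supports training on several datasets at the same time.
Back to load_shapes: self.random_image is a method newly defined in the subclass. The author trains on algorithmically generated images, and this method returns the random parameters that the image-generating function needs. add_image is then called with path=None, again because the data is generated rather than read from disk. The parameters returned by self.random_image (we need not care what exactly they are) are passed as extra keyword arguments, collected into the dictionary, and appended to self.image_info: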
for i in range(count):
    bg_color, shapes = self.random_image(height, width)
    self.add_image("shapes", image_id=i, path=None,
                   width=width, height=height,
                   bg_color=bg_color, shapes=shapes)
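This gives a clearer picture of what self.image_info holds. It records the identity of each image ("source" and "id") and the data information of each image: a clue for obtaining the image array, whether that is "path" or any other dictionary entry. All that matters is that a later function can retrieve the image data from this information.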
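To make the role of the extra keyword arguments concrete, here is a minimal sketch. MiniDataset is a hypothetical stand-in that copies only the two registries of the base class, and the "photos" source and its file path are made up for illustration. It shows that a file-backed image and a generated image are recorded in the same way, differing only in which clue they carry.

```python
class MiniDataset:
    """Hypothetical stand-in for utils.Dataset: only the two registries."""

    def __init__(self):
        self.image_info = []
        # Background is always the first class, as in the real base class.
        self.class_info = [{"source": "", "id": 0, "name": "BG"}]

    def add_class(self, source, class_id, class_name):
        # Skip a (source, id) pair that has already been registered.
        for info in self.class_info:
            if info["source"] == source and info["id"] == class_id:
                return
        self.class_info.append(
            {"source": source, "id": class_id, "name": class_name})

    def add_image(self, source, image_id, path, **kwargs):
        info = {"id": image_id, "source": source, "path": path}
        info.update(kwargs)  # any extra clue load_image/load_mask will need
        self.image_info.append(info)


ds = MiniDataset()
ds.add_class("shapes", 1, "square")
ds.add_class("shapes", 1, "square")  # duplicate, silently ignored

# A file-backed image (made-up path): the path is the clue.
ds.add_image("photos", image_id="img_001", path="images/img_001.jpg")
# A generated image: no path, the drawing parameters are the clue.
ds.add_image("shapes", image_id=0, path=None, width=128, height=128,
             shapes=[("square", (255, 0, 0), (64, 64, 20))])

print(len(ds.class_info))             # BG plus one registered class
print(sorted(ds.image_info[1].keys()))
```

Whatever is stored here is exactly what load_image and load_mask later receive through self.image_info[image_id], so the choice of keys is entirely up to the subclass.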
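II. Organizing the Data Information
Once the two lists self.image_info and self.class_info have been filled, Dataset has recorded the raw class and image information. Calling the prepare method then normalizes it: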
def prepare(self, class_map=None):
    """Prepares the Dataset class for use.

    TODO: class map is not supported yet. When done, it should handle mapping
          classes from different datasets to the same class ID.
    """

    def clean_name(name):
        """Returns a shorter version of object names for cleaner display."""
        return ",".join(name.split(",")[:1])

    # Build (or rebuild) everything else from the info dicts.
    self.num_classes = len(self.class_info)  # number of classes
    self.class_ids = np.arange(self.num_classes)  # internal class IDs
    self.class_names = [clean_name(c["name"]) for c in self.class_info]  # shortened class names
    self.num_images = len(self.image_info)  # number of images
    self._image_ids = np.arange(self.num_images)  # internal image IDs

    # Mapping from source class and image IDs to internal IDs
    self.class_from_source_map = {"{}.{}".format(info['source'], info['id']): id
                                  for info, id in zip(self.class_info, self.class_ids)}
    self.image_from_source_map = {"{}.{}".format(info['source'], info['id']): id
                                  for info, id in zip(self.image_info, self.image_ids)}

    # Map sources to class_ids they support
    self.sources = list(set([i['source'] for i in self.class_info]))
    self.source_class_ids = {}  # internal class IDs for each source
    # Loop over datasets
    for source in self.sources:
        self.source_class_ids[source] = []
        # Find classes that belong to this dataset
        for i, info in enumerate(self.class_info):
            # Include BG class in all datasets
            if i == 0 or source == info['source']:
                self.source_class_ids[source].append(i)
Class information
Each "source.id" is mapped to a unique internal ID, and all internal IDs are stored in self.class_ids.
source_class_ids records the internal IDs that belong to each "source".
class_from_source_map records the mapping from "source.id" to internal ID.
print(dataset_train.class_info)             # raw information of each class
print(dataset_train.class_ids)              # internal class IDs
print(dataset_train.source_class_ids)       # internal IDs belonging to each dataset
print(dataset_train.class_from_source_map)  # mapping from raw information to internal ID
The output is:
[{'source': '', 'id': 0, 'name': 'BG'},
{'source': 'shapes', 'id': 1, 'name': 'square'},
{'source': 'shapes', 'id': 2, 'name': 'circle'},
{'source': 'shapes', 'id': 3, 'name': 'triangle'}]
[0 1 2 3]
{'': [0], 'shapes': [0, 1, 2, 3]}
{'.0': 0, 'shapes.1': 1, 'shapes.2': 2, 'shapes.3': 3}
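There is always a class 0 with an empty source (both its id and its internal ID are 0), marked as background. It is added to the class list of every dataset in source_class_ids: above we defined only 3 classes for the "shapes" dataset, yet its entry lists 4 because of the extra 0.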
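The multi-dataset case is easier to see with two sources whose source-local ids collide. The sketch below repeats the two dictionary constructions from prepare in plain Python; the "pets" source and its classes are hypothetical and exist only for this illustration.

```python
# Hypothetical class registry merging two sources; "pets" is illustrative only.
class_info = [
    {"source": "", "id": 0, "name": "BG"},
    {"source": "shapes", "id": 1, "name": "square"},
    {"source": "shapes", "id": 2, "name": "circle"},
    {"source": "pets", "id": 1, "name": "cat"},  # same local id as "square"
    {"source": "pets", "id": 2, "name": "dog"},
]

# Internal class IDs are simply positions in class_info.
class_ids = list(range(len(class_info)))

# "source.id" -> internal ID
class_from_source_map = {
    "{}.{}".format(info["source"], info["id"]): internal_id
    for info, internal_id in zip(class_info, class_ids)
}

# source -> internal IDs it may contain (background is shared by all)
sources = sorted({info["source"] for info in class_info})
source_class_ids = {
    source: [i for i, info in enumerate(class_info)
             if i == 0 or info["source"] == source]
    for source in sources
}

print(class_from_source_map)
# {'.0': 0, 'shapes.1': 1, 'shapes.2': 2, 'pets.1': 3, 'pets.2': 4}
print(source_class_ids)
# {'': [0], 'pets': [0, 3, 4], 'shapes': [0, 1, 2]}
```

Both "shapes.1" and "pets.1" carry the local id 1, yet they receive distinct internal IDs (1 and 3), which is what lets several datasets share one set of output classes.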
Image information
Image information is less involved than class information. We simply print three images:
# Training dataset
dataset_train = ShapesDataset()
dataset_train.load_shapes(3, config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1])
dataset_train.prepare()

print(dataset_train.image_info)             # raw image information
print(dataset_train.image_ids)              # internal image IDs
print(dataset_train.image_from_source_map)  # mapping from raw information to internal ID
The result is:
[{'id': 0, 'source': 'shapes', 'path': None, 'width': 128, 'height': 128, 'bg_color': array([163, 143, 173]),
'shapes': [('circle', (178, 140, 65), (83, 104, 20)), ('circle', (192, 52, 82), (48, 58, 20))]},
{'id': 1, 'source': 'shapes', 'path': None, 'width': 128, 'height': 128, 'bg_color': array([ 5, 99, 71]),
'shapes': [('triangle', (90, 32, 55), (39, 21, 22)), ('circle', (214, 49, 173), (39, 78, 21))]},
{'id': 2, 'source': 'shapes', 'path': None, 'width': 128, 'height': 128, 'bg_color': array([138, 52, 83]),
'shapes': [('circle', (180, 74, 150), (105, 45, 27))]}]
[0 1 2]
{'shapes.0': 0, 'shapes.1': 1, 'shapes.2': 2}
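[Note 1] Because this is a detection task rather than a classification task, each image maps only to the dataset it belongs to and has no direct mapping to class information. The mapping to classes exists for the objects in an image, but that is outside the scope of the functions covered here.
[Note 2] The internal IDs are simply the index array of the info list: an internal ID can be used directly to index the info entry of the corresponding image.
To summarize: before calling self.prepare, your own loader method calls self.add_class() and self.add_image() to append the raw image and class information, as dicts, to the class_info and image_info lists. Nothing more is needed.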
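As a small illustration of Note 2, the sketch below builds the image mapping the same way prepare does and then uses the internal ID as a plain list index. The "photos" entry and its path are made up for this example.

```python
# Raw image records as add_image would store them (the "photos" entry is made up).
image_info = [
    {"id": "a17", "source": "photos", "path": "photos/a17.jpg"},
    {"id": 0, "source": "shapes", "path": None},
    {"id": 1, "source": "shapes", "path": None},
]

# Internal image IDs are the positions in image_info.
image_ids = list(range(len(image_info)))

# "source.id" -> internal ID
image_from_source_map = {
    "{}.{}".format(info["source"], info["id"]): internal_id
    for info, internal_id in zip(image_info, image_ids)
}

# From a source-level name to the raw record, via the internal ID.
internal_id = image_from_source_map["shapes.1"]
print(internal_id)              # 2
print(image_info[internal_id])  # {'id': 1, 'source': 'shapes', 'path': None}
```

The source-level id (1) and the internal ID (2) differ as soon as more than one source is registered, so code that indexes image_info should always use the internal ID.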
III. Fetching Images
Next we fetch a few sample images for display:
# Load and display random samples
image_ids = np.random.choice(dataset_train.image_ids, 4)
for image_id in image_ids:
    image = dataset_train.load_image(image_id)
    mask, class_ids = dataset_train.load_mask(image_id)
    visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)
    print(image.shape, mask.shape, class_ids, dataset_train.class_names)
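The code above tells us the following:
Images are selected with self.image_ids, that is, the internal IDs.
load_image, which we implement ourselves, takes an image's internal ID, looks up the image's raw information (info), and uses it to output the image.
load_mask, which we implement ourselves, takes an image's internal ID, looks up the raw information (info), and uses it to output the image's masks and their internal class IDs. Note that one image can have several masks, each with its own class.
The code above produces the following output (only the first two images are shown):
[Figure: display_top_masks output for the first two sample images]
The load_image and load_mask methods are pasted below (see train_shapes.ipynb for details). Their implementation is not the point, since we are not studying how to draw 2D figures. What matters is the behavior described above, because it defines the interface we must implement when moving to our own data. load_image returns one image. load_mask returns a 0/1 mask of shape (h, w, c) and class IDs of shape (c,), where c is the number of instances in that image.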
def load_image(self, image_id):
    """Generate an image from the specs of the given image ID.
    Typically this function loads the image from a file, but
    in this case it generates the image on the fly from the
    specs in image_info.
    """
    info = self.image_info[image_id]
    bg_color = np.array(info['bg_color']).reshape([1, 1, 3])
    image = np.ones([info['height'], info['width'], 3], dtype=np.uint8)
    image = image * bg_color.astype(np.uint8)
    for shape, color, dims in info['shapes']:
        image = self.draw_shape(image, shape, dims, color)
    return image

def load_mask(self, image_id):
    """Generate instance masks for shapes of the given image ID.
    """
    info = self.image_info[image_id]
    shapes = info['shapes']
    count = len(shapes)
    mask = np.zeros([info['height'], info['width'], count], dtype=np.uint8)
    for i, (shape, _, dims) in enumerate(info['shapes']):
        mask[:, :, i:i+1] = self.draw_shape(mask[:, :, i:i+1].copy(),
                                            shape, dims, 1)
    # Handle occlusions
    occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
    for i in range(count-2, -1, -1):
        mask[:, :, i] = mask[:, :, i] * occlusion
        occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
    # Map class names to class IDs.
    class_ids = np.array([self.class_names.index(s[0]) for s in shapes])
    # The builtin bool replaces the np.bool alias, which recent NumPy lacks.
    return mask.astype(bool), class_ids.astype(np.int32)
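The occlusion step in load_mask is the least obvious part of the method. The following is a minimal NumPy sketch that runs the same loop on a hand-made stack of two 1x4 masks (the tiny arrays are assumptions chosen for illustration, not data from the notebook). Shapes drawn later lie on top, so the loop walks from the last instance back to the first and clears every pixel that a later instance already covers.

```python
import numpy as np

# Two instance masks stacked on the last axis: shape (h=1, w=4, c=2).
# Instance 0 is drawn first, instance 1 is drawn last (on top).
mask = np.zeros((1, 4, 2), dtype=np.uint8)
mask[0, 0:3, 0] = 1  # instance 0 covers columns 0..2
mask[0, 2:4, 1] = 1  # instance 1 covers columns 2..3, overlapping column 2

# Pixels not covered by the topmost instance.
occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
for i in range(mask.shape[-1] - 2, -1, -1):
    # Keep only the part of instance i that nothing above it hides.
    mask[:, :, i] = mask[:, :, i] * occlusion
    # Pixels claimed by instance i are now hidden for everything below it.
    occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))

print(mask[0, :, 0])  # [1 1 0 0]
print(mask[0, :, 1])  # [0 0 1 1]
```

After the loop every pixel belongs to at most one instance, which is the property an instance-segmentation target needs.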
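Summary
As the Dataset docstring says, to run our own dataset we first implement one method (load_shapes here; name it after your dataset) that collects the raw image and class information. We then implement two methods, load_image and load_mask, which return the data of a single image and the masks and classes of the objects in that image, respectively. With that, the dataset class is essentially complete.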
The base class for dataset classes.
To use it, create a new class that adds functions specific to the dataset
you want to use. For example:

class CatsAndDogsDataset(Dataset):
    def load_cats_and_dogs(self):
        ...
    def load_mask(self, image_id):
        ...
    def image_reference(self, image_id):
        ...

See COCODataset and ShapesDataset as examples.
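The three-method recipe can be sketched end to end as follows. This is only an illustration under stated assumptions: BaseDataset is a hypothetical, trimmed stand-in for utils.Dataset so that the snippet runs without the Mask_RCNN package, and BoxesDataset, load_boxes, the "boxes" source, and the rectangle annotations are all invented for the example. In real use the class would inherit from utils.Dataset and load_image would read from info["path"].

```python
import numpy as np


class BaseDataset:
    """Hypothetical, trimmed stand-in for utils.Dataset so the sketch runs alone."""

    def __init__(self):
        self.image_info = []
        self.class_info = [{"source": "", "id": 0, "name": "BG"}]

    def add_class(self, source, class_id, class_name):
        self.class_info.append(
            {"source": source, "id": class_id, "name": class_name})

    def add_image(self, source, image_id, path, **kwargs):
        self.image_info.append(
            dict(id=image_id, source=source, path=path, **kwargs))

    def prepare(self):
        self.class_names = [c["name"] for c in self.class_info]
        self.image_ids = np.arange(len(self.image_info))


class BoxesDataset(BaseDataset):
    """Toy dataset: each object is an axis-aligned box (name, y1, x1, y2, x2)."""

    def load_boxes(self, annotations, height, width):
        # Step 1: register classes and per-image raw information.
        self.add_class("boxes", 1, "cat")
        self.add_class("boxes", 2, "dog")
        for i, objects in enumerate(annotations):
            self.add_image("boxes", image_id=i, path=None,
                           height=height, width=width, objects=objects)

    def load_image(self, image_id):
        # Step 2: return the image array. A real dataset would read
        # info["path"]; this sketch returns a blank image instead.
        info = self.image_info[image_id]
        return np.zeros([info["height"], info["width"], 3], dtype=np.uint8)

    def load_mask(self, image_id):
        # Step 3: return an (h, w, c) boolean mask and (c,) class IDs.
        info = self.image_info[image_id]
        objects = info["objects"]
        mask = np.zeros([info["height"], info["width"], len(objects)], dtype=bool)
        for i, (_, y1, x1, y2, x2) in enumerate(objects):
            mask[y1:y2, x1:x2, i] = True
        class_ids = np.array([self.class_names.index(o[0]) for o in objects])
        return mask, class_ids.astype(np.int32)


dataset = BoxesDataset()
dataset.load_boxes([[("cat", 0, 0, 4, 4), ("dog", 2, 2, 8, 8)]], height=8, width=8)
dataset.prepare()
mask, class_ids = dataset.load_mask(dataset.image_ids[0])
print(mask.shape, class_ids)  # (8, 8, 2) [1 2]
```

This toy version leaves overlapping boxes overlapping. Whether to resolve occlusions, as ShapesDataset does, depends on how the annotations of the real dataset were produced.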