图像数据增强 (Data Augmentation in Computer Vision)

1.1 简介

深层神经网络一般都需要大量的训练数据才能获得比较理想的结果。在数据量有限的情况下，可以通过数据增强（Data Augmentation）来增加训练样本的多样性，提高模型鲁棒性，避免过拟合。

在计算机视觉中，典型的数据增强方法有翻转（Flip），旋转（Rotat ），缩放（Scale），随机裁剪或补零（Random Crop or Pad），色彩抖动（Color jittering），加噪声（Noise）

笔者在跟进视频及图像中的人体姿态检测和关键点追踪（Human Pose Estimatiion and Tracking in videos）的项目。因此本文的数据增强仅使用——翻转（Flip），旋转（Rotate ），缩放以及缩放（Scale）

2.1 裁剪（Crop）

image.shape--([3, width, height])一个视频序列中的一帧图片,裁剪前大小不统一
bbox.shape--([4,])人体检测框，用于裁剪
x.shape--([1,13]) 人体13个关键点的所有x坐标值
y.shape--([1,13])人体13个关键点的所有y坐标值

     def crop(image, bbox, x, y, length):

         x, y, bbox = x.astype(np.int), y.astype(np.int), bbox.astype(np.int)

         x_min, y_min, x_max, y_max = bbox

         w, h = x_max - x_min, y_max - y_min

         # Crop image to bbox

         image = image[y_min:y_min + h, x_min:x_min + w, :]

         # Crop joints and bbox

         x -= x_min

         y -= y_min

         bbox = np.array([0, 0, x_max - x_min, y_max - y_min])

         # Scale to desired size

         side_length = max(w, h)

         f_xy = float(length) / float(side_length)

         image, bbox, x, y = Transformer.scale(image, bbox, x, y, f_xy)

         # Pad

         new_w, new_h = image.shape[1], image.shape[0]

         cropped = np.zeros((length, length, image.shape[2]))

         dx = length - new_w

         dy = length - new_h

         x_min, y_min = int(dx / 2.), int(dy / 2.)

         x_max, y_max = x_min + new_w, y_min + new_h

         cropped[y_min:y_max, x_min:x_max, :] = image

         x += x_min

         y += y_min

         x = np.clip(x, x_min, x_max)

         y = np.clip(y, y_min, y_max)

         bbox += np.array([x_min, y_min, x_min, y_min])

         return cropped, bbox, x.astype(np.int), y.astype(np.int)

2.2 缩放（Scale）

image.shape--([3, 256, 256])一个视频序列中的一帧图片，裁剪后输入网络为256*256
bbox.shape--([4,])人体检测框，用于裁剪
x.shape--([1,13]) 人体13个关键点的所有x坐标值
y.shape--([1,13])人体13个关键点的所有y坐标值
f_xy--缩放倍数

     def scale(image, bbox, x, y, f_xy):

         (h, w, _) = image.shape

         h, w = int(h * f_xy), int(w * f_xy)

         image = resize(image, (h, w), preserve_range=True, anti_aliasing=True, mode='constant').astype(np.uint8)

         x = x * f_xy

         y = y * f_xy

         bbox = bbox * f_xy

         x = np.clip(x, 0, w)

         y = np.clip(y, 0, h)

         return image, bbox, x, y

2.3 翻转（fillip）

这里是将图片围绕对称轴进行左右翻转（因为人体是左右对称的，在关键点检测中有助于防止模型过拟合）

     def flip(image, bbox, x, y):

         image = np.fliplr(image).copy()

         w = image.shape[1]

         x_min, y_min, x_max, y_max = bbox

         bbox = np.array([w - x_max, y_min, w - x_min, y_max])

         x = w - x

         x, y = Transformer.swap_joints(x, y)

         return image, bbox, x, y

翻转前：

翻转后：

2.4 旋转（rotate）

angle--旋转角度

     def rotate(image, bbox, x, y, angle):

         # image - -(256, 256, 3)

         # bbox - -(4,)

         # x - -[126 129 124 117 107  99 128 107 108 105 137 155 122  99]

         # y - -[209 176 136 123 178 225  65  47  46  24  44  64  49  54]

         # angle - --8.165648811999333

         # center of image [128,128]

         o_x, o_y = (np.array(image.shape[:2][::-1]) - 1) / 2.

         width,height = image.shape[0],image.shape[1]

         x1 = x

         y1 = height - y

         o_x = o_x

         o_y = height - o_y

         image = rotate(image, angle, preserve_range=True).astype(np.uint8)

         r_x, r_y = o_x, o_y

         angle_rad = (np.pi * angle) /180.0

         x = r_x + np.cos(angle_rad) * (x1 - o_x) - np.sin(angle_rad) * (y1 - o_y)

         y = r_y + np.sin(angle_rad) * (x1 - o_x) + np.cos(angle_rad) * (y1 - o_y)

         x = x

         y = height - y

         bbox[0] = r_x + np.cos(angle_rad) * (bbox[0] - o_x) + np.sin(angle_rad) * (bbox[1] - o_y)

         bbox[1] = r_y + -np.sin(angle_rad) * (bbox[0] - o_x) + np.cos(angle_rad) * (bbox[1] - o_y)

         bbox[2] = r_x + np.cos(angle_rad) * (bbox[2] - o_x) + np.sin(angle_rad) * (bbox[3] - o_y)

         bbox[3] = r_y + -np.sin(angle_rad) * (bbox[2] - o_x) + np.cos(angle_rad) * (bbox[3] - o_y)

         return image, bbox, x.astype(np.int), y.astype(np.int)

旋转前：

旋转后：

3 结果（output）

数据增强前的原图：

数据增强后：

图像数据增强 (Data Augmentation in Computer Vision)的更多相关文章

Python数据增强(data augmentation)库--Augmentor 使用介绍
Augmentor 使用介绍原图 random_distortion(probability, grid_height, grid_width, magnitude) 最终选择参数为 p.rando ...
【Tool】Augmentor和imgaug——python图像数据增强库
Augmentor和imgaug--python图像数据增强库 Tags: ComputerVision Python 介绍两个图像增强库:Augmentor和imgaug,Augmentor使用比较 ...
[DeeplearningAI笔记]卷积神经网络2.9-2.10迁移学习与数据增强
4.2深度卷积网络觉得有用的话,欢迎一起讨论相互学习~Follow Me 2.9迁移学习迁移学习的基础知识已经介绍过,本篇博文将介绍提高的部分. 提高迁移学习的速度可以将迁移学习模型冻结的部分看 ...
Python库 - Albumentations 图片数据增强库
Python图像处理库 - Albumentations,可用于深度学习中网络训练时的图片数据增强. Albumentations 图像数据增强库特点: 基于高度优化的 OpenCV 库实现图像快速数 ...
OpenCV中IplImage图像格式与BYTE图像数据的转换
最近在将Karlsruhe Institute of Technology的Andreas Geiger发表在ACCV2010上的Efficent Large-Scale Stereo Matchin ...
Javascript高级编程学习笔记（93）—— Canvas(10) 模式及图像数据
模式模式其实就是重复的图像,用来填充或描边图形要创建一个新模式,可以调用 createPattern()并传入两个参数一个HTML img元素用于表示如何重复的字符串 "repeat ...
keras对图像数据进行增强 | keras data augmentation
本文首发于个人博客https://kezunlin.me/post/8db507ff/,欢迎阅读最新内容! keras data augmentation Guide code # import th ...
【48】数据扩充（Data augmentation）
数据扩充(Data augmentation) 大部分的计算机视觉任务使用很多的数据,所以数据扩充是经常使用的一种技巧来提高计算机视觉系统的表现.我认为计算机视觉是一个相当复杂的工作,你需要输入图像的 ...
Computer Vision 学习 -- 图像存储格式
本文把自己理解的图像存储格式总结一下. 计算机中的数据,都是二进制的,所以图片也不例外. 这是opencv文档的描述,具体在代码里面,使用矩阵来进行存储. 类似下图是(BGR格式): 图片的最小单位是 ...

随机推荐

sql server 数据导出（入）方法总结
我们都知道日常在面对数据需求时需要导出数据,比较少量的数据导出我们一般是通过查询后另存即可,当面对数据量比较大的时候我们应该怎么处理?我搜索总结一些几个方法:1.bcp 导出.2.数据库本身自带的导入 ...
C#语言————第四章常用Convert类的类型转换方法
方法说明Convert.ToInt32() 转换为整型(int 型)Convert.ToStringle() 转换为单精度浮点型(float 型)Convert.ToDouble() 转换为双精度 ...
CRM JS
注意事项:Xrm.Page中的方法使用的是实体.字段.关系的逻辑名称.窗体调试:contentIFrame.Xrm.Page.getControl("compositeControlPara ...
centos6.9NAT网络模式
1.对虚拟机进行设置,点击该虚拟机的设置在网络适配器下将网络连接设置为NAT模式. 2.对虚拟机进行设置,点击虚拟机左上方的编辑-->虚拟网络编辑器,将WMnet信息设置为NAT模式,其它的无需 ...
parallels Desktop解决无法压缩硬盘的问题
使用pd12新建的win7虚拟机仅仅使用了四十个G,但在本地硬盘中的体现却是占用了一百左右:尝试压缩提示: 无法编辑硬盘属性,因为该硬盘有一个或多个快照. 该硬盘属于某一带有一个或多个快照的虚拟机.请 ...
SAP ABAP 查找用户出口
1.查找事物代码程序名 2.查找用户出口 T-CODE:SE80 在子例程中查找以USEREXIT开头的子程序.
Static简介
1.static称为静态修饰符,它可以修饰类中得成员.被static修饰的成员被称为静态成员,也成为类成员,而不用static修饰的成员称为实例成员. 2.当 Voluem volu1 = new V ...
个人技术博客——linux服务器配置以及flask框架
本次的软件工程实践,我负责我们组后台服务的搭建,我选用了bandwagon的服务器,安装的是Debian GNU/Linux,全程在root用户下操作,后端服务是用python的flask框架,数据库 ...
Highcharts属性与Y轴数据值刻度显示Y轴最小最大值
Highcharts 官网:https://www.hcharts.cn/demo/highcharts Highcharts API文档:https://api.hcharts.cn/highcha ...
JS面向对象之工厂模式
js面向对象什么是对象 "无序属性的集合,其属性可以包括基本值.对象或者函数",对象是一组没有特定顺序的的值.对象的没个属性或方法都有一个俄名字,每个名字都映射到一个值. 简单来 ...

图像数据增强 (Data Augmentation in Computer Vision)

图像数据增强 (Data Augmentation in Computer Vision)的更多相关文章

随机推荐

热门专题