YOLOv3: Obtaining Anchor Sizes with K-means

I only skimmed YOLOv1 and YOLOv2 but read YOLOv3 closely. It was confusing at first, so after working through it I am recording the key points step by step:

v2 and v3 introduce anchors, which differ somewhat from those in Faster R-CNN. So how should these anchors be understood?

My understanding, in plain terms:

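(1) Start from a set of annotated bounding boxes, each labeled with its top-left and bottom-right corner coordinates, and cluster them into a few groups. The group centers become the preset anchor widths and heights. The annotations only need to follow the XML format of the VOC dataset.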

The code below extracts the width and height of each annotated box and normalizes them by the image width and height:

import glob
import xml.etree.ElementTree as ET

import numpy as np


def load_dataset(path):
    """Collect [width, height] of every annotated box, normalized by image size."""
    dataset = []
    for xml_file in glob.glob("{}/*xml".format(path)):
        tree = ET.parse(xml_file)
        height = int(tree.findtext("./size/height"))
        width = int(tree.findtext("./size/width"))
        for obj in tree.iter("object"):
            # Corner coordinates scaled to [0, 1] relative to the image size
            xmin = int(obj.findtext("bndbox/xmin")) / width
            ymin = int(obj.findtext("bndbox/ymin")) / height
            xmax = int(obj.findtext("bndbox/xmax")) / width
            ymax = int(obj.findtext("bndbox/ymax")) / height
            dataset.append([xmax - xmin, ymax - ymin])
    return np.array(dataset)
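To make the expected input concrete, here is a minimal sketch that runs the same extraction on an in-memory annotation. The XML snippet and the helper name `boxes_from_xml` are made up for illustration. Only the tag layout (`size/width`, `size/height`, `object/bndbox`) follows the VOC format used above.

```python
import xml.etree.ElementTree as ET

import numpy as np

# Hypothetical annotation, invented for this example (not from a real dataset).
SAMPLE_XML = """
<annotation>
  <size><width>500</width><height>400</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>50</xmin><ymin>100</ymin><xmax>300</xmax><ymax>300</ymax></bndbox>
  </object>
  <object>
    <name>cat</name>
    <bndbox><xmin>0</xmin><ymin>0</ymin><xmax>100</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>
"""


def boxes_from_xml(xml_text):
    """Return an (n, 2) array of [width, height], normalized by the image size."""
    root = ET.fromstring(xml_text.strip())
    width = int(root.findtext("./size/width"))
    height = int(root.findtext("./size/height"))
    boxes = []
    for obj in root.iter("object"):
        xmin = int(obj.findtext("bndbox/xmin")) / width
        ymin = int(obj.findtext("bndbox/ymin")) / height
        xmax = int(obj.findtext("bndbox/xmax")) / width
        ymax = int(obj.findtext("bndbox/ymax")) / height
        boxes.append([xmax - xmin, ymax - ymin])
    return np.array(boxes)


sample_boxes = boxes_from_xml(SAMPLE_XML)
print(sample_boxes)  # one row per object: normalized [width, height]
```

Because both values are fractions of the image size, boxes from images of different resolutions become directly comparable before clustering.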

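(2) How exactly are the boxes grouped? K-means splits all annotated boxes into clusters by width and height (9 clusters for the VOC data), using distance = 1 - IoU as the metric.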
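As a quick check on this distance, the sketch below computes 1 - IoU for boxes that share the same corner, which is how the clustering compares widths and heights. The function name `iou_wh` and the example numbers are illustrative assumptions.

```python
def iou_wh(box, anchor):
    """IoU of two (width, height) boxes aligned at the same corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


box = (0.2, 0.4)
same_shape = (0.2, 0.4)
rotated = (0.4, 0.2)

d_same = 1 - iou_wh(box, same_shape)   # identical shape
d_rotated = 1 - iou_wh(box, rotated)   # same area, different aspect ratio
# Scaling both boxes by the same factor leaves the distance unchanged
d_scaled = 1 - iou_wh((2.0, 4.0), (4.0, 2.0))

print(d_same, d_rotated, d_scaled)
```

Identical shapes give a distance of 0, while the rotated box gives about 0.667 at either scale. Since only the overlap ratio matters, large boxes do not dominate the clustering the way they would with a Euclidean distance on raw widths and heights.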

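import numpy as np

'''
(1) K-means takes all N ground-truth boxes in the data, collects their widths and heights, and picks 9 of them at random as the initial centers.
(2) Every box is then compared against these 9 width/height pairs using IoU as the distance, which gives an N x 9 matrix of distances.
(3) The minimum of each row assigns that box to one of the 9 clusters. The median of the boxes in each cluster then becomes the updated center.
(4) Repeat until the 9 centers stop changing (the code checks that the assignments no longer change). The two coordinates of each center are the width and height of one of the 9 anchors suited to the dataset.
'''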

def iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    :param box: tuple or array, shifted to the origin (i. e. width and height)
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: numpy array of shape (k,) where k is the number of clusters
    """
    # Compute the IoU between one box and each of the k cluster centers.
    # box      : a single [width, height]
    # clusters : the k current centers, each [width, height]
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")
    intersection = x * y
    # Area of the box and of each cluster center
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]
    iou_ = intersection / (box_area + cluster_area - intersection)
    return iou_

def avg_iou(boxes, clusters):
    """
    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: average IoU as a single float
    """
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])

def translate_boxes(boxes):
    """
    Translates all the boxes to the origin.
    :param boxes: numpy array of shape (r, 4)
    :return: numpy array of shape (r, 2)
    """
    new_boxes = boxes.copy()
    for row in range(new_boxes.shape[0]):
        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
    return np.delete(new_boxes, [0, 1], axis=1)

def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: aggregation function used to update each cluster center (median by default)
    :return: numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]
    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))
    np.random.seed()
    # the Forgy method will fail if the whole array contains the same rows
    # Initialize the k cluster centers by picking k distinct boxes at random
    clusters = boxes[np.random.choice(rows, k, replace=False)]
    while True:
        for row in range(rows):
            # Distance metric: d(box, centroid) = 1 - IoU(box, centroid).
            # A smaller distance to the center is better while a larger IoU is better,
            # so 1 - IoU makes a small distance correspond to a large IoU.
            # Fills one row of the (rows, k) distance matrix.
            distances[row] = 1 - iou(boxes[row], clusters)
        # Assign each box to the nearest cluster center (smallest distance)
        nearest_clusters = np.argmin(distances, axis=1)
        # Stop once the assignments, and therefore the centers, no longer change
        if (last_clusters == nearest_clusters).all():
            break
        # Update each center to the median of the boxes assigned to it
        for cluster in range(k):
            members = boxes[nearest_clusters == cluster]
            # Keep the previous center if no box was assigned, since the
            # median of an empty array would be NaN
            if len(members) > 0:
                clusters[cluster] = dist(members, axis=0)
        last_clusters = nearest_clusters
    return clusters
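The per-row loop that fills `distances` can also be written with NumPy broadcasting. The following is a sketch of that alternative, not part of the original code. `iou_matrix` is a hypothetical helper name, and the sketch assumes every box and cluster has positive width and height.

```python
import numpy as np


def iou_matrix(boxes, clusters):
    """Pairwise IoU between (r, 2) boxes and (k, 2) clusters, shape (r, k)."""
    # (r, 1, 2) against (1, k, 2) broadcasts to (r, k, 2): overlap width and height
    inter_wh = np.minimum(boxes[:, None, :], clusters[None, :, :])
    intersection = inter_wh[..., 0] * inter_wh[..., 1]
    box_area = boxes[:, 0] * boxes[:, 1]            # shape (r,)
    cluster_area = clusters[:, 0] * clusters[:, 1]  # shape (k,)
    union = box_area[:, None] + cluster_area[None, :] - intersection
    return intersection / union


# Toy values chosen for illustration only
boxes = np.array([[0.2, 0.4], [0.5, 0.3], [0.1, 0.3]])
clusters = np.array([[0.2, 0.4], [0.4, 0.2]])

distances = 1 - iou_matrix(boxes, clusters)  # shape (3, 2)
nearest = np.argmin(distances, axis=1)       # closest cluster index per box
print(distances.shape, nearest)
```

This computes the whole N x k matrix in one step, which tends to matter once the dataset has tens of thousands of boxes.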

Running the code:

import glob
import xml.etree.ElementTree as ET

import numpy as np

from kmeans import kmeans, avg_iou

#ANNOTATIONS_PATH = "Annotations"
CLUSTERS = 9


def load_dataset(path):
    """Collect [width, height] of every annotated box, normalized by image size."""
    dataset = []
    for xml_file in glob.glob("{}/*xml".format(path)):
        tree = ET.parse(xml_file)
        height = int(tree.findtext("./size/height"))
        width = int(tree.findtext("./size/width"))
        for obj in tree.iter("object"):
            xmin = int(obj.findtext("bndbox/xmin")) / width
            ymin = int(obj.findtext("bndbox/ymin")) / height
            xmax = int(obj.findtext("bndbox/xmax")) / width
            ymax = int(obj.findtext("bndbox/ymax")) / height
            dataset.append([xmax - xmin, ymax - ymin])
    return np.array(dataset)


# Replace with the path to your own annotation directory
ANNOTATIONS_PATH = "path/to/your/annotations"

data = load_dataset(ANNOTATIONS_PATH)
out = kmeans(data, k=CLUSTERS)

# "Accuracy" here is the average best IoU between each box and the anchors
print("Accuracy: {:.2f}%".format(avg_iou(data, out) * 100))
#print("Boxes:\n {}".format(out))
# Scale the normalized widths and heights to a 416x416 input
print("Boxes:\n {}-{}".format(out[:, 0]*416, out[:, 1]*416))

ratios = np.around(out[:, 0] / out[:, 1], decimals=2).tolist()
print("Ratios:\n {}".format(sorted(ratios)))

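I ran this on the VOC2007 dataset, 9963 labeled samples in total. The results differ slightly from the anchors given in the paper, possibly because of the difference between COCO and VOC2007.

The results are as follows:

Accuracy:

67.22%

Boxes (I reformatted and rounded these by hand, so the ratios do not match exactly):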
[347,327 40,40 76,77 184,277 89,207 162,134 14,27 44,128 23,72]

Ratios:
[0.32, 0.35, 0.43, 0.55, 0.67, 0.99, 1.02, 1.06, 1.21]
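Darknet-style YOLOv3 config files list the anchors as a flat `w,h` sequence ordered from small to large. Below is a minimal sketch of that conversion using the rounded boxes reported above. Sorting by area is an assumed convention here, and `format_anchors` is a hypothetical helper.

```python
import numpy as np

# Rounded (width, height) pairs at 416x416, copied from the results above.
boxes = np.array([
    [347, 327], [40, 40], [76, 77], [184, 277], [89, 207],
    [162, 134], [14, 27], [44, 128], [23, 72],
])


def format_anchors(wh):
    """Sort anchors by area (ascending) and join them as 'w,h, w,h, ...'."""
    order = np.argsort(wh[:, 0] * wh[:, 1])
    return ", ".join("{},{}".format(int(w), int(h)) for w, h in wh[order])


anchor_line = format_anchors(boxes)
print("anchors = " + anchor_line)
```

The printed line can then be compared with the `anchors` entry of whichever config file is being adapted.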
