Visual Categorization with Bags of Keypoints
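1. Introduction and background
This is one of this week's papers and one of the foundational bag-of-features papers. The goal here is to understand its basic ideas and the basic techniques it uses, and to make the details as clear as possible.
The basic principle of a bag of keypoints is: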
A bag of keypoints corresponds to a histogram of the number of occurrences of particular image patterns in a given image.
2. The main steps
The main steps of our method are:
• Detection and description of image patches. Note: "patch" literally means a small block, but here its meaning is close to "pattern".
• Assigning patch descriptors to a set of predetermined clusters (a vocabulary) with a vector quantization algorithm. Note: in this second step, a vector quantization algorithm maps each descriptor to one cluster of the vocabulary.
• Constructing a bag of keypoints, which counts the number of patches assigned to each cluster. Note: in practice this reduces to computing a histogram over the patterns.
• Applying a multi-class classifier, treating the bag of keypoints as the feature vector, and thus determine which category or categories to assign to the image. Note: a multi-class classifier is applied to the histogram, which yields the category or categories of the image.
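The second and third steps above (vector quantization and counting) can be sketched in a few lines. This is only a minimal illustration, not the paper's implementation: it assumes plain nearest-centre assignment under Euclidean distance, and the function name `bag_of_keypoints` and the toy 2-D data are made up for the example (real descriptors would have many more dimensions).

```python
import numpy as np

def bag_of_keypoints(descriptors, vocabulary):
    """Count how many descriptors fall into each cluster of the vocabulary.

    descriptors: (n, d) array, one row per image patch (hypothetical data).
    vocabulary:  (k, d) array of cluster centres.
    Returns a length-k histogram of assignment counts.
    """
    # Squared Euclidean distance between every descriptor and every centre.
    diffs = descriptors[:, None, :] - vocabulary[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    # Vector quantization: each descriptor goes to its nearest centre.
    nearest = np.argmin(sq_dists, axis=1)
    # The bag of keypoints is the histogram of those assignments.
    return np.bincount(nearest, minlength=len(vocabulary))

# Toy example: three 2-D cluster centres and five 2-D descriptors.
vocabulary = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
descriptors = np.array([
    [0.5, 0.2],
    [9.0, 1.0],
    [0.1, 9.5],
    [1.0, -1.0],
    [11.0, 0.5],
])
histogram = bag_of_keypoints(descriptors, vocabulary)
print(histogram)  # [2 2 1]
```

The resulting vector has one entry per cluster, so every image gets a feature vector of the same length regardless of how many patches were detected in it.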
3. The steps involved in training the system allow consideration of multiple possible vocabularies:
• Detection and description of image patches for a set of labeled training images
• Constructing a set of vocabularies: each is a set of cluster centres, with respect to which descriptors are vector quantized.
• Extracting bags of keypoints for these vocabularies. Note: how are these "keypoints" defined? (We refer to the quantized feature vectors (cluster centres) as “keypoints” by analogy with “keywords” in text categorization.) So a bag of keypoints is in effect the histogram over the cluster centres.
• Training multi-class classifiers using the bags of keypoints as feature vectors. Note: the paper uses two classifiers, a Naïve Bayes classifier and an SVM classifier.
4. Feature extraction
5. Visual vocabulary construction
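Overall goal: the vocabulary is a way of constructing a feature vector for classification that relates “new” descriptors in query images to descriptors previously seen in training. In practice, this means building the representation that descriptors are mapped onto.
After discussing several options, the authors choose the widely used k-means algorithm to build the vocabulary.
However, k-means has two drawbacks. First, it only converges to a local optimum. Second, the number of clusters k is not determined by the algorithm and has to be set by hand (the authors' solution is to run it several times and keep the result with the lowest error rate).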
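To make the local-optimum issue concrete, here is a minimal sketch of k-means (Lloyd's algorithm) run from several random initialisations. Note the assumption: this sketch keeps the run with the lowest within-cluster distortion, used here as a simple stand-in, whereas the notes above say the authors select by classification error rate. The function names, the toy data, and all parameter values are made up for illustration.

```python
import numpy as np

def kmeans(points, k, rng, n_iters=50):
    """Plain Lloyd's k-means. Returns (centres, labels, distortion)."""
    # Initialise the centres with k distinct data points chosen at random.
    centres = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        sq_dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        new_centres = centres.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                new_centres[j] = members.mean(axis=0)
        if np.allclose(new_centres, centres):
            break  # converged (possibly only to a local optimum)
        centres = new_centres
    # Recompute assignments and total squared distance for the final centres.
    sq_dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dists.argmin(axis=1)
    distortion = sq_dists[np.arange(len(points)), labels].sum()
    return centres, labels, distortion

def best_of_restarts(points, k, n_restarts=10, seed=0):
    """Run k-means from several random starts and keep the best run.

    Assumption: "best" means lowest distortion in this sketch.
    """
    rng = np.random.default_rng(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])

# Toy data: two well-separated blobs of 2-D points.
data_rng = np.random.default_rng(42)
blob_a = data_rng.normal(loc=[0.0, 0.0], scale=0.3, size=(30, 2))
blob_b = data_rng.normal(loc=[5.0, 5.0], scale=0.3, size=(30, 2))
points = np.vstack([blob_a, blob_b])

centres, labels, distortion = best_of_restarts(points, k=2)
print(centres)
```

To follow the selection rule described in the notes, the same loop would also vary k and score each vocabulary by the error rate of the classifier trained on it, rather than by distortion.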
(1) Naïve Bayes classification
Considering visual categorization, assume we have a set of labeled images I = {I_i} and a vocabulary V = {v_t} of representative keypoints (i.e. cluster centers). Each descriptor extracted from an image is labeled with the keypoint to which it lies closest in feature space. We count the number N(t,i) of times keypoint v_t occurs in image I_i.
This builds the classification features: each descriptor is represented by its nearest keypoint in feature space, and we then count how often each keypoint occurs in image I_i, which amounts to computing a histogram.
P(C_j | I_i) ∝ P(C_j) P(I_i | C_j)
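A small sketch of how this rule can be evaluated on keypoint-count histograms. Two things here are assumptions that the notes above do not state: the likelihood P(I_i | C_j) is factorised over keypoints as a product of P(v_t | C_j) raised to the power N(t,i) (the usual naive independence assumption), and the estimates use Laplace (add-one) smoothing. The function names and the toy histograms are hypothetical.

```python
import numpy as np

def train_naive_bayes(histograms, labels, n_classes):
    """Estimate log P(C_j) and log P(v_t | C_j) from count histograms.

    histograms: (n_images, n_keypoints) array of counts N(t, i).
    labels:     (n_images,) array of class indices j.
    """
    n_keypoints = histograms.shape[1]
    log_prior = np.zeros(n_classes)
    log_likelihood = np.zeros((n_classes, n_keypoints))
    for j in range(n_classes):
        class_hists = histograms[labels == j]
        log_prior[j] = np.log(len(class_hists) / len(histograms))
        counts = class_hists.sum(axis=0)
        # Assumed Laplace smoothing: avoids zero probabilities for
        # keypoints that never occur in the training images of a class.
        log_likelihood[j] = np.log((counts + 1.0) / (counts.sum() + n_keypoints))
    return log_prior, log_likelihood

def predict_naive_bayes(histogram, log_prior, log_likelihood):
    # log P(C_j) + sum_t N(t, i) * log P(v_t | C_j), maximised over j.
    scores = log_prior + log_likelihood @ histogram
    return int(np.argmax(scores))

# Toy training set: 4 images, a 3-keypoint vocabulary, 2 classes.
histograms = np.array([
    [8, 1, 1],
    [7, 2, 1],
    [1, 8, 1],
    [2, 7, 1],
])
labels = np.array([0, 0, 1, 1])

log_prior, log_likelihood = train_naive_bayes(histograms, labels, n_classes=2)
prediction = predict_naive_bayes(np.array([6, 2, 2]), log_prior, log_likelihood)
print(prediction)  # 0
```

Working in log space avoids numerical underflow, since the product of many small probabilities quickly becomes too small to represent.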
(2) SVM classification
In order to apply the SVM to multi-class problems we take the one-against-all approach. Given an m-class problem, we train m SVMs, each of which distinguishes images from some category i from images from all the other m-1 categories j not equal to i. Given a query image, we assign it to the class with the largest SVM output.
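The one-against-all scheme can be sketched as follows. This is a rough illustration under stated assumptions, not the solver used in the paper: each binary classifier is a linear SVM fitted by full-batch subgradient descent on the regularised hinge loss, and the function names, hyperparameters (`reg`, `lr`, `n_epochs`), and toy normalised histograms are all invented for the example.

```python
import numpy as np

def train_linear_svm(X, y, reg=0.01, lr=0.1, n_epochs=200):
    """Fit (w, b) by subgradient descent on the regularised hinge loss.

    y must contain +1 / -1 labels. A stand-in for a proper SVM solver.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        margins = y * (X @ w + b)
        violators = margins < 1.0  # inside the margin or misclassified
        signed = y[violators][:, None] * X[violators]
        grad_w = reg * w - signed.sum(axis=0) / len(X)
        grad_b = -y[violators].sum() / len(X)
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b

def train_one_against_all(X, labels, n_classes):
    """Train one binary SVM per class: class i versus all the others."""
    models = []
    for i in range(n_classes):
        y = np.where(labels == i, 1.0, -1.0)
        models.append(train_linear_svm(X, y))
    return models

def predict_one_against_all(x, models):
    """Assign x to the class whose SVM gives the largest output."""
    outputs = [x @ w + b for w, b in models]
    return int(np.argmax(outputs))

# Toy normalised histograms over a 3-keypoint vocabulary, 3 classes.
X = np.array([
    [0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.7, 0.1, 0.2],  # class 0
    [0.1, 0.8, 0.1], [0.1, 0.7, 0.2], [0.2, 0.7, 0.1],  # class 1
    [0.1, 0.1, 0.8], [0.2, 0.1, 0.7], [0.1, 0.2, 0.7],  # class 2
])
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

models = train_one_against_all(X, labels, n_classes=3)
query = np.array([0.15, 0.75, 0.10])
predicted = predict_one_against_all(query, models)
print(predicted)
```

Comparing raw outputs across the m classifiers, as done here, follows the rule quoted above; it does assume the outputs of separately trained classifiers are on comparable scales.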
In the first experiment we explore the impact of the number of clusters on classifier accuracy and evaluate the performance of the Naïve Bayes classifier. We then explore the performance of the SVM on the same problem.
(1) Naïve Bayes: results were relatively good at k = 1000.
(2) SVM: the linear method gave the best performance (except in the case of cars, where a quadratic SVM gave better results).
