计算Fisher vector和VLAD
This short tutorial shows how to compute Fisher vector and VLAD encodings with VLFeat MATLAB interface.
These encoding serve a similar purposes: summarizing in a vectorial statistic a number of local feature descriptors (e.g. SIFT). Similarly to bag of visual words, they assign local descriptor to elements in a visual dictionary, obtained with vector quantization (KMeans) in the case of VLAD or a Gaussian Mixture Models for Fisher Vectors. However, rather than storing visual word occurrences only, these representations store a statistics of the difference between dictionary elements and pooled local features.
Fisher encoding
The Fisher encoding uses GMM to construct a visual word dictionary. To exemplify constructing a GMM, consider a number of 2 dimensional data points (see also the GMM tutorial). In practice, these points would be a collection of SIFT or other local image features. The following code fits a GMM to the points:
numFeatures = 5000 ;
dimension = 2 ;
data = rand(dimension,numFeatures) ; numClusters = 30 ;
[means, covariances, priors] = vl_gmm(data, numClusters);
Next, we create another random set of vectors, which should be encoded using the Fisher Vector representation and the GMM just obtained:
numDataToBeEncoded = 1000;
dataToBeEncoded = rand(dimension,numDataToBeEncoded);
The Fisher vector encoding enc of these vectors is obtained by calling the vl_fisher function using the output of the vl_gmm function:
encoding = vl_fisher(datatoBeEncoded, means, covariances, priors);
The encoding vector is the Fisher vector representation of the data dataToBeEncoded.
Note that Fisher Vectors support several normalization options that can affect substantially the performance of the representation.
VLAD encoding
The Vector of Linearly Agregated Descriptors is similar to Fisher vectors but (i) it does not store second-order information about the features and (ii) it typically use KMeans instead of GMMs to generate the feature vocabulary (although the latter is also an option).
Consider the same 2D data matrix data used in the previous section to train the Fisher vector representation. To compute VLAD, we first need to obtain a visual word dictionary. This time, we use K-means:
numClusters = 30 ;
centers = vl_kmeans(dataLearn, numClusters);
Now consider the data dataToBeEncoded and use the vl_vlad function to compute the encoding. Differently from vl_fisher, vl_vlad requires the data-to-cluster assignments to be passed in. This allows using a fast vector quantization technique (e.g. kd-tree) as well as switching from soft to hard assignment.
In this example, we use a kd-tree for quantization:
kdtree = vl_kdtreebuild(centers) ;
nn = vl_kdtreequery(kdtree, centers, dataEncode) ;
Now we have in the nn the indexes of the nearest center to each vector in the matrix dataToBeEncoded. The next step is to create an assignment matrix:
assignments = zeros(numClusters,numDataToBeEncoded);
assignments(sub2ind(size(assignments), nn, 1:length(nn))) = 1;
It is now possible to encode the data using the vl_vlad function:
enc = vl_vlad(dataToBeEncoded,centers,assignments);
Note that, similarly to Fisher vectors, VLAD supports several normalization options that can affect substantially the performance of the representation.
from: http://www.vlfeat.org/overview/encodings.html
计算Fisher vector和VLAD的更多相关文章
- Fisher Vector Encoding and Gaussian Mixture Model
一.背景知识 1. Discriminant Learning Algorithms(判别式方法) and Generative Learning Algorithms(生成式方法) 现在常见的模式 ...
- 【CV知识学习】Fisher Vector
在论文<action recognition with improved trajectories>中看到fisher vector,所以学习一下.但网上很多的资料我觉得都写的不好,查了一 ...
- Fisher vector for image classification
http://files.cnblogs.com/files/sylar120/fisher_vector.rar 拿各个参数上的偏导作为特征
- VLAD算法浅析, BOF、FV比较
划重点 ================================================= BOF.FV.VLAD等算法都是基于特征描述算子的特征编码算法,关于特征描述算子是以SIFT ...
- 转 STL之vector的使用
http://www.cnblogs.com/caoshenghe/archive/2010/01/31/1660399.html 第一部分 使用入门 vector可用于代替C中的数组,或者MFC中的 ...
- Aggregating local features for Image Retrieval
Josef和Andrew在2003年的ICCV上发表的论文[10]中,将文档检索的方法借鉴到了视频中的对象检测中.他们首先将图像的特征描述类比成单词,并建立了基于SIFT特征的vusual word ...
- 残差网络resnet学习
Deep Residual Learning for Image Recognition 微软亚洲研究院的何凯明等人 论文地址 https://arxiv.org/pdf/1512.03385v1.p ...
- Resnet论文翻译
摘要 越深层次的神经网络越难以训练.我们提供了一个残差学习框架,以减轻对网络的训练,这些网络的深度比以前的要大得多.我们明确地将这些层重新规划为通过参考输入层x,学习残差函数,来代替没有参考的学习函数 ...
- 图像检索(1): 再论SIFT-基于vlfeat实现
概述 基于内容的图像检索技术是采用某种算法来提取图像中的特征,并将特征存储起来,组成图像特征数据库.当需要检索图像时,采用相同的特征提取技术提取出待检索图像的特征,并根据某种相似性准则计算得到特征数据 ...
随机推荐
- pyqt5猜数小程序
程序界面用qt设计师制作,并用pyuic5命令转换成form.py文件 #-*- coding:utf-8 -*- from PyQt5.QtWidgets import QApplication,Q ...
- Dubbo中多注册中心问题与服务分组
一:注册中心 1.场景 Dubbo 支持同一服务向多注册中心同时注册, 或者不同服务分别注册到不同的注册中心上去, 甚至可以同时引用注册在不同注册中心上的同名服务. 2.多注册中心注册 中文站有些服务 ...
- poj3624 Charm Bracelet(DP,01背包)
题目链接 http://poj.org/problem?id=3624 题意 有n个手镯,每个手镯有两个属性:重量W,需求因子D.还有一个背包,它能装下总重量不超过M的手镯.现在将一些镯子装入背包,求 ...
- Redis和MySQL数据一致中出现的几种情况
1. MySQL持久化数据,Redis只读数据 redis在启动之后,从数据库加载数据. 读请求: 不要求强一致性的读请求,走redis,要求强一致性的直接从mysql读取 写请求: 数据首先都写到数 ...
- 手机html根据手机分辨率网页文字大小自适应
问题:不同手机型号屏幕尺寸大不相同,导致同样的文字,有的显示一行,有的显示多行. 通过查资料和自己的尝试解决:网页开发习惯的px单位,手机html开发不适用. 源代码如下: <!DOCTYPE ...
- BZOJ 4826: [Hnoi2017]影魔 单调栈 主席树
https://www.lydsy.com/JudgeOnline/problem.php?id=4826 年少不知空间贵,相顾mle空流泪. 和上一道主席树求的东西差不多,求两种对 1. max(a ...
- bzoj 3956: Count
3956: Count Description Input Output Sample Input 3 2 0 2 1 2 1 1 1 3 Sample Output 0 3 HINT M,N< ...
- vijos p1882 智力题
题意: 清晨, Alice与Bob在石阶上玩砖块.他们每人都有属于自己的一堆砖块.每人的砖块都由N列组成且N是奇数.Alice的第i列砖块有m[i]个.而Bob的第i列砖块有s[i]个. 他们想建造城 ...
- 【BZOJ-3110】K大数查询 整体二分 + 线段树
3110: [Zjoi2013]K大数查询 Time Limit: 20 Sec Memory Limit: 512 MBSubmit: 6265 Solved: 2060[Submit][Sta ...
- Vue的过渡或动画
一.过渡的类名 在进入/离开的过渡中,共有6种class进行切换,分别是v-enter,v-enter-active,v-enter-to,v-leave,v-leave-active,v-leave ...