This short tutorial shows how to compute Fisher vector and VLAD encodings with VLFeat MATLAB interface.

These encoding serve a similar purposes: summarizing in a vectorial statistic a number of local feature descriptors (e.g. SIFT). Similarly to bag of visual words, they assign local descriptor to elements in a visual dictionary, obtained with vector quantization (KMeans) in the case of VLAD or a Gaussian Mixture Models for Fisher Vectors. However, rather than storing visual word occurrences only, these representations store a statistics of the difference between dictionary elements and pooled local features.

Fisher encoding

The Fisher encoding uses GMM to construct a visual word dictionary. To exemplify constructing a GMM, consider a number of 2 dimensional data points (see also the GMM tutorial). In practice, these points would be a collection of SIFT or other local image features. The following code fits a GMM to the points:

numFeatures = 5000 ;

dimension = 2 ;

data = rand(dimension,numFeatures) ;

numClusters = 30 ;

[means, covariances, priors] = vl_gmm(data, numClusters);

Next, we create another random set of vectors, which should be encoded using the Fisher Vector representation and the GMM just obtained:

numDataToBeEncoded = 1000;

dataToBeEncoded = rand(dimension,numDataToBeEncoded);

The Fisher vector encoding enc of these vectors is obtained by calling the vl_fisher function using the output of the vl_gmm function:

encoding = vl_fisher(datatoBeEncoded, means, covariances, priors);

The encoding vector is the Fisher vector representation of the data dataToBeEncoded.

Note that Fisher Vectors support several normalization options that can affect substantially the performance of the representation.

VLAD encoding

The Vector of Linearly Agregated Descriptors is similar to Fisher vectors but (i) it does not store second-order information about the features and (ii) it typically use KMeans instead of GMMs to generate the feature vocabulary (although the latter is also an option).

Consider the same 2D data matrix data used in the previous section to train the Fisher vector representation. To compute VLAD, we first need to obtain a visual word dictionary. This time, we use K-means:

numClusters = 30 ;

centers = vl_kmeans(dataLearn, numClusters);

Now consider the data dataToBeEncoded and use the vl_vlad function to compute the encoding. Differently from vl_fisher, vl_vlad requires the data-to-cluster assignments to be passed in. This allows using a fast vector quantization technique (e.g. kd-tree) as well as switching from soft to hard assignment.

In this example, we use a kd-tree for quantization:

kdtree = vl_kdtreebuild(centers) ;

nn = vl_kdtreequery(kdtree, centers, dataEncode) ;

Now we have in the nn the indexes of the nearest center to each vector in the matrix dataToBeEncoded. The next step is to create an assignment matrix:

assignments = zeros(numClusters,numDataToBeEncoded);

assignments(sub2ind(size(assignments), nn, 1:length(nn))) = 1;

It is now possible to encode the data using the vl_vlad function:

enc = vl_vlad(dataToBeEncoded,centers,assignments);

Note that, similarly to Fisher vectors, VLAD supports several normalization options that can affect substantially the performance of the representation.

from: http://www.vlfeat.org/overview/encodings.html

计算Fisher vector和VLAD的更多相关文章

Fisher Vector Encoding and Gaussian Mixture Model
一.背景知识 1. Discriminant Learning Algorithms(判别式方法) and Generative Learning Algorithms(生成式方法) 现在常见的模式 ...
【CV知识学习】Fisher Vector
在论文<action recognition with improved trajectories>中看到fisher vector,所以学习一下.但网上很多的资料我觉得都写的不好,查了一 ...
Fisher vector for image classification
http://files.cnblogs.com/files/sylar120/fisher_vector.rar 拿各个参数上的偏导作为特征
VLAD算法浅析, BOF、FV比较
划重点 ================================================= BOF.FV.VLAD等算法都是基于特征描述算子的特征编码算法,关于特征描述算子是以SIFT ...
转 STL之vector的使用
http://www.cnblogs.com/caoshenghe/archive/2010/01/31/1660399.html 第一部分使用入门 vector可用于代替C中的数组,或者MFC中的 ...
Aggregating local features for Image Retrieval
Josef和Andrew在2003年的ICCV上发表的论文[10]中,将文档检索的方法借鉴到了视频中的对象检测中.他们首先将图像的特征描述类比成单词,并建立了基于SIFT特征的vusual word ...
残差网络resnet学习
Deep Residual Learning for Image Recognition 微软亚洲研究院的何凯明等人论文地址 https://arxiv.org/pdf/1512.03385v1.p ...
Resnet论文翻译
摘要越深层次的神经网络越难以训练.我们提供了一个残差学习框架,以减轻对网络的训练,这些网络的深度比以前的要大得多.我们明确地将这些层重新规划为通过参考输入层x,学习残差函数,来代替没有参考的学习函数 ...
图像检索(1): 再论SIFT-基于vlfeat实现
概述基于内容的图像检索技术是采用某种算法来提取图像中的特征,并将特征存储起来,组成图像特征数据库.当需要检索图像时,采用相同的特征提取技术提取出待检索图像的特征,并根据某种相似性准则计算得到特征数据 ...

随机推荐

filebeat安装部署
简单概述最近在了解ELK做日志采集相关的内容,这篇文章主要讲解通过filebeat来实现日志的收集.日志采集的工具有很多种,如fluentd, flume, logstash,betas等等.首先要 ...
8-1 binpacking uva1149（贪心）
题意:给定N个物品的重量Li 背包的容量M 同时要求每个背包最多装两个物品求至少要多少个背包才能装下所有物品简单贪心注意输出: #include<bits/stdc++.h> u ...
在PHP中gmtime()与time()区别
localtime是把从1970-1-1零点零分到当前时间系统所偏移的秒数时间转换为本地时间,而gmtime函数转换后的时间没有经过时区变换,是UTC时间.2.说明:此函数获得的tm结构体的时间是日历 ...
php利用root权限执行shell脚本（转）
转一篇博客,之前搞这个东西搞了好久,结果今天晚上看到了一篇救命博客,瞬间开心了...转载转载利用sudo来赋予Apache的用户root的执行权限,下面记录一下: 利用PHP利用root权限执行sh ...
20169211《Linux内核原理及分析》第十二周作业
Collabtive 系统 SQL 注入实验实验介绍 SQL注入漏洞的代码注入技术,利用web应用程序和数据库服务器之间的接口.通过把SQL命令插入到Web表单提交或输入域名或页面请求的查询字符串, ...
java中int和Integer比较
java中int和Integer比较一,类型区别我们知道java中由两种数据类型,即基本类型和对象类型,int就是基本数据类型,而Integer是一个class,也习惯把Integer叫做int的 ...
C++运算符重载模板友元 new delete ++ = +=
今天的重载是基于C++ 类模板的,如果需要非类模板的重载的朋友可以把类模板拿掉,同样可以参考,谢谢. 一.类模板中的友元重载本人喜好类声明与类成员实现分开写的代码风格,如若您喜欢将类成员函数的实现写 ...
[ 转载 ] Mysql 数据库常用命令
完整的创建数据库例子: >create database db_test default character set utf8 collate utf8_general_ci; >use ...
ARM 内核
ARM相关知识: ARM核:A8,ARM11,ARM9 指令架构:ARMv7,ARMv6,ARMv4 ARM核分为两个阵营: 经典型:ARM7,ARM9,ARM11 Cortex: Cortex A: ...
Python 面向对象编程——访问限制
<无访问限制的对象> 在Class内部,可以有属性和方法,而外部代码可以通过直接调用实例变量的方法来操作数据,这样,就隐藏了内部的复杂逻辑.但是,从前面Student类的定义来看(见:Py ...

计算Fisher vector和VLAD

Fisher encoding

VLAD encoding

计算Fisher vector和VLAD的更多相关文章

随机推荐

热门专题