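Distinctive Image Features from Scale-Invariant Keypoints is the classic paper on the SIFT algorithm in image recognition, and it was the first reading my advisor assigned me. I searched online for a Chinese translation and could not find one, so I decided to work through it myself, translating it and keeping my notes here, which will hopefully help later readers too.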


Distinctive Image Features from Scale-Invariant Keypoints



This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
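Notes on the abstract: the paper presents a way to extract distinctive, invariant features from an image, which can then be used to match different views of the same object or scene. The features do not change under image scaling or rotation, and matching stays robust under a substantial range of affine distortion, changes in 3D viewpoint, added noise, and changes in illumination. They are also highly distinctive: a single feature can be matched correctly, with high probability, against a large database of features drawn from many images. The paper also describes how to use the features for object recognition. Individual features are matched against a database of features from known objects with a fast nearest-neighbor algorithm, a Hough transform then identifies the clusters of matches that belong to a single object, and each cluster is finally verified with a least-squares solution for consistent pose parameters. Recognition done this way can robustly identify objects among clutter and occlusion while running in near real time.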



Image matching is a fundamental aspect of many problems in computer vision, including object or scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features that have many properties that make them suitable for matching differing images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to change in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.

The cost of extracting these features is minimized by taking a cascade filtering approach, in which the more expensive operations are applied only at locations that pass an initial test. Following are the major stages of computation used to generate the set of image features:
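Notes: cascade filtering here means a staged sequence of tests (I first wondered whether it meant a convolution filter; it does not). It keeps the cost of feature extraction low because the more expensive operations run only at locations that have passed an initial test. The main stages for generating the image features follow: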

1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.

2. Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.

3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.

4. Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
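To make stage 1 concrete for myself, here is a minimal sketch of difference-of-Gaussian extrema detection in plain Python. It is not the paper's implementation: there are no octaves, no downsampling, and no sub-pixel refinement. The blur levels, the contrast threshold of 0.03, and all function names are my own assumptions for illustration. A point is kept when its DoG value is strictly larger or strictly smaller than all 26 neighbours in position and scale.

```python
import math

def gaussian_kernel(sigma):
    """1D Gaussian kernel, truncated at 3 sigma and normalised to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur with edge replication (image is a list of rows)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass, then vertical pass over the horizontally blurred rows.
    rows = [[sum(k * row[clamp(x + j - radius, width)] for j, k in enumerate(kernel))
             for x in range(width)] for row in image]
    return [[sum(k * rows[clamp(y + j - radius, height)][x] for j, k in enumerate(kernel))
             for x in range(width)] for y in range(height)]

def dog_stack(image, sigmas):
    """Difference-of-Gaussian layers between consecutive blur levels."""
    blurred = [blur(image, s) for s in sigmas]
    return [[[b[y][x] - a[y][x] for x in range(len(image[0]))]
             for y in range(len(image))]
            for a, b in zip(blurred, blurred[1:])]

def scale_space_extrema(dog, threshold=0.03):
    """Return (layer, y, x) for values beating all 26 neighbours.

    The threshold is an illustrative contrast cut-off, not a tuned value.
    """
    found = []
    for s in range(1, len(dog) - 1):
        for y in range(1, len(dog[s]) - 1):
            for x in range(1, len(dog[s][0]) - 1):
                value = dog[s][y][x]
                if abs(value) < threshold:
                    continue
                neighbours = [dog[s + ds][y + dy][x + dx]
                              for ds in (-1, 0, 1)
                              for dy in (-1, 0, 1)
                              for dx in (-1, 0, 1)
                              if (ds, dy, dx) != (0, 0, 0)]
                if value > max(neighbours) or value < min(neighbours):
                    found.append((s, y, x))
    return found

# Usage: a single bright blob on a 21x21 image, blur levels spaced by sqrt(2).
size = 21
blob = [[math.exp(-((x - 10) ** 2 + (y - 10) ** 2) / (2 * 1.68 ** 2))
         for x in range(size)] for y in range(size)]
sigmas = [2 ** (i / 2) for i in range(5)]
keypoints = scale_space_extrema(dog_stack(blob, sigmas))
```

The layer index of a detection tells you roughly which blur level matches the size of the blob, which is the sense in which the detector picks a scale as well as a position.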

This approach has been named the Scale Invariant Feature Transform (SIFT), as it transforms image data into scale-invariant coordinates relative to local features.
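Stage 3 (orientation assignment) is easier to picture with a small sketch. The code below builds a histogram of gradient directions over a patch, weights each vote by gradient magnitude, and returns the centre of the strongest bin. The 36-bin resolution, the central-difference gradients, and the function name are assumptions for illustration; the Gaussian weighting and peak interpolation of a full implementation are left out.

```python
import math

def dominant_orientation(patch, bins=36):
    """Centre angle (degrees) of the strongest gradient-orientation bin.

    `patch` is a list of rows of grey values. Image y grows downwards,
    so angles are measured in image coordinates.
    """
    bin_width = 360.0 / bins
    histogram = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            # Central differences approximate the local image gradient.
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            histogram[int(angle / bin_width) % bins] += magnitude
    peak = max(range(bins), key=lambda b: histogram[b])
    return (peak + 0.5) * bin_width, histogram

# Usage: intensity rising equally along x and y gives a gradient at 45 degrees.
ramp = [[x + y for x in range(9)] for y in range(9)]
angle, histogram = dominant_orientation(ramp)
```

Once every keypoint has such an orientation, its descriptor can be computed relative to that direction, which is what makes the final feature insensitive to image rotation.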

An important aspect of this approach is that it generates large numbers of features that densely cover the image over the full range of scales and locations. A typical image of size 500x500 pixels will give rise to about 2000 stable features (although this number depends on both image content and choices for various parameters). The quantity of features is particularly important for object recognition, where the ability to detect small objects in cluttered backgrounds requires that at least 3 features be correctly matched from each object for reliable identification.

For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by individually comparing each feature from the new image to this previous database and finding candidate matching features based on Euclidean distance of their feature vectors. This paper will discuss fast nearest-neighbor algorithms that can perform this computation rapidly against large databases.
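The matching step described above can be sketched as an exhaustive nearest-neighbor search by Euclidean distance. This is only the brute-force baseline: the paper relies on fast nearest-neighbor algorithms to make the search practical on large databases, and those are not shown here. The function names and the toy 3-dimensional vectors are assumptions for illustration; real descriptors are much longer.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def nearest_neighbour(query, database):
    """Return (index, distance) of the database vector closest to `query`."""
    best_index, best_distance = -1, math.inf
    for index, candidate in enumerate(database):
        distance = euclidean(query, candidate)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance

def match_features(new_features, database):
    """Candidate match for every feature of the new image, compared one by one."""
    return [(i,) + nearest_neighbour(f, database) for i, f in enumerate(new_features)]

# Usage with toy descriptors.
database = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
new_features = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]]
matches = match_features(new_features, database)
```

Each entry of `matches` is (index in the new image, index in the database, distance). The cost grows with the product of the two feature counts, which is why a faster search matters for large databases.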

The keypoint descriptors are highly distinctive, which allows a single feature to find its correct match with good probability in a large database of features. However, in a cluttered image, many features from the background will not have any correct match in the database, giving rise to many false matches in addition to the correct ones. The correct matches can be filtered from the full set of matches by identifying subsets of keypoints that agree on the object and its location, scale, and orientation in the new image. The probability that several features will agree on these parameters by chance is much lower than the probability that any individual feature match will be in error. The determination of these consistent clusters can be performed rapidly by using an efficient hash table implementation of the generalized Hough transform.
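The hash-table idea can be sketched with an ordinary dictionary: each match predicts an object pose, the pose is quantised into a coarse bin, and the bin is used as the key. Matches that land in the same bin agree on the object and its approximate location, scale, and orientation. The bin sizes below (32 pixels, a factor of 2 in scale, 30 degrees) and the tuple layout of a pose are assumptions chosen for illustration, and a more careful version would also vote into neighbouring bins to soften boundary effects.

```python
import math
from collections import defaultdict

def pose_bin(pose, location_step=32.0, orientation_step=30.0):
    """Quantise (object_id, x, y, scale, orientation_degrees) into a hashable key."""
    object_id, x, y, scale, orientation = pose
    return (object_id,
            math.floor(x / location_step),
            math.floor(y / location_step),
            math.floor(math.log2(scale)),          # one bin per factor of 2 in scale
            math.floor((orientation % 360.0) / orientation_step))

def hough_clusters(poses, min_votes=3):
    """Group match indices by pose bin and keep bins with enough votes."""
    table = defaultdict(list)
    for index, pose in enumerate(poses):
        table[pose_bin(pose)].append(index)
    return {key: members for key, members in table.items()
            if len(members) >= min_votes}

# Usage: three matches agree on a pose, two are unrelated false matches.
poses = [
    ("cup", 100.0, 50.0, 1.1, 10.0),
    ("cup", 105.0, 55.0, 1.3, 15.0),
    ("cup", 110.0, 60.0, 1.2, 20.0),
    ("cup", 300.0, 200.0, 4.0, 200.0),
    ("book", 100.0, 50.0, 1.1, 10.0),
]
clusters = hough_clusters(poses)
```

Because a false match lands in an essentially arbitrary bin, it rarely gathers enough companions to pass the vote threshold, which is the filtering effect the paragraph above describes.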

Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed verification. First, a least-squared estimate is made for an affine approximation to the object pose. Any other image features consistent with this pose are identified, and outliers are discarded. Finally, a detailed computation is made of the probability that a particular set of features indicates the presence of an object, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence.
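The least-squares step can be sketched as follows, assuming the affine model u = m1·x + m2·y + tx and v = m3·x + m4·y + ty that maps a model point (x, y) to an image point (u, v). The sketch solves the normal equations with a small Gaussian elimination and then reports per-point residuals, which is one simple way to spot outliers. The function names and the elimination routine are my own; the probability computation mentioned above is not covered.

```python
import math

def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def fit_affine(model_points, image_points):
    """Least-squares (m1, m2, m3, m4, tx, ty); needs >= 3 non-collinear points."""
    rows = [(x, y, 1.0) for x, y in model_points]
    # Normal matrix A^T A is shared by the u and v equations.
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for axis in (0, 1):
        rhs = [sum(r[i] * p[axis] for r, p in zip(rows, image_points))
               for i in range(3)]
        params.append(solve_linear(normal, rhs))
    (m1, m2, tx), (m3, m4, ty) = params
    return (m1, m2, m3, m4, tx, ty)

def residuals(params, model_points, image_points):
    """Distance between each predicted and observed image point."""
    m1, m2, m3, m4, tx, ty = params
    return [math.hypot(m1 * x + m2 * y + tx - u, m3 * x + m4 * y + ty - v)
            for (x, y), (u, v) in zip(model_points, image_points)]

# Usage: points generated by u = 2x - y + 5, v = x + 3y - 2.
model_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
image_points = [(5.0, -2.0), (7.0, -1.0), (4.0, 1.0), (6.0, 9.0)]
params = fit_affine(model_points, image_points)
errors = residuals(params, model_points, image_points)
```

Three matches give six equations for the six unknowns, which is why a cluster needs at least 3 features before this fit is possible. With more matches the system is overdetermined and large residuals point at likely outliers.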
