Deep learning for visual understanding: A review

ABSTRACT: Deep learning algorithms are a subset of machine learning algorithms that aim to discover multiple levels of distributed representations. Recently, numerous deep learning algorithms have been proposed to solve traditional artificial intelligence problems. This work reviews the state of the art in deep learning for computer vision, highlighting the contributions and challenges of more than 210 recent research papers. It first gives an overview of various deep learning approaches and their recent developments, and then briefly describes their applications in diverse vision tasks, such as image classification, object detection, image retrieval, semantic segmentation and human pose estimation. Finally, the paper summarizes future trends and challenges in designing and training deep neural networks.

1. Introduction

Deep learning is a subfield of machine learning that attempts to learn high-level abstractions in data by utilizing hierarchical architectures. It is an emerging approach and has been widely applied in traditional artificial intelligence domains, such as semantic parsing [1], transfer learning [2,3], natural language processing [4], computer vision [5,6] and many more. There are three main reasons for the boom in deep learning today: dramatically increased chip processing abilities (e.g. GPUs), significantly lower computing hardware costs, and considerable advances in machine learning algorithms [9].
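The idea of a hierarchical architecture can be illustrated with a minimal sketch in Python. It assumes a plain fully connected network with ReLU units; the layer widths, the random initialization and the helper names (`dense`, `init_layer`, `forward`) are hypothetical choices made for illustration and are not taken from the paper. Each layer re-encodes the output of the layer below it, so the network yields one representation per level:

```python
import random

def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]

def dense(weights, bias, v):
    """Affine map: `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    """Hypothetical initialization: small uniform weights, zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(layers, x):
    """Pass x through every layer and keep the representation at each level."""
    representations = [x]
    for weights, bias in layers:
        x = relu(dense(weights, bias, x))  # each level re-encodes the level below
        representations.append(x)
    return representations

rng = random.Random(0)
# Hypothetical widths: raw input -> low-level -> mid-level -> high-level features.
sizes = [8, 6, 4, 2]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
reps = forward(layers, [rng.random() for _ in range(sizes[0])])
for level, r in enumerate(reps):
    print(level, len(r))
```

Only the forward pass is shown; in a real deep network the weights would be learned from data rather than left at their random initial values.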

Various deep learning approaches have been extensively reviewed and discussed in recent years [8–12]. Among these, Schmidhuber et al. [10] emphasized the important inspirations and technical contributions in a historical timeline format, while Bengio [11] examined the challenges of deep learning research and proposed a few forward-looking research directions. Deep networks have proven successful in computer vision tasks because they can extract appropriate features while jointly performing discrimination [9,13]. In recent ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competitions [189], deep learning methods have been widely adopted by different research groups and have achieved the top accuracy scores [7].
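The phrase "extract appropriate features while jointly performing discrimination" can be made concrete with a toy sketch, again in plain Python. It assumes a tiny network with one tanh hidden layer acting as the feature extractor and a logistic output unit acting as the classifier, trained on the XOR problem by full-batch gradient descent. The class `TinyNet`, the hidden width, the learning rate and the step count are all hypothetical. The point is that a single loss drives the updates of both the feature layer and the classifier:

```python
import math
import random

# Hypothetical toy data (not from the paper): XOR is not linearly separable.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyNet:
    """Hypothetical model: tanh hidden layer (features) + logistic output (classifier)."""

    def __init__(self, n_in=2, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, p

    def loss(self, data):
        """Mean binary cross-entropy over the data set."""
        total = 0.0
        for x, y in data:
            _, p = self.forward(x)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total / len(data)

    def step(self, data, lr):
        """One full-batch gradient step that updates BOTH layers from the same loss."""
        gW1 = [[0.0] * len(row) for row in self.W1]
        gb1 = [0.0] * len(self.b1)
        gw2 = [0.0] * len(self.w2)
        gb2 = 0.0
        for x, y in data:
            h, p = self.forward(x)
            d_out = p - y  # dL/dz for a sigmoid output with cross-entropy loss
            gb2 += d_out
            for j, hj in enumerate(h):
                gw2[j] += d_out * hj
                d_hidden = d_out * self.w2[j] * (1 - hj * hj)  # backprop through tanh
                gb1[j] += d_hidden
                for i, xi in enumerate(x):
                    gW1[j][i] += d_hidden * xi
        n = len(data)
        for j in range(len(self.w2)):
            self.w2[j] -= lr * gw2[j] / n
            self.b1[j] -= lr * gb1[j] / n
            for i in range(len(self.W1[j])):
                self.W1[j][i] -= lr * gW1[j][i] / n
        self.b2 -= lr * gb2 / n

net = TinyNet()
initial = net.loss(DATA)
for _ in range(2000):
    net.step(DATA, lr=0.5)
final = net.loss(DATA)
print(f"loss {initial:.3f} -> {final:.3f}")
```

XOR is chosen because no linear classifier on the raw inputs can fit it, so the hidden layer has to learn a more useful representation at the same time as the output unit learns to separate it.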

This survey is intended for neural computing, computer vision and multimedia researchers who are interested in the state of the art of deep learning in computer vision. It provides an overview of various deep learning algorithms and their applications, especially those that can be applied in the computer vision domain.

The remainder of this paper is organized as follows:

In Section 2, we divide the deep learning algorithms into four categories: Convolutional Neural Networks, Restricted Boltzmann Machines, Autoencoders and Sparse Coding. Some well-known models in each category, as well as their developments, are listed, and their contributions and limitations are described. In Section 3, we describe the achievements of deep learning schemes in various computer vision applications, namely image classification, object detection, image retrieval, semantic segmentation and human pose estimation, and we present and compare the results on the datasets commonly used for each task. In Section 4, we note that despite the success deep learning methods have achieved, several challenges remain in designing and training deep networks; we summarize the major challenges for deep learning, together with the trends that might develop in the future. In Section 5, we conclude the paper.
