Recently Kaggle hosted a competition on the CIFAR-10 dataset. The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes. The dataset was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

Many contestants used convolutional nets to tackle this competition. Some achieved scores that beat human performance on this classification task. In this blog series we will interview three contestants and also a founding father of convolutional nets: Yann LeCun.

EXAMPLE OF CIFAR-10 DATASET

Yann LeCun

Yann LeCun is currently Director of AI Research at Facebook and a professor at NYU.

Which other scientists should be named and celebrated for the successes of convolutional nets?

Certainly, Kunihiko Fukushima's work on the Neo-Cognitron was an inspiration. Although the early forms of convnets owed little to the Neo-Cognitron, the version we settled on (with pooling layers) did.

A SCHEMATIC DIAGRAM ILLUSTRATING THE INTERCONNECTIONS BETWEEN LAYERS IN THE NEO-COGNITRON. FROM FUKUSHIMA K. (1980) NEOCOGNITRON: A SELF-ORGANIZING NEURAL NETWORK MODEL FOR A MECHANISM OF PATTERN RECOGNITION UNAFFECTED BY SHIFT IN POSITION.

Can you recount an aha!-moment or theoretical breakthrough during your early research into convolutional nets?

Not really. It was the logical thing to do. I had been playing with multi-layer nets with local connections since 1982 or so (not having the right learning algorithm though. Backprop didn't exist). I started experimenting with shared weight nets while I was a postdoc in Toronto in 1988.

The reason for not trying earlier is simply that I had neither the software nor the data. Once I arrived at Bell Labs, I had access to a large dataset and fast computers (for the time). So I could try full-size convnets, and it worked amazingly well (though it required 2 weeks of training).
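To make the step from local connections to shared weights concrete, here is a minimal NumPy sketch. It is an editorial illustration, not code from the interview or from SN; the function names and the toy input are assumptions chosen for the example. A locally connected layer gives every output unit its own small set of weights, while a shared-weight layer reuses one kernel at every position, which is what makes it a convolution.

```python
import numpy as np

def locally_connected_1d(x, weights):
    """Local connections, no sharing: each output unit has its own kernel.

    x: (n,) input; weights: (n - k + 1, k), one row of k weights per output unit.
    """
    n_out, k = weights.shape
    return np.array([weights[i] @ x[i:i + k] for i in range(n_out)])

def shared_weight_1d(x, kernel):
    """Shared weights: the same k-tap kernel is reused at every position."""
    k = kernel.shape[0]
    n_out = x.shape[0] - k + 1
    # Tiling one kernel across all positions turns the locally connected
    # layer into a convolution (strictly, a cross-correlation).
    return locally_connected_1d(x, np.tile(kernel, (n_out, 1)))

x = np.arange(8, dtype=float)          # toy 1D input (hypothetical)
kernel = np.array([1.0, 0.0, -1.0])    # one shared 3-tap kernel
y = shared_weight_1d(x, kernel)
```

In this toy case the locally connected layer would need 6 x 3 = 18 free weights, while the shared-weight version needs only 3, whatever the input length.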

What is your opinion on the recent popularity of convolutional nets for object recognition? Did you expect it?

Yes. I knew it had to happen. It was a matter of time until the datasets became large enough and the computers powerful enough for deep learning algorithms to become better than human engineers at designing vision systems.

There was a symposium entitled “Frontiers in Computer Vision” at MIT in August 2011. The title of my talk was “5 years from now, everyone will learn their features (you might as well start now)”. David Lowe (the inventor of SIFT) said the same thing.

A SLIDE FROM THE TALK LECUN Y. (2011) “5 YEARS FROM NOW, EVERYONE WILL LEARN THEIR FEATURES (YOU MIGHT AS WELL START NOW)”.

Still, I was surprised by how fast the revolution happened and how much better convnets are, compared to other approaches. I would have expected the transition to be more gradual. Also, I would have expected unsupervised learning to play a greater role.

The character recognition model at AT&T was not just a simple classifier but a complete pipeline. Can you tell us more about the implementation problems your team faced?

We had to implement our own programming language and write our own compiler to build this. Leon Bottou and I had written a neural net simulator called SN, back in 1987/1988. It was a Lisp interpreter with a numerical library (multidimensional arrays, neural net graphs…). We used this at Bell Labs to develop the first convnets.

Then in the early 90’s, we wanted to use our code in products. Initially, we hired a team of developers to convert our Lisp code to C/C++. But the resulting system could not be improved easily (it wasn’t a good platform for R&D). So Leon, Patrice Simard and I wrote a compiler for SN, which we used to develop the next generation OCR engine.

That system integrated a segmenter, a convnet, and a graphical model on top. The whole thing was trained end to end.

The graphical model was called a “graph transformer network”. It was conceptually similar to what we now call a conditional random field, or a structured perceptron (which it predates), but it allowed for non-linear scoring functions (CRFs and structured perceptrons can only have linear scoring functions).

The whole infrastructure was written in SN and compiled. This is the system that was deployed in ATM machines and check reading machines in 1996 and was reading 10 to 20% of all the checks in the US by the late 90’s.

AN ANIMATION SHOWING LENET 5 IN ACTION. FROM "INVARIANCE AND MULTIPLE CHARACTERS WITH SDNN (MULTIPLE CHARACTERS DEMO)".

In comparison with other methods, training convnets is pretty slow. How do you deal with the trade-off between experimentation and increased model training times? What does a typical development iteration look like?

In my experience, the best large-scale learning systems always take 2 or 3 weeks to train, regardless of the task, the method, the hardware, or the data.

I don’t know if convnets are “pretty slow”. Compared to what? They may be slow to train, but the alternative to “slow learning” is months of engineering effort, which doesn’t work as well in the end. Also, convnets are actually pretty fast to run (after training).

In a real application, no one really cares how long it takes to train. But people care a lot about how long it takes to run.

Which recent papers on convolutional nets are you most excited about? Any papers or ideas we should look out for?

There are lots and lots of ideas surrounding convnets and deep learning that have lived in relative obscurity for the last 20 years or so. No one cared about them, and getting papers published was always a struggle. So, lots of ideas were never properly tried, never published, or were tried and published but soundly ignored and quickly forgotten. Who remembers that the first learning-based face detector that actually worked was a convolutional net (back in 1993, eight years before Viola-Jones)?

A FIGURE WITH PREDICTIONS FROM VAILLANT R., MONROCQ C., LECUN Y. (1993) "AN ORIGINAL APPROACH FOR THE LOCALISATION OF OBJECTS IN IMAGES".

Today, it’s really amazing to see so many young and bright people devoting so much creative energy to the topic and coming up with new ideas and new applications. The hardware / software infrastructure is getting better, and it’s becoming possible to train large networks in a few hours or a few days. So people can explore many more ideas than in the past.

One thing I’m excited about is the idea of “spectral convolutional nets”. This was a paper at ICLR 2014 by folks from my NYU lab about a generalization of convolutional nets that can be applied to any graph (regular convnets can be applied to 1D, 2D or 3D arrays, which can be seen as regular grids in graph terms). There are practical issues, but it opens the door to many more applications of convnets to unstructured data.

MNIST DIGITS ON A SPHERE. FROM BRUNA J., ZAREMBA W., SZLAM A., LECUN Y. (2013) "SPECTRAL NETWORKS AND DEEP LOCALLY CONNECTED NETWORKS ON GRAPHS".
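As a rough editorial illustration of filtering a signal on a graph (a sketch under stated assumptions, not the method of the cited paper), the NumPy code below transforms a node signal into the eigenbasis of the graph Laplacian, reweights each spectral coefficient, and transforms back. The ring graph, the function names, and the fixed `spectral_weights` vector are all assumptions made for the example; in a trained network those weights would be learned.

```python
import numpy as np

def ring_adjacency(n):
    """Adjacency matrix of a cycle graph with n nodes (a 1D grid with wrap-around)."""
    a = np.zeros((n, n))
    idx = np.arange(n)
    a[idx, (idx + 1) % n] = 1.0
    a[(idx + 1) % n, idx] = 1.0
    return a

def spectral_filter(adjacency, signal, spectral_weights):
    """Filter a node signal in the eigenbasis of the graph Laplacian L = D - A."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # Columns of `basis` play the role of Fourier modes on the graph,
    # ordered by increasing eigenvalue ("frequency").
    _, basis = np.linalg.eigh(laplacian)
    coeffs = basis.T @ signal                    # graph Fourier transform
    return basis @ (spectral_weights * coeffs)   # reweight, then transform back

adjacency = ring_adjacency(8)
signal = np.arange(8.0)               # one value per node (toy data)

low_pass = np.zeros(8)
low_pass[0] = 1.0                     # keep only the constant (zero-frequency) mode
smoothed = spectral_filter(adjacency, signal, low_pass)
```

Setting every spectral weight to 1 returns the input unchanged, and keeping only the zero-frequency mode of a connected graph replaces every node value with the mean of the signal. Nothing in the code depends on the graph being a regular grid, which is the point of the generalization.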

I’m very excited about the application of convnets (and recurrent nets) to natural language understanding (following the seminal work of Collobert and Weston).

Since the error rate of a human is estimated to be around 6%, and Dr. Graham showed results of 4.47%, do you consider CIFAR-10 to be a solved problem?

It’s a solved problem in the same sense as MNIST is a solved problem. But frankly, people are more interested in ImageNet than in CIFAR-10 nowadays. In that sense, CIFAR-10 is not a “real” problem. But it’s not a bad benchmark for a new algorithm.

What would it take for convnets to see a much wider adoption in the industry? Will training convnets and the software to set them up become less challenging?

What are you talking about? Convnets are absolutely everywhere now (or about to be everywhere) in industry: Facebook, Google, Microsoft, IBM, Baidu, NEC, Twitter, Yahoo!, …

That said, it’s true that all of these companies have significant R&D resources and that training convnets can still be challenging for smaller companies or companies that are less technically advanced.

It still requires quite a bit of experience and time investment to train a convnet if you don’t have prior training. Soon, however, there will be several simple-to-use open source packages with efficient back-ends for that.

Are we close to the limit for convnets? Or could CIFAR-100 be "solved" next?

I don’t think it’s a good test. ImageNet is a much better test.

Shallow nets can be trained that perform similarly to complex, well-engineered, deeper convolutional architectures. Do deep nets really need to be deep?

Yes, deep nets need to be deep. Try to train a shallow net to emulate a deep convnet trained on ImageNet. Come back when you have done that. In theory, a deep net can be approximated by a shallow one. But on complex tasks, the shallow net will have to be ridiculously large.

Most of your academic work is highly practical in nature. Is this something you purposefully aim for, or is this an artefact of being employed by companies? Can you tell us about the distinction between theory and practice?

Hey, I’ve been in academia since 2003, and I’m still a part-time professor at NYU. I do theory when it helps me understand things. Theory often helps us understand what’s possible and what’s not possible. It helps suggest proper ways to do things.

But sometimes theory restricts our thinking. Some people will not work with some models because the theory about them is too difficult. But often, a technique works well before the reasons for it working well are fully understood theoretically.

By restricting yourself to work on stuff you fully understand theoretically, you are condemned to using conceptually simple methods.

Also, sometimes theory blinds us. For example, some people were dazzled by kernel methods because of the cute math that goes with it. But, as I’ve said in the past, in the end, kernel machines are shallow networks that perform “glorified template matching”. There is nothing wrong with that (SVM is a great method), but it has dire limitations that we should all be aware of.

A SLIDE FROM LECUN Y. (2013) LEARNING HIERARCHIES FROM INVARIANT FEATURES
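The “glorified template matching” description can be read off the usual form of a kernel machine's decision function, f(x) = Σ_i α_i K(x, x_i) + b. The NumPy sketch below is an editorial illustration with made-up templates and weights (none of the numbers come from the interview), and it assumes an RBF kernel: the first layer compares the input with every stored training sample, and the second layer is a plain weighted sum of those match scores.

```python
import numpy as np

def rbf_kernel(x, templates, gamma=0.5):
    """Similarity of x to each stored template: exp(-gamma * ||x - t||^2)."""
    sq_dists = np.sum((templates - x) ** 2, axis=1)
    return np.exp(-gamma * sq_dists)

def kernel_machine_score(x, templates, alphas, bias=0.0):
    """Layer 1: match x against every template. Layer 2: weighted linear sum."""
    return alphas @ rbf_kernel(x, templates) + bias

# Hypothetical toy model: two stored training samples acting as templates.
templates = np.array([[0.0, 0.0], [4.0, 4.0]])
alphas = np.array([1.0, -1.0])        # signed weights, one per template

near_first = kernel_machine_score(np.array([0.0, 0.0]), templates, alphas)
near_second = kernel_machine_score(np.array([4.0, 4.0]), templates, alphas)
```

An input close to the first template gets a positive score and one close to the second a negative score. There is only one layer of adaptable weights here (`alphas`), which is the sense in which the architecture is shallow.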

What is your opinion on a well-performing convnet without any theoretical justifications for why it should work so well? Do you generally favor performance over theory? Where do you place the balance?

I don’t think there is a choice to make between performance and theory. If there is performance, there will be theory to explain it.

Also, what kind of theory are we talking about? Is it a generalization bound? Convnets have a finite VC dimension, hence they are consistent and admit the classical VC bounds. What more do you want? Do you want a tighter bound, like what you get for SVMs? No theoretical bound that I know of is tight enough to be useful in practice. So I really don’t understand the point. Sure, generic VC bounds are atrociously non tight, but non-generic bounds (like for SVMs) are only slightly less atrociously non tight.
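For reference, one commonly quoted form of the classical VC bound is reproduced below as an editorial aside; the symbols are introduced here and do not appear in the interview. With probability at least 1 − η over a training set of N samples, a classifier f from a family of VC dimension h satisfies:

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f) \;+\;
\sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}}
```

Here R(f) is the expected error and R_emp(f) the training error. As a purely illustrative plug-in (the numbers are assumptions, not measurements), taking h = N = 10^6 makes the square-root term about 1.3, which is larger than any possible error rate. That is the sense in which such a bound is too loose to be useful in practice.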

If what you desire are convergence proofs (or guarantees), that’s a little more complicated. The loss function of multi-layer nets is non convex, so the easy proofs that assume convexity are out the window. But we all know that in practice, a convnet will almost always converge to the same level of performance, regardless of the starting point (if the initialization is done properly). There is theoretical evidence that there are lots and lots of equivalent local minima and a very small number of “bad” local minima. Hence convergence is rarely a problem.

What is your opinion on AI hype? Which practices do you think are detrimental to the field (of AI in general and specifically convnets)?

AI hype is extremely dangerous. It killed various approaches to AI at least 4 times in the past. I keep calling out hype whenever I see it, whether it’s from the press, from startups looking for investors, from large companies looking for PR, or from academics looking for grants.

There is certainly quite a bit of hype around deep learning at the moment. I don’t see a particularly high level of hype around convnets specifically. There is more hype around “cortical this”, “spiking that”, and “neuromorphic blah”. Unlike many of these things, convnets actually yield good results on useful tasks and are widely deployed in industrial applications.

Any interesting projects at Facebook involving convnets that you could talk a little more about? Some basic stats about the size?

DeepFace: a convnet for face recognition. There are also convnets for image tagging. They are big.

A FIGURE DESCRIBING THE ARCHITECTURE FROM THE PRESENTATION "TAIGMAN Y., YANG M., RANZATO M., WOLF L. (2014) DEEPFACE FOR UNCONSTRAINED FACE RECOGNITION".

Recently you posted about 4 types of serious researchers. How would you label yourself?

I’m a 3, with a bit of 1 and 4.

  1. "People who want to explain/understand learning (and perhaps intelligence) at the fundamental/theoretical level.
  2. People who want to solve practical problems and have no interest in neuroscience.
  3. People who want to understand intelligence, build intelligent machines, and have a side interest in understanding how the brain works.
  4. People whose primary interest is to understand how the brain works, but feel they need to build computer models that actually work in order to do so."

Anything you wish to say to the top contestants in the CIFAR-10 challenge? Anything you wish to say to (hobbyist) researchers studying convnets? Anything in general you wish to say about the CIFAR dataset/problem?

I’m impressed by how much creativity and engineering knack went into this. It’s nice that people have pushed the technique as far as it will go on this dataset.

But it’s going to get easier and easier for independent researchers and hobbyists to play with these things and apply them to larger datasets. I think the successor to CIFAR-10 should be ImageNet-1K-128x128. This would be a version of the 1000-category ImageNet classification task where the images have been normalized to 128x128. I see several advantages:

  1. the networks are small enough to be trainable in a reasonable amount of time on a high-end gamer rig;
  2. the network you get at the end can actually be used for useful applications (like robot vision);
  3. the network can be run in real time on embedded platforms, like smart phones or the NVIDIA Jetson TK1.

PREDICTIONS ON IMAGENET. FROM "KRIZHEVSKY A., SUTSKEVER I., HINTON G.E. (2012) IMAGENET CLASSIFICATION WITH DEEP CONVOLUTIONAL NEURAL NETWORKS".

The need to have large amounts of labeled data can be a problem. What is your opinion on nets trained on unlabeled data, or the automatic labeling of data through image search engines?

There are tasks like video understanding and natural language understanding where we are going to have to use unsupervised learning. But these modalities have a temporal dimension that changes how we can approach the problem.

Clearly, we need to devise algorithms that can learn the structure of the perceptual world without being told the name of everything. Many of us have been working on this for years (if not decades), but none of us has a perfect solution.

What is your latest research focusing on?

There are two answers to this question:

  1. projects I’m personally involved in (enough that I would be co-author on the papers);
  2. projects that I set the stage for, encourage others to work on, and advise at the conceptual level, but in which I am not involved enough to be co-author on a paper.

A lot of (1) is at NYU and a lot of (2) is at Facebook.

The general areas are:

  1. unsupervised learning that discovers “invariant” features;
  2. the marriage of deep learning and structured prediction;
  3. the unification of supervised and unsupervised learning;
  4. solving the problem of learning long-term dependencies;
  5. building learning systems with short-term/scratchpad memory;
  6. learning plans and sequences of actions;
  7. ways to optimize functions other than following the gradient;
  8. the integration of representation learning with reasoning (read Leon Bottou’s excellent position paper “From Machine Learning to Machine Reasoning”);
  9. the use of learning to perform inference efficiently;

and many other topics.

from: http://blog.kaggle.com/2014/12/22/convolutional-nets-and-cifar-10-an-interview-with-yan-lecun/
