CIFAR-10 Competition Winners: Interviews with Dr. Ben Graham, Phil Culliton, & Zygmunt Zając
Dr. Ben Graham
Dr. Ben Graham is an Assistant Professor in Statistics and Complexity at the University of Warwick. With a classification accuracy of 0.95530, he took first place.
Congratulations on winning the CIFAR-10 competition! How do you feel about your victory?
Thank you! I am very pleased to have won, and quite frankly pretty amazed at just how competitive the competition was.
When I first saw the competition, I did not think the test error would go below about 8%. I assumed 32x32 pixels just wasn't enough information to identify objects very reliably. As it turned out, everyone in the top 10 got below 7%, which is roughly on a par with human performance.
Can you tell us about the setup of the network? How many layers?
It is a deep convolutional network trained using SparseConvNet with architecture:
input=(3x126x126) -
320C2 - 320C2 - MP2 -
640C2 - 10% dropout - 640C2 - 10% dropout - MP2 -
960C2 - 20% dropout - 960C2 - 20% dropout - MP2 -
1280C2 - 30% dropout - 1280C2 - 30% dropout - MP2 -
1600C2 - 40% dropout - 1600C2 - 40% dropout - MP2 -
1920C2 - 50% dropout - 1920C1 - 50% dropout - 10C1 - Softmax output
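The sizing of these layers follows a regular pattern: each C2 (2x2) convolution shrinks the spatial dimension by one, and each MP2 pooling halves it, so the 126x126 input collapses to 1x1 by the final classifier layers. A plain-Python sketch of that arithmetic (an illustration only, not the SparseConvNet implementation):

```python
def deepcnet_spatial_sizes(input_size=126, num_blocks=5):
    """Trace spatial sizes through DeepCNet-style blocks.

    Each block is two 2x2 convolutions (each reduces the side
    length by 1) followed by 2x2 max pooling (which halves it).
    A final 2x2 convolution takes the 2x2 maps down to 1x1.
    """
    sizes = [input_size]
    s = input_size
    for _ in range(num_blocks):
        s -= 1           # first C2 convolution
        sizes.append(s)
        s -= 1           # second C2 convolution
        sizes.append(s)
        s //= 2          # MP2 max pooling
        sizes.append(s)
    s -= 1               # final C2 (1920C2): 2x2 -> 1x1
    sizes.append(s)
    return sizes

print(deepcnet_spatial_sizes())
# the trailing 1x1 maps feed the 1920C1 / 10C1 classifier layers
```

Running this shows the progression 126 → 124 → 62 → ... → 2 → 1, which is why the input layer is 126x126 rather than the raw 32x32.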
It was trained taking advantage of:
- spatial-sparsity in the 126x126 input layer,
- batchwise dropout,
- (very) leaky rectified linear units, and
- affine spatial and color-space training data augmentation.
The same architecture produces a test error of 20.68% for CIFAR-100.
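The affine spatial augmentation mentioned in the list above can be sketched in NumPy roughly as follows. This is only an illustration of the general idea; the scale, rotation, and shear ranges are invented values, not those used in the winning solution:

```python
import numpy as np

def random_affine_matrix(rng, max_rotate=0.2, max_scale=0.1, max_shear=0.1):
    """Build a random 2x2 affine matrix combining rotation, scaling,
    and shear. The ranges here are illustrative placeholders."""
    theta = rng.uniform(-max_rotate, max_rotate)
    sx, sy = 1 + rng.uniform(-max_scale, max_scale, size=2)
    shear = rng.uniform(-max_shear, max_shear)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    scale = np.array([[sx, 0.0], [0.0, sy]])
    sh = np.array([[1.0, shear], [0.0, 1.0]])
    return rot @ scale @ sh

def augment_coords(coords, rng):
    """Map (N, 2) pixel coordinates, centered on the image, through a
    random affine transform; interpolation/resampling is omitted."""
    return coords @ random_affine_matrix(rng).T

rng = np.random.default_rng(0)
A = random_affine_matrix(rng)
```

Because the rotation and shear factors have determinant 1, the overall distortion stays close to area-preserving, which keeps the augmented images plausible.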
These cats evaded the DeepCNet solution by looking a lot like a fighter jet and a car.
Can you tell us a little about the hardware used to train the nets? How long did it take to train? What was the development cycle like?
The network took about 90 hours to train on an NVIDIA GeForce GTX780 graphics card. I had already written a convolutional neural network for spatially-sparse inputs to learn to recognise online Chinese handwriting.
Over the course of the competition I upgraded the program to allow dropout to be applied batchwise, and cleaned up some kernels that were accessing memory inefficiently. That made it feasible to train pretty large networks.
Which papers/approaches authored by other scientists contributed the most to your top score?
The network architecture is the result of borrowing ideas from a number of recent papers:
- Multi-column deep neural networks for image classification; Ciresan, Meier and Schmidhuber
- Network In Network; Lin, Chen and Yan
- Very Deep Convolutional Networks for Large-Scale Image Recognition; Simonyan and Zisserman.
Reading each of those papers was jaw-dropping as the ideas would not have occurred to me.
These images were all correctly classified. To the net they look the most like their respective classes. From DeepCNet's extremes.
Where do you see convnets in the future? Anything in particular that you are excited about?
I am very interested in the idea of spatially-sparse 3d convolutional networks. For example, given a length of string, you might be able to pull both ends to produce a straight line. Alternatively, the string might contain a knot which you cannot get rid of no matter how hard you pull. That idea is obvious to humans, but hard for a computer, as there are so many different kinds of knots.
Hopefully 3d convolutional networks can develop some of the physical intuition humans take for granted.
Besides convnets, I am very interested in machine learning techniques for time-series data, such as recurrent neural networks.
Thank you very much for sharing your code on the forums. What is your opinion on sharing code?
My pleasure; it was nice to see a couple of the other teams in the top 10 ("Jiki" and "Phil & Triskelion & Kazanova") use the code. Another Kaggler, Nagadomi, also made his code available during the competition. It was fascinating to see him implement some of the ideas to come out of the ILSVRC2014 competition such as "C3-C3-MP2" layers and Inception layers.
Do you think your convnet could be improved even more on this task, or do you feel it is close to its limit?
After the competition, I re-ran my top network on the 10,000 images from the original CIFAR-10 test set, resulting in 446 errors.
Here is a confusion matrix showing where the 446 errors come from (rows: true class, columns: predicted class):

             airplane  automobile  bird  cat  deer  dog  frog  horse  ship  truck
airplane            0           3    10    2     2    0     2      0    16      3
automobile          1           0     1    0     0    0     0      0     3     12
bird                8           1     0   14    19    8     9      5     2      0
cat                 4           1     8    0     9   57    20      2     5      2
deer                3           1    12    7     0    5     4      8     0      0
dog                 4           1     7   39    10    0     1      7     1      1
frog                4           0     7    7     3    1     0      1     0      1
horse               6           0     3    4     7    8     0      0     0      0
ship                2           3     2    0     0    1     0      0     0      3
truck               3          20     0    2     0    0     1      0     7      0
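Treating rows as true classes and columns as predictions, the matrix can be mined for the most common confusions. A small sketch, re-entering the numbers above by hand:

```python
classes = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]
# rows = true class, columns = predicted class (errors only,
# so the diagonal is zero); values copied from the matrix above
errors = [
    [0, 3, 10, 2, 2, 0, 2, 0, 16, 3],
    [1, 0, 1, 0, 0, 0, 0, 0, 3, 12],
    [8, 1, 0, 14, 19, 8, 9, 5, 2, 0],
    [4, 1, 8, 0, 9, 57, 20, 2, 5, 2],
    [3, 1, 12, 7, 0, 5, 4, 8, 0, 0],
    [4, 1, 7, 39, 10, 0, 1, 7, 1, 1],
    [4, 0, 7, 7, 3, 1, 0, 1, 0, 1],
    [6, 0, 3, 4, 7, 8, 0, 0, 0, 0],
    [2, 3, 2, 0, 0, 1, 0, 0, 0, 3],
    [3, 20, 0, 2, 0, 0, 1, 0, 7, 0],
]

def worst_confusions(matrix, labels, top=3):
    """Return the (true, predicted, count) triples with the
    largest error counts."""
    pairs = [(labels[i], labels[j], matrix[i][j])
             for i in range(len(labels)) for j in range(len(labels))]
    return sorted(pairs, key=lambda p: -p[2])[:top]

print(worst_confusions(errors, classes))
```

The dominant confusions are cats predicted as dogs (57) and dogs predicted as cats (39), which matches the misclassified-cat images shown throughout this post.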
Looking at some of the 446 misclassified images, it seems that there is plenty of room for improvement in accuracy. I am sure there is also scope for improving the efficiency of the network.
Which machine learning scientist inspires you?
Lots of them: Alan Turing, Yann LeCun, Geoffrey Hinton, Andrew Ng, Jürgen Schmidhuber, Yoshua Bengio, Rob Fergus, Alex Krizhevsky, Ilya Sutskever, ...
Anything of note on the competition and/or dataset that you found surprising? An approach that worked unexpectedly well, or perhaps did not work for you?
I was very surprised how much of a difference fine-tuning (finishing off training the network using a small number of training epochs with a low learning rate and without data augmentation) made.
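As a rough illustration of that recipe — a long main phase with a decaying learning rate and augmentation, then a few low-rate epochs without augmentation — here is a sketch of such a schedule. All the constants are invented for illustration, not the settings used in the competition:

```python
def training_schedule(main_epochs=300, finetune_epochs=10,
                      lr0=0.1, decay=0.98, finetune_lr=1e-4):
    """Yield (epoch, learning_rate, use_augmentation) tuples:
    an exponentially decaying main phase with data augmentation,
    followed by a short fine-tuning phase at a small constant
    learning rate with augmentation switched off.
    All constants here are illustrative placeholders."""
    schedule = []
    for e in range(main_epochs):
        schedule.append((e, lr0 * decay ** e, True))
    for e in range(main_epochs, main_epochs + finetune_epochs):
        schedule.append((e, finetune_lr, False))
    return schedule

sched = training_schedule()
```

The idea is that the final augmentation-free epochs let the network settle on the true training distribution after having been regularized by distorted inputs.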
Again, thank you very much for sharing your code. Our team would not have beaten the estimated human error rate without it!
My pleasure. Academia can be a bit antisocial, so it is lovely to see so much enthusiasm going into Kaggle competitions.
Phil Culliton
Phil Culliton is a game developer and Senior Researcher at an NLP startup. With a score of 0.94120, his team took 6th place.
Can you tell us about the architecture of the net? Number of layers etc.?
Our 6th place submission used multiple iterations (with varying epoch counts) of a single network architecture.
We also used a "trick" suggested by Dr. Graham which incorporated a small number of epochs that used no affine transformations.
The network architecture in question was Dr. Graham's spatially sparse CNN. It used 12 LeNet layers and a final softmax layer - it looked roughly like this (this is modified output from Dr. Graham's code):
LeNetLayer 128 neurons, VeryLeakyReLU
LeNetLayer 128 neurons, VeryLeakyReLU MP2
LeNetLayer 384 neurons, Dropout 0.0833333 VeryLeakyReLU
LeNetLayer 384 neurons, VeryLeakyReLU MP2
LeNetLayer 768 neurons, Dropout 0.208333 VeryLeakyReLU
LeNetLayer 768 neurons, VeryLeakyReLU MP2
LeNetLayer 1280 neurons, Dropout 0.3 VeryLeakyReLU
LeNetLayer 1280 neurons, VeryLeakyReLU MP2
LeNetLayer 1920 neurons, Dropout 0.4 VeryLeakyReLU
LeNetLayer 1920 neurons, VeryLeakyReLU MP2
LeNetLayer 2688 neurons, Dropout 0.5 VeryLeakyReLU
LeNetLayer 2688 neurons, VeryLeakyReLU
LeNetLayer 10 neurons, Softmax Classification
The "MP" entries above denote max pooling, and "VeryLeakyReLU" denotes a "leaky" ReLU with a fairly large non-zero gradient on the negative side (alpha was 0.33).
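A very leaky ReLU is simply a ReLU whose negative side has a large slope instead of zero; with alpha = 0.33 it can be sketched as:

```python
def very_leaky_relu(x, alpha=0.33):
    """ReLU variant: identity for positive inputs, slope `alpha`
    for negative inputs. Here alpha is a large 0.33, versus the
    ~0.01 typical of ordinary leaky ReLUs."""
    return x if x > 0 else alpha * x
```

The large negative slope keeps gradients flowing through units that would otherwise go dead, which matters in a network this deep.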
DropOut was implemented in a straightforward manner. I considered adding DropConnect into the mix but ran out of time to test it.
Input images were distorted using a semi-random system of stretching and flipping - I played around with this but also ran out of time to properly validate it.
Earlier in the competition I did attempt to ensemble multiple network architectures but none of them outperformed the top contender.
The cat on the left looks most like a frog. The cats on the right trick the net into thinking they are boats. From DeepCNet's extreme errors.
What were the technical challenges to overcome to produce submissions for this challenge?
I mention this again later, but getting CUDA installed and running properly on various machines turned out to be a much bigger task than I thought it would be - it was difficult and time-consuming. I'm an old hand at getting cranky C code to compile - like, say, porting Windows codebases to OSX - so when I say I saw some weird stuff in trying to get CUDA-based libraries to run, I mean it.
Also - in a normal Kaggle competition I try to make use of all of the submissions available to me, even if it's just to try oddball approaches that may or may not work. However, for CIFAR-10, coming up with the machine time was an issue. I farmed the work out over AWS GPU servers as well as multiple local servers, but AWS quickly became expensive and eventually I had to stop using it.
Finding the right ratio of network size / sample batch size / speed for each server also took some care. I discovered that sample batch size (the number of samples sent to the GPU at a time) actually had an effect on final results, although I haven't yet quantified it. I'd be interested in exploring that further.
Which libraries did you use? Can you give some of the pros and cons?
For the top submission's neural networks we used Dr. Graham's reference code in CUDA / C++, with variations in parameters and some extremely minor changes.
The biggest pro was speed - we were training simply enormous networks and it could only have worked using GPGPUs. The cons - complexity of setup and installation. Each machine's CUDA install was a new mini-adventure, some of which didn't turn out so well. I hadn't played with CUDA much before, and frankly I'm not too enamored with it. Getting it working properly - and compiling OpenCV with it! - on OSX was ridiculously hard. Eventually I switched over to all-Linux CUDA servers, where the task was marginally easier.
Luckily Dr. Graham's code was very adaptable and didn't have any strange library requirements - several of the other libraries we attempted to use required very specific / old versions of CUDA and would only work if you had a particular compiler, etc., or weren't amenable to running on one platform or another.
Did you try anything else besides convolutional nets?
I also tried simple neural networks using H2O in R, kNN with scikit-learn, and Vowpal Wabbit. I'm a pretty heavy user of the latter two, but H2O was new to me. All produced interesting results, but none ground-breaking.
I did really like H2O's deep learning implementation in R, though - the interface was great, the back end extremely easy to understand, and it was scalable and flexible. Definitely a tool I'll be going back to.
Did you read any papers for this competition?
Several. DropOut, DropConnect, and network architecture papers were heavily featured. I had just been doing some NN work in my day job so I got some dual-purpose reading done.
I heartily recommend Dr. Graham's preprint about the architecture we used - you can find it on his website.
I spent a fair bit of time on fastml.com as well - their articles on CIFAR-10 were beyond useful.
The DeepCNet architecture from "Graham B. (2014) Spatially-sparse convolutional neural networks"
What did you learn from this competition? First time using convnets for a Kaggle competition? Do you think you can apply any knowledge to future competitions?
This was my first time using convnets for anything! I was impressed with their power and accuracy. I was also impressed at the number of GPU hours (and expense) it took to run a decent-sized network. It certainly isn't for the impatient or faint of heart.
I strongly suspect that deep learning / NNs will bubble toward the top of my toolbox for some problems. Definitely on anything remotely similar to CIFAR I'll be headed to the code from this competition first - probably with an email to Dr. Graham shortly thereafter.
I heard you approached Dr. Ben Graham midway during the competition and he released code. Can you tell a little about how this came to be?
Sure! I noticed that Dr. Graham was consistently on the top of the leaderboard and clicked through his Kaggle profile to find out if he was working for an ML company or using a particular product. There wasn't anything on his profile except for a link that was only partially visible, so I hopped on Google and dug around a bit.
It was a slightly convoluted process, but I eventually made my way to his website and noted that he had several sets of sample / reference code for dealing with CIFAR-10 that were freely available and accompanied by (rather excellent) write-ups. I grabbed a set and started trying to work with it, had some problems getting it going, and sent him a question. I figured I wouldn't hear back from him - frankly I wasn't sure whether he'd be willing to help his competition.
However, within a few hours he'd sent me a version of the code with all the issues ironed out and some friendly comments! Shortly thereafter he shared that same code on the forums, which was great as that got even more people using it.
We kept in touch during the competition; whenever he updated the code on the forums he'd send me an email letting me know and encouraging me to keep trying (although by the end of the competition it was pretty clear to me that he was going to win).
He was a great sport, a tremendous help and I'm looking forward to seeing more of his work in the future.
Zygmunt Zając
Zygmunt Zając is the author of FastML and a Machine Learning Researcher. He used DropConnect to improve his accuracy to 0.90660, good for 18th place.
Can you tell us about the architecture of the net? Number of layers etc.?
I have used models trained by Li Wan, the author of DropConnect. The details are outlined in the paper: Regularization of Neural Networks using DropConnect.
Figure from the paper: "Wan L., Zeiler M., Zhang S., LeCun Y., Fergus R. (2013) Regularization of Neural Networks using DropConnect"
What were the technical challenges to overcome to produce submissions for this challenge?
The challenges were getting the data in and the predictions out, as usual. In this case it meant converting raw images into cuda-convnet format and learning how to get the predictions from the library.
On top of that, getting DropConnect code to work was a bit tricky. You can read about the journey here:
- Object recognition in images with cuda-convnet
- Regularizing neural networks with dropout and with dropconnect
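For context, the CIFAR-10 python batches that cuda-convnet-style pipelines consume store each image as a 3072-byte row in channel-planar order: all red values first, then green, then blue, row-major within each plane. A hedged sketch of that conversion (the dict keys follow the CIFAR-10 python batches; cuda-convnet's own data provider may reorder or transpose, so check its documentation):

```python
import numpy as np

def to_cifar_batch(images, labels):
    """Pack HxWx3 uint8 images into a CIFAR-10-style batch dict.

    Each row of "data" is 3072 bytes in channel-planar order
    (all R, then G, then B), row-major within each plane. Keys
    follow the CIFAR-10 python batches; an actual cuda-convnet
    data provider may expect a different orientation.
    """
    data = np.stack([img.transpose(2, 0, 1).reshape(-1) for img in images])
    return {"data": data.astype(np.uint8), "labels": list(labels)}

imgs = [np.zeros((32, 32, 3), dtype=np.uint8) for _ in range(4)]
batch = to_cifar_batch(imgs, [0, 1, 2, 3])
```

Getting this plane ordering wrong is a classic source of the scrambled-looking inputs people hit when first feeding their own images to these libraries.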
Which libraries did you use? Can you give some of the pros and cons?
I used Alex Krizhevsky’s cuda-convnet extended with Li Wan's code. Cuda-convnet struck me as a very well designed and implemented library.
Did you try anything else besides convolutional nets?
No.
Did you read any papers for this competition?
Mainly Hinton et al.'s dropout paper and Wan et al.'s DropConnect paper. There are other references in the FastML articles mentioned above.
What did you learn from this competition? Did any knowledge from previous competitions (cats vs. dogs) transfer? Do you think you can apply any knowledge to future competitions?
It was my first brush with convolutional networks; I gained a general idea of how they work. Also that they aren't as easy to overfit as I thought.