How Does #DeepDream Work?
Do neural networks dream of electric dogs?
If you’ve been browsing the net recently, you might have stumbled on some strange-looking images, with pieces of dog heads, eyes, legs and what look like buildings, sometimes superimposed on a normal picture, sometimes not. Although they can be nightmare-inducing (or perhaps because of that), they have gained a lot of popularity on the Internet. Often tagged #deepdream, they are made by a neural network trained on a huge set of categorized images and set free to generate new ones. The network comes from Google Research and its code is currently available on GitHub, spawning a wave of home-made neural image generators.
It turns out, like many things on the Internet, it has something to do with cats.
In 1971, a British scientist named Sir Colin Blakemore raised a kitten in complete darkness, except for several hours a day in a small cage, where the kitten could only see black and white horizontal stripes. A month or so later, the kitten was introduced to the normal world. It reacted to light, but otherwise seemed to be blind. It didn’t follow moving objects unless they made a sound. When a recording was taken from the kitten’s visual cortex, it turned out that its neurons reacted to horizontal lines but not vertical ones, leaving the brain unable to comprehend the complexity of the real world. Another kitten, raised in a vertical-stripe environment, had a similar disability.
The research method is controversial, and Sir Colin had his share of threats from angry animal rights activists, but the result is interesting. The experiment tells us that the vision system, at least in cats, is something that develops after birth. The visual cortex of a kitten adapts to what the newborn eyes are exposed to and forms neurons that react to dominant basic patterns, like vertical or horizontal lines. From those basic patterns it can then make sense of a more complex image, infer depth and motion. This gives us hope, and suggests a method, for creating an artificial vision system.
Convolutional neural networks
A feed-forward artificial neural network, when used as a classifier, takes an array of values as input and tries to assign it to a category in response. A relatively small network can be trained to guess a person’s gender, given numeric height and hair length. We collect some sample data, present the samples to the network, and gradually modify the network’s weights to minimize the error. The network should then find the general rule and give correct answers to samples it has never seen before. If well trained, it will be correct most of the time, wrong mostly in cases where people given only the same two numbers would probably be wrong too, and much faster besides: people are not good with numbers. Estimating gender from a photo, in contrast, is a much easier task for humans, but orders of magnitude harder for computers. An image, represented as a set of numerical pixel values, is a huge input space, impossible for such a network to process in reasonable time. And what if we don’t want to detect genders, but dog breeds, or recognize plants, or find cancer in x-rays? This is where convolutional neural networks come in.
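To make this concrete, here is a minimal sketch of such a classifier in Python with scikit-learn. The height/hair-length samples are invented for illustration, not real data.

```python
# Toy sketch: a tiny feed-forward classifier guessing gender from
# height (cm) and hair length (cm). The samples below are made up
# purely for illustration.
from sklearn.neural_network import MLPClassifier

X = [[185, 4], [170, 35], [160, 40], [192, 2], [158, 25], [178, 10]]
y = ["m", "f", "f", "m", "f", "m"]

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X, y)  # gradually adjust weights to minimize the error

print(net.predict([[175, 30]]))  # generalizes to a sample it has never seen
```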
Convolutional neural networks are a breed of neural networks introduced by Kunihiko Fukushima back in 1980, under the name “Neocognitron”. You may also stumble across the name “LeNet”, after Yann LeCun, a researcher now working at Facebook.
A kernel convolution is an image operation that, for each pixel, takes the pixels in its square neighborhood, calculates a weighted average of their values, and puts the result as the new value of that pixel. An equally-weighted average produces a blurry image. Negative weights for the nearest neighbors make the image look sharper. With properly selected weights, we can enhance vertical or horizontal lines, or lines of any angle. With a bigger convolution kernel (a larger neighborhood), we can find curved lines, color gradients or simple patterns. This technique has been known and used in image analysis and manipulation for years. In convolutional neural networks, though, the weights in the convolution matrices can be trained using error back-propagation. This way, instead of each pixel being an input, we can have one neuron reacting only to vertical lines, another to horizontal lines, or angles, just like in a cat’s visual cortex.
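A hedged sketch of the same operation in code, using SciPy; the random array here stands in for a real grayscale image:

```python
# Sketch: classic convolution kernels applied to a grayscale image.
# "img" is assumed to be a 2-D array of pixel values.
import numpy as np
from scipy.signal import convolve2d

img = np.random.rand(64, 64)  # stand-in for a real grayscale image

blur = np.ones((3, 3)) / 9.0              # equal weights -> blurry image
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])        # negative nearest neighbors -> sharper
vertical = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]])         # Sobel kernel: enhances vertical lines

for kernel in (blur, sharpen, vertical):
    out = convolve2d(img, kernel, mode="same", boundary="symm")
```

In a convolutional network, the values inside these kernels are exactly what back-propagation learns.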
A layer of a convolutional neural network consists of a number of such image-transforming neurons, each emphasizing a different aspect of the image. The image is thus split into a number of feature channels: instead of the initial RGB, we get one channel per kernel. That alone is hardly a solution to the input-size problem; pooling layers are used for that. A pooling layer takes non-overlapping square neighborhoods of pixels, finds the highest value in each, and returns it as the value of the whole neighborhood. Note that if we did that to the original image, we’d just get a badly pixelated, somewhat brighter miniature. When a pooling layer’s input comes from a convolutional layer, its response means “there is this feature in this area”, which is actually useful information. Pooling layers also make the network less sensitive to where in the image the features are.
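Max pooling itself is only a few lines of NumPy; this sketch halves each dimension of one assumed feature channel:

```python
# Sketch: 2x2 max pooling. Each non-overlapping 2x2 block is replaced
# by its maximum, halving width and height.
import numpy as np

feature_map = np.random.rand(8, 8)   # stand-in for one feature channel
h, w = feature_map.shape
pooled = feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
print(pooled.shape)  # (4, 4): "is this feature in this area?" at 1/4 the size
```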

“Park or Bird” problem by XKCD
Still, knowing that there are vertical or horizontal lines, gradients or edges doesn’t help us detect whether the photo contains a bird (see the famous “park or bird” problem, finally solved by Flickr). Well, we have made a step in the right direction, so why not take the next step: stack another set of convolutional neurons on top, and a pooling layer on top of that, creating a deep neural network. It turns out that, with enough layers, a network “gets” quite complex features. A face is, after all, a combination of eyes, nose and mouth, with a chance of ears and hair. We can then use these complex features as input to a regular feed-forward neural network and train it to return a category: bird, dog, building, electric guitar, school bus or pagoda.
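A sketch of such a stack in PyTorch; the layer sizes here are arbitrary choices for illustration, not the architecture of any particular published network:

```python
# Sketch: convolution -> pooling -> convolution -> pooling -> classifier.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # first layer: lines, edges
    nn.ReLU(),
    nn.MaxPool2d(2),                              # "this feature is in this area"
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # combinations of simple features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 5),                     # bird, dog, building, guitar, pagoda
)

scores = model(torch.randn(1, 3, 32, 32))  # one 32x32 RGB image
print(scores.shape)  # torch.Size([1, 5]): one score per category
```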
In 2012, a convolutional neural network was trained on YouTube videos and allowed to freely self-organize categories for what it saw. It formed a category for cats on its own (see, it all goes back to cats). In 2014, convolutional networks, working together with recurrent neural networks trained on full sentences, learned to describe images in natural language. Recurrent neural networks, though, are a topic for another time.

Source: Deep Fragment Embeddings for Bidirectional Image-Sentence Mapping by Andrej Karpathy
Dreaming deep
Since 2012, deep neural networks have been winning image analysis competitions, reaching near-human accuracy in labeling images. Initially, there was some resistance from the computer vision community. One complaint was that the networks were winning but, well, not showing their work. The leading methods of the time, when detecting a face, for example, would give exact positions of eyes and mouth and return various proportions of the purported face. A deep neural network would just detect faces with uncanny accuracy and not tell us how. While the inner workings of the first convolutional layer were well understood (lines, edges, gradients), it was impossible to look into how the second layer used the information given by the first. The layers above that were a mystery.
Researchers have been successful in fooling these deep neural networks by generating images with an evolutionary algorithm. Make a random-noise image and check the network’s response. If the network thinks there’s a chance of a bus in the image, generate more images like it (offspring) and feed those to the network again. Whichever image increases the network’s certainty wins and gets to contribute to the next generation. What looks like random noise to us will be interpreted as a bus by the network. If we also optimize the generated images to have the statistical properties of real samples, the noise turns into shapes and textures that we can identify. Such generated images tell us, for example, that stripes are a key feature of bees, that bananas are usually yellow, and that anemone fish have orange-white-black stripes.
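The loop might look like the following sketch. Here `bus_score` is a stand-in for a real trained classifier’s confidence in the “bus” category, which is the one piece deliberately not implemented:

```python
# Sketch of the evolutionary fooling loop described above.
import numpy as np

rng = np.random.default_rng(0)

def bus_score(image):
    # Placeholder: in the real experiment this would be the network's
    # softmax probability that the image contains a bus.
    return -np.abs(image.mean() - 0.5)

population = [rng.random((64, 64)) for _ in range(20)]  # random noise images

for generation in range(100):
    # Keep the images the network is most confident about...
    population.sort(key=bus_score, reverse=True)
    parents = population[:5]
    # ...and breed mutated offspring from the winners.
    population = [p + rng.normal(0, 0.02, p.shape)
                  for p in parents for _ in range(4)]

best = max(population, key=bus_score)  # "a bus", according to the network
```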
Another approach is to feed a real image to the network, pick a layer, apply the winning transformation to each part of the image, and feed the result back to the network. For the first layer, expectedly, the network will enhance the leading line directions in the image, giving it an impressionistic style of wavy brush strokes, dots or swirls. If we do that with a higher layer (reversing the process through the lower layers), we get an image painted with textures: fur, wood, feathers, scales, bricks, grass, waves, spaghetti or meatballs. Yet higher layers will turn any vertical lines into legs and draw eyes and noses on shapes that vaguely resemble faces (people do that too, by the way; it’s called pareidolia). Since there are a lot of animal photos in the training image set, the generated images often contain animal parts, sometimes gruesomely distorted. Animal faces are superimposed on human faces, spiky leaves become bird beaks, and everything seems a bit hairy. Our brains struggle to make sense of the network’s dreams. The images are disturbing because they remind us of something, but at the same time, not exactly. This makes them interesting, maybe unsettling, and viral on the internet.
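The core “enhance what the layer sees” step can be sketched as gradient ascent on the image. The sketch below uses a pretrained VGG16 from torchvision purely for convenience (Google’s published DeepDream code used their Inception/GoogLeNet model), and the layer cutoff is an arbitrary choice:

```python
# Sketch: boost a chosen layer's activations by gradient ascent on the image.
import torch
from torchvision.models import vgg16, VGG16_Weights

model = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()  # up to a mid layer
for p in model.parameters():
    p.requires_grad_(False)   # we optimize the image, not the weights

img = torch.rand(1, 3, 224, 224, requires_grad=True)  # or a real photo

for step in range(20):
    activations = model(img)
    loss = activations.norm()   # "whatever you see there, more of it"
    loss.backward()
    with torch.no_grad():
        img += 0.01 * img.grad / (img.grad.abs().mean() + 1e-8)
        img.grad.zero_()
```

Picking an early layer cutoff yields brush strokes and swirls; a later cutoff yields fur, feathers and eyes.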
What does that say about the history of art, if a lower layer paints a Van Gogh or Seurat picture, while a higher layer reminds us of Picasso or Dali?

Left: Original photo by Zachi Evenor. Right: processed by Günther Noack, Software Engineer
Going deeper
Things get really interesting if we directly suggest to the network what it should see, by triggering the last, classification layer. I will leave the explanation of how that trigger is passed back through the network to those who have actually done it. Instead of starting with a random image, it helps to take a real image, blur it a bit, zoom in, and let the network “enhance” it by drawing what it sees. When we repeat the process, we get a potentially infinite zooming sequence of bits and pieces of the network’s sample data.
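Leaving the back-propagated trigger aside, the zoom loop itself is straightforward. In this sketch `dream_step` is a stand-in for the enhancement step (such as the gradient-ascent sketch earlier):

```python
# Sketch of the infinite-zoom loop: enhance, zoom in slightly, crop
# back to size, repeat.
import torch
import torch.nn.functional as F

def dream_step(img):
    return img  # stand-in for the network's "enhance" step shown earlier

img = torch.rand(1, 3, 224, 224)
frames = []

for i in range(100):
    img = dream_step(img)
    frames.append(img)
    # Zoom in ~5% and crop the center back to the original size.
    zoomed = F.interpolate(img, scale_factor=1.05,
                           mode="bilinear", align_corners=False)
    _, _, h, w = zoomed.shape
    top, left = (h - 224) // 2, (w - 224) // 2
    img = zoomed[:, :, top:top + 224, left:left + 224]
```

Played back in order, `frames` becomes the endless zooming animation seen in the #deepdream videos.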
This is how the Large Scale Deep Neural Network (LSD-NN) is able to hallucinate like that in real time. It was made by Jonas Degrave and team, known as 317070 on GitHub. Interestingly, it was built before Google published the DeepDream code. 317070’s network hallucinates on a Twitch stream, where users can shout categories from the ImageNet database and the network will do its best to produce images that remind it of what was suggested. The network doesn’t exactly draw the requested objects, but it gets the essence. When users ask for volcanoes, there’s smoke and lava. When users shout for pizza, there’s melting cheese. You can see sausages in a butcher shop and spiders on spiderwebs, but most of the time it’s a mesmerizing, colorful soup of textures and shapes. Really. Try it.
Call it fun or nightmarish, but we learn a lot from making networks dream. We have found the equivalent of the first convolutional layer in cat brains. We have a model that supports the theory that dreaming helps us remember (networks can be trained on their own generated sample data). We have basically simulated pareidolia; perhaps we can even infer something about the mechanism behind schizophrenia.
As with chess and, more recently, Jeopardy, machines have crossed yet another threshold and taken over something we used to be better at. Remember when captchas were a good way to stop crawlers and bots from stealing your online data? Not anymore; now they are just a test of a bot’s computational ability. The viral images under the #deepdream hashtag are a sign that machine vision is becoming mainstream, and it’s time to accept that dreaming of electric sheep (or dogs) is just something androids do.