FastML

Machine learning made easy

Deep learning architecture diagrams

2016-09-30

Like a wild stream after a wet season in the African savanna diverges into many smaller streams forming lakes and puddles, deep learning has diverged into a myriad of specialized architectures. Each architecture has a diagram. Here are some of them.

Neural networks are conceptually simple, and that’s their beauty. A bunch of homogeneous, uniform units, arranged in layers, weighted connections between them, and that’s all. At least in theory. Practice turned out to be a bit different. Instead of feature engineering, we now have architecture engineering, as described by Stephen Merity:

The romanticized description of deep learning usually promises that the days of hand crafted feature engineering are gone - that the models are advanced enough to work this out themselves. Like most advertising, this is simultaneously true and misleading.

Whilst deep learning has simplified feature engineering in many cases, it certainly hasn’t removed it. As feature engineering has decreased, the architectures of the machine learning models themselves have become increasingly more complex. Most of the time, these model architectures are as specific to a given task as feature engineering used to be.

To clarify, this is still an important step. Architecture engineering is more general than feature engineering and provides many new opportunities. Having said that, however, we shouldn’t be oblivious to the fact that where we are is still far from where we intended to be.

Not quite as bad as the doings of architecture astronauts, but not too good either.

An example of architecture specific to a given task

LSTM diagrams

How to explain those architectures? Naturally, with a diagram. A diagram will make it all crystal clear.

Let’s first inspect the two most popular types of networks these days, CNN and LSTM. You’ve already seen a convnet diagram, so let’s turn to the iconic LSTM:

It’s easy, just take a closer look:

As they say, in mathematics you don’t understand things, you just get used to them.

Fortunately, there are good explanations, for example Understanding LSTM Networks and Written Memories: Understanding, Deriving and Extending the LSTM.

LSTM still too complex? Let’s try a simplified version, GRU (Gated Recurrent Unit). Trivial, really.

Especially this one, called minimal GRU.
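If boxes and arrows don’t do it for you, a GRU step also fits in a few lines of NumPy. What follows is a minimal sketch of the commonly cited GRU formulation (update gate, reset gate, candidate state), not the code behind either diagram above. The function names, the parameter layout and the toy sizes are assumptions made for illustration, and some write-ups swap the roles of `z` and `1 - z`.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, params):
    """One GRU time step; x has shape (n_in,), h has shape (n_hidden,)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h + bz)              # update gate
    r = sigmoid(Wr @ x + Ur @ h + br)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
    return (1.0 - z) * h + z * h_tilde             # blend old and candidate state

def init_params(n_in, n_hidden, seed=0):
    """Hypothetical helper: small random weights, zero biases."""
    rng = np.random.default_rng(seed)
    params = []
    for _ in range(3):  # one (W, U, b) triple each for z, r and the candidate
        params += [
            rng.normal(scale=0.1, size=(n_hidden, n_in)),
            rng.normal(scale=0.1, size=(n_hidden, n_hidden)),
            np.zeros(n_hidden),
        ]
    return params

# Run the cell over a toy sequence of five 4-dimensional inputs.
params = init_params(n_in=4, n_hidden=3)
h = np.zeros(3)
for x in np.ones((5, 4)):
    h = gru_step(x, h, params)
```

Compared with an LSTM there is no separate cell state and one gate fewer, which is about all the simplification amounts to.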

More diagrams

Various modifications of LSTM are now common. Here’s one, called deep bidirectional LSTM:

DB-LSTM, PDF

The rest are pretty self-explanatory, too. Let’s start with a combination of CNN and LSTM, since you have both under your belt now:

Convolutional Residual Memory Network, 1606.05262

Dynamic NTM, 1607.00036

Evolvable Neural Turing Machines, PDF

Recurrent Model Of Visual Attention, 1406.6247

Unsupervised Domain Adaptation By Backpropagation, 1409.7495

Deeply Recursive CNN For Image Super-Resolution, 1511.04491

This diagram of a multilayer perceptron with synthetic gradients scores high on clarity:

MLP with synthetic gradients, 1608.05343

Every day brings more. Here’s a fresh one, again from Google:

Google’s Neural Machine Translation System, 1609.08144

And Now for Something Completely Different

Drawings from the Neural Network ZOO are pleasantly simple, but, unfortunately, serve mostly as eye candy. For example:

LSM, ESN and ELM

These look like not-fully-connected perceptrons, but are supposed to represent a Liquid State Machine, an Echo State Network, and an Extreme Learning Machine.

How does LSM differ from ESN? That’s easy, it has green neurons with triangles. But how does ESN differ from ELM? Both have blue neurons.

Seriously, while similar, ESN is a recurrent network and ELM is not. And this kind of thing should probably be visible in an architecture diagram.
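The difference is easier to see in code than in those drawings. Below is a rough sketch, assuming fixed random weights for both models and leaving out the trained linear readout they share. The names `W_in`, `W_res`, `elm_hidden` and `esn_hidden`, as well as the sizes and weight scales, are made up for illustration. The ELM hidden layer looks at the current input only, while the ESN state is fed back into itself at every step.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 2, 5

# Fixed random weights; in both models only a linear readout (omitted here) is trained.
W_in = rng.normal(size=(n_hidden, n_in))
W_res = rng.normal(scale=0.3, size=(n_hidden, n_hidden))  # recurrent weights, ESN only

def elm_hidden(inputs):
    """Feedforward: each hidden vector depends on the current input alone."""
    return np.array([np.tanh(W_in @ x) for x in inputs])

def esn_hidden(inputs):
    """Recurrent: each state depends on the current input and the previous state."""
    h = np.zeros(n_hidden)
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W_res @ h)
        states.append(h)
    return np.array(states)

seq = rng.normal(size=(4, n_in))
changed = seq.copy()
changed[0] = 0.0  # alter only the first input of the sequence

# ELM: the last hidden vector ignores the earlier change.
elm_same = np.array_equal(elm_hidden(seq)[-1], elm_hidden(changed)[-1])
# ESN: the last state still carries a trace of the first input.
esn_same = np.array_equal(esn_hidden(seq)[-1], esn_hidden(changed)[-1])
```

Here `elm_same` comes out true and `esn_same` false: the ESN remembers, the ELM does not. An arrow looping back onto the hidden layer would convey the same thing in a diagram.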

Posted by Zygmunt Z. 2016-09-30 basics, neural-networks
