Optimizing `Scan` performance

Minimizing Scan Usage

Perform as much of the computation as possible outside of Scan. This may increase memory usage, but it also reduces the overhead introduced by Scan.
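The idea can be illustrated without Theano. The following is a minimal NumPy sketch, not Theano code: a plain Python loop stands in for Scan, and all names (`X`, `W_in`, `W_rec`, `run_all_inside`, `run_hoisted`) are hypothetical. It assumes a simple recurrent update in which the input projection does not depend on the previous state, so it can be computed once for every time step before the loop starts.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, n_in, n_hid = 5, 3, 4
X = rng.standard_normal((n_steps, n_in))     # one input row per time step
W_in = rng.standard_normal((n_in, n_hid))    # input-to-hidden weights (hypothetical)
W_rec = rng.standard_normal((n_hid, n_hid))  # hidden-to-hidden weights (hypothetical)

def run_all_inside(X):
    """Every operation, including the input projection, runs inside the loop."""
    h = np.zeros(n_hid)
    for x_t in X:
        h = np.tanh(x_t @ W_in + h @ W_rec)
    return h

def run_hoisted(X):
    """The input projection is computed once, outside the loop."""
    X_proj = X @ W_in  # shape (n_steps, n_hid): more memory, less work per step
    h = np.zeros(n_hid)
    for p_t in X_proj:
        h = np.tanh(p_t + h @ W_rec)
    return h

h_inside = run_all_inside(X)
h_hoisted = run_hoisted(X)
same_result = np.allclose(h_inside, h_hoisted)
```

Both functions compute the same final state. The hoisted version stores the projected inputs for all time steps up front, which is the memory-for-overhead trade described above.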

Explicitly passing inputs of the inner function to scan

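It is more efficient to explicitly pass the parameters used by the inner function as non-sequence inputs, rather than letting Scan pick them up implicitly from the enclosing scope. In the example below, Version One relies on implicit capture of the shared variables, while Version Two passes them through `non_sequences`.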

Examples: Gibbs Sampling

Version One:

import theano
from theano import tensor as T

W = theano.shared(W_values)  # we assume that ``W_values`` contains the
                             # initial values of your weight matrix
bvis = theano.shared(bvis_values)
bhid = theano.shared(bhid_values)

trng = T.shared_randomstreams.RandomStreams(1234)

def OneStep(vsample):
    hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
    hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
    vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
    return trng.binomial(size=vsample.shape, n=1, p=vmean,
                         dtype=theano.config.floatX)

sample = theano.tensor.vector()

values, updates = theano.scan(OneStep, outputs_info=sample, n_steps=10)

gibbs10 = theano.function([sample], values[-1], updates=updates)

Version Two:

W = theano.shared(W_values)  # we assume that ``W_values`` contains the
                             # initial values of your weight matrix
bvis = theano.shared(bvis_values)
bhid = theano.shared(bhid_values)

trng = T.shared_randomstreams.RandomStreams(1234)

# OneStep, with explicit use of the shared variables (W, bvis, bhid)
def OneStep(vsample, W, bvis, bhid):
    hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
    hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
    vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
    return trng.binomial(size=vsample.shape, n=1, p=vmean,
                         dtype=theano.config.floatX)

sample = theano.tensor.vector()

# The new scan, with the shared variables passed as non_sequences
values, updates = theano.scan(fn=OneStep,
                              outputs_info=sample,
                              non_sequences=[W, bvis, bhid],
                              n_steps=10)

gibbs10 = theano.function([sample], values[-1], updates=updates)

Deactivating garbage collection in Scan

Deactivating garbage collection in Scan can allow it to reuse memory between executions instead of always having to allocate new memory. This can improve performance at the cost of increased memory usage. By default, Scan reuses memory between iterations of the same execution but frees the memory after the last iteration.

config.scan.allow_gc=False
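As a sketch of one way to apply this setting persistently, assuming the flag name shown above and Theano's usual convention of mapping dotted flag names to sections of the `.theanorc` configuration file:

```ini
# ~/.theanorc (assumed location of the user's Theano configuration file)
[scan]
allow_gc = False
```

The same flag can also be supplied through the `THEANO_FLAGS` environment variable as `scan.allow_gc=False`. For a single loop, `theano.scan` accepts an `allow_gc` argument that plays the same role.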

Graph Optimizations

There are patterns that Theano can't optimize. The LSTM tutorial provides an example of an optimization that Theano can't perform. Instead of performing many matrix multiplications between the matrix \(x_t\) and each of the shared matrices \(W_i, W_c, W_f\) and \(W_o\), the matrices \(W_{*}\) are merged into a single shared matrix \(W\), and the graph performs a single larger matrix multiplication between \(W\) and \(x_t\). The resulting matrix is then sliced to obtain the results that the small individual matrix multiplications would have produced. This replaces several small matrix multiplications with a single larger one and thus improves performance at the cost of potentially higher memory usage.
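The merge-then-slice pattern can be checked numerically. The following is a minimal NumPy sketch rather than the LSTM tutorial's Theano code. The shapes, the orientation of the product (`x_t @ W`), the column-wise concatenation order, and the helper `gate_slice` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
x_t = rng.standard_normal(n_in)

# Four separate gate matrices, each of shape (n_in, n_hid)
W_i, W_c, W_f, W_o = (rng.standard_normal((n_in, n_hid)) for _ in range(4))

# Baseline: four small matrix multiplications
separate = [x_t @ W_i, x_t @ W_c, x_t @ W_f, x_t @ W_o]

# Merged: one matrix of shape (n_in, 4 * n_hid) and one larger multiplication
W = np.concatenate([W_i, W_c, W_f, W_o], axis=1)
merged = x_t @ W

def gate_slice(z, k, n):
    """Return the k-th block of width n from the merged result."""
    return z[k * n:(k + 1) * n]

sliced = [gate_slice(merged, k, n_hid) for k in range(4)]
all_match = all(np.allclose(a, b) for a, b in zip(separate, sliced))
```

Each slice of the merged product equals the corresponding small product, so the rest of the graph can consume the slices unchanged.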
