Using Theano

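I. Theano built-in data types

Do only theano.shared() variables have the get_value() member function (which returns a numpy.ndarray)? Yes: get_value() is defined on shared variables, while purely symbolic variables such as T.matrix('x') hold no value and have no such method.

1. Common usage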

x = T.matrix('x')  # the data is presented as rasterized images

y = T.ivector('y') # the labels are presented as 1D vector of [int] labels

# reshape matrix of rasterized images of shape

# (batch_size, 28*28) to a 4D tensor, so that it is compatible with LeNetConvPoolLayer

layer0_input = x.reshape((batch_size, 1, 28, 28))

>>> x.reshape((500, 3, 28, 28))

TensorType(float64, 4D)

>>> x.type

TensorType(float64, matrix)

>>> layer0_input.type

TensorType(float64, (False, True, False, False))

            # each boolean says whether that dimension is broadcastable

>>> x.reshape((500, 3, 28, 28)).type

TensorType(float64, 4D)

>>> T.dtensor4().type

TensorType(float64, 4D)

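2. Converting a theano.shared variable to a numpy.ndarray

# train_set_x: a shared variable created with theano.shared()

train_set_x.get_value(borrow=True)

        # returns a numpy.ndarray; borrow=True allows a reference to the internal buffer to be returned instead of a copy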

train_set_x.get_value(borrow=True).shape[0]

3. built-in data types

According to Theano's comprehensive documentation:

The built-in data types mainly live in the theano.tensor submodule,

import theano.tensor as T

b prefix: byte, int8 (bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4)
w prefix: 16-bit integers, int16 (wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4)
i prefix: 32-bit integers, int32 (iscalar, ivector, imatrix, irow, icol, itensor3, itensor4)
l prefix: 64-bit integers, int64 (lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4)
f prefix: float, float32 (fscalar, fvector, fmatrix, fcol, frow, ftensor3, ftensor4)
d prefix: double, float64 (dscalar, dvector, dmatrix, dcol, drow, dtensor3, dtensor4)
c prefix: complex, complex64 (cscalar, cvector, cmatrix, ccol, crow, ctensor3, ctensor4)

The tensor3/tensor4 types are nothing mysterious:

scalar: 0-dim ndarray
vector: 1-dim ndarray
matrix: 2-dim ndarray
tensor3: 3-dim ndarray
tensor4: 4-dim ndarray

Note that the variables created by all of these constructors are instances of theano.tensor.var.TensorVariable

>>> x = T.iscalar('x')

>>> type(x)

theano.tensor.var.TensorVariable

>>> x.type

TensorType(int32, scalar)
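The naming scheme can be decoded mechanically. The sketch below is plain Python, not part of Theano: the helper describe and the two lookup tables are hypothetical names introduced only to show how a constructor name splits into a dtype prefix and a dimensionality suffix.

```python
# Prefix letter -> dtype, following the list above ('c' is complex64 in Theano).
PREFIX_TO_DTYPE = {
    "b": "int8", "w": "int16", "i": "int32", "l": "int64",
    "f": "float32", "d": "float64", "c": "complex64",
}

# Suffix -> number of dimensions (row and col are 2-dim with one broadcastable axis).
SUFFIX_TO_NDIM = {
    "scalar": 0, "vector": 1, "row": 2, "col": 2,
    "matrix": 2, "tensor3": 3, "tensor4": 4,
}


def describe(constructor_name):
    """Split a name such as 'itensor3' into (dtype, ndim). Hypothetical helper."""
    prefix, suffix = constructor_name[0], constructor_name[1:]
    return PREFIX_TO_DTYPE[prefix], SUFFIX_TO_NDIM[suffix]


print(describe("iscalar"))   # ('int32', 0)
print(describe("dtensor4"))  # ('float64', 4)
```

This matches the session above, where T.iscalar('x').type is reported as TensorType(int32, scalar).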

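Let us look further at tensors:

>>> x = T.dmatrix()

>>> x.type

TensorType(float64, matrix)

>>> x = T.matrix()      # no prefix: the dtype follows theano.config.floatX (float64 in the session above)

>>> x.type

TensorType(float64, matrix)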

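When designing a classic convolutional neural network (CNN), a convolution is applied between the input layer and the first hidden layer. The corresponding API is theano.tensor.nnet.conv.conv2d (the conv.conv2d called in the code below; theano.tensor.signal.conv.conv2d only handles 2D/3D inputs). It takes two main symbolic inputs:

A 4D tensor corresponding to a mini-batch of input images, with shape:

mini_batch_size
number of input feature maps
image height
image width

A 4D tensor corresponding to the weight matrix W, with shape:

number of output feature maps (i.e. the number of filters)
number of input feature maps
filter height
filter width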

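import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet import conv

rng = np.random.RandomState(23455)

# 4D symbolic input: (mini_batch_size, input feature maps, height, width)
input = T.dtensor4('input')

# (output feature maps, input feature maps, filter height, filter width)
# 3 input feature maps: the R, G and B colour components of the image
w_shp = (2, 3, 9, 9)

w_bound = np.sqrt(np.prod(w_shp[1:]))

W = theano.shared(np.asarray(rng.uniform(low=-1./w_bound, high=1./w_bound, size=w_shp), dtype=input.dtype), name='W')

conv_out = conv.conv2d(input, W)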
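To see how the two 4D shapes combine, here is a small shape calculation in plain Python. It is only a sketch: it assumes a 'valid' convolution (no padding, stride 1), the helper name conv2d_valid_output_shape is hypothetical, and the 100x100 image size is made up for the example.

```python
def conv2d_valid_output_shape(image_shape, filter_shape):
    """Output shape of a 'valid' 2D convolution over 4D shapes (assumed: no padding, stride 1).

    image_shape:  (batch size, input feature maps, image height, image width)
    filter_shape: (output feature maps, input feature maps, filter height, filter width)
    """
    batch, in_maps, img_h, img_w = image_shape
    out_maps, flt_in_maps, flt_h, flt_w = filter_shape
    if in_maps != flt_in_maps:
        raise ValueError("image and filters disagree on the number of input feature maps")
    # Each filter slides over every position where it fits entirely inside the image.
    return (batch, out_maps, img_h - flt_h + 1, img_w - flt_w + 1)


# One RGB image of 100x100 pixels (made-up size) with the filter shape w_shp used above.
print(conv2d_valid_output_shape((1, 3, 100, 100), (2, 3, 9, 9)))  # (1, 2, 92, 92)
```

The second axis of both tensors must agree (3 here), and the number of filters becomes the number of feature maps in the output.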

import numpy

import theano.tensor as T

from theano import function

x = T.dscalar('x')

y = T.dscalar('y')

z = x + y

f = function([x, y], z)

numpy.allclose(f(16.3, 12.1), 28.4)                # True

numpy.allclose(z.eval({x: 16.3, y: 12.1}), 28.4)   # True

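II. Learning Theano

tensor: a multi-dimensional array. T provides scalar (a single value), vector, matrix, tensor3 (3-dimensional array) and tensor4 (4-dimensional array); all of these fall under the notion of a tensor.

dscalar: not a class. T.dscalar is a TensorType object, and the variables it creates (for example x = T.dscalar('x')) are TensorVariable instances. Specifically, T.dscalar denotes 0-dimensional arrays (scalar) of doubles (d).

pp: a function. After from theano import pp, print(pp(z)) pretty-prints the computation associated with z; for z = x + y it prints (x + y).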
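The idea that z = x + y builds a graph rather than computing a number can be mimicked in a few lines of plain Python. This is only an illustrative sketch, not how Theano is implemented; the classes Var and Add and the method evaluate are hypothetical names.

```python
class Var:
    """A named symbolic placeholder; it holds no value of its own."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return Add(self, other)      # record the operation instead of computing it

    def evaluate(self, env):
        return env[self.name]        # look the value up only when asked

    def __str__(self):
        return self.name


class Add(Var):
    """Graph node recording that two sub-expressions are to be added."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)

    def __str__(self):
        return "(%s + %s)" % (self.left, self.right)


x, y = Var("x"), Var("y")
z = x + y                                    # builds a graph, computes nothing yet
print(z)                                     # (x + y)
print(z.evaluate({"x": 16.3, "y": 12.1}))    # roughly 28.4
```

Printing z plays the role of pp(z), and evaluate with a dictionary of values plays the role of z.eval({x: 16.3, y: 12.1}).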

Concrete examples follow (Theano 0.8.2):

import theano

a = theano.tensor.vector()   # declare a symbolic vector variable

out = a + a**10              # element-wise expression

f = theano.function([a], out)

print(f([0,1,2]))            # prints [0.   2. 1026.]

Logistic function code:

import theano

import theano.tensor as T

x = T.dmatrix('x')

s = 1/(1 + T.exp(-x))

logistic = theano.function([x], s)

logistic([[0, 1],[-1, -2]])       # returns array([[ 0.5       ,  0.73105858],
                                  #                [ 0.26894142,  0.11920292]])
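The numbers in that output can be checked without Theano. The sketch below applies s(x) = 1 / (1 + exp(-x)) element-wise with the standard library, and also shows the equivalent form (1 + tanh(x / 2)) / 2; the function names are chosen for this example only.

```python
import math


def logistic(rows):
    """Apply s(x) = 1 / (1 + exp(-x)) to every entry of a list of lists."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in rows]


def logistic_tanh(rows):
    """The same function written as (1 + tanh(x / 2)) / 2."""
    return [[(1.0 + math.tanh(v / 2.0)) / 2.0 for v in row] for row in rows]


print(logistic([[0, 1], [-1, -2]]))
print(logistic_tanh([[0, 1], [-1, -2]]))
```

Both calls print values that agree with the array above to the displayed precision.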

Computing several outputs at once:

>>> a, b = T.dmatrices('a', 'b')             # dmatrices returns several variables at once; it is a shortcut for declaring multiple variables

>>> diff = a - b

>>> abs_diff = abs(diff)

>>> diff_squared = diff**2

>>> f = theano.function([a, b], [diff, abs_diff, diff_squared])

>>> f([[1, 1], [1, 1]], [[0, 1], [2, 3]])

[array([[ 1.,  0.],
        [-1., -2.]]), array([[ 1.,  0.],
        [ 1.,  2.]]), array([[ 1.,  0.],
        [ 1.,  4.]])]

Setting default values for arguments with the In class of function

>>> from theano import In

>>> from theano import function

>>> x, y = T.dscalars('x', 'y')

>>> z = x + y

>>> f = function([x, In(y, value=1)], z)          # the In class lets you specify the properties of a function argument in more detail

>>> f(33)

array(34.0)

>>> f(33, 2)

array(35.0)

>>> x, y, w = T.dscalars('x', 'y', 'w')

>>> z = (x + y) * w

>>> f = function([x, In(y, value=1), In(w, value=2, name='w_by_name')], z)        # note the name given to w here

>>> f(33)

array(68.0)

>>> f(33, 2)

array(70.0)

>>> f(33, 0, 1)

array(33.0)

>>> f(33, w_by_name=1)

array(34.0)

>>> f(33, w_by_name=1, y=0)

array(33.0)

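Using shared variables

Suppose we want to build an accumulator: it starts at 0 and, each time the function is called, the argument is added to its state. The shared function constructs so-called shared variables, whose value can be shared between several functions; the value is read with .get_value() and modified with .set_value().

The other new ingredient is the updates parameter of function. updates must be supplied as a list of pairs of the form (shared-variable, new expression); it can also be a dictionary whose keys are shared variables and whose values are the new expressions. As the name suggests, every time the function runs, the value of each shared variable is replaced by the result of the corresponding expression.

Code: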

>>> from theano import shared

>>> state = shared(0)

>>> inc = T.iscalar('inc')

>>> accumulator = function([inc], state, updates=[(state, state+inc)])

>>> print(state.get_value())

0

>>> accumulator(1)

array(0)

>>> print(state.get_value())

1

>>> accumulator(300)

array(1)

>>> print(state.get_value())

301

>>> state.set_value(-1)

>>> accumulator(3)

array(-1)

>>> print(state.get_value())

2                                            # the shared variable is now 2; see below

>>> decrementor = function([inc], state, updates=[(state, state-inc)])          # a second function that shares the same shared variable

>>> decrementor(2)                                                              # pass 2 for inc

array(2)                                                                        # the output is the value before the update, still 2

>>> print(state.get_value())                                                    # the update has set state to 0

0
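The ordering in this session (the returned output uses the old state, and the update is written afterwards) can be modelled in plain Python. This is a sketch of the semantics only, not Theano code; Shared and make_function are hypothetical stand-ins for theano.shared and theano.function.

```python
class Shared:
    """Minimal stand-in for a shared variable: a mutable box around a value."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def make_function(output, updates):
    """Return a callable mimicking function([inc], output, updates=...).

    output:  callable inc -> value, evaluated with the current shared state
    updates: list of (shared, callable inc -> new value) pairs
    """
    def compiled(inc):
        result = output(inc)                                   # uses the old state
        new_values = [(s, expr(inc)) for s, expr in updates]   # also computed from the old state
        for s, value in new_values:                            # only now write the new values back
            s.set_value(value)
        return result
    return compiled


state = Shared(0)
accumulator = make_function(lambda inc: state.get_value(),
                            [(state, lambda inc: state.get_value() + inc)])
decrementor = make_function(lambda inc: state.get_value(),
                            [(state, lambda inc: state.get_value() - inc)])

print(accumulator(1), state.get_value())    # 0 1
print(accumulator(300), state.get_value())  # 1 301
state.set_value(-1)
print(accumulator(3), state.get_value())    # -1 2
print(decrementor(2), state.get_value())    # 2 0
```

Both callables close over the same Shared object, which is why decrementor sees the value left behind by accumulator.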

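Using the givens parameter of function

The givens parameter can replace any symbolic variable, not just a shared variable; constants and expressions can be replaced as well. Be careful not to let the expressions introduced by one substitution depend on another substitution: the order of substitution is not defined, so they may be applied in any order. In practice, givens is best thought of as a mechanism that lets you replace any part of your formula with a different expression that evaluates to a tensor of the same shape and dtype.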

>>> fn_of_state = state * 2 + inc

>>> # The type of foo must match the shared variable we are replacing

>>> # with the ``givens``

>>> foo = T.scalar(dtype=state.dtype)                                                 # foo will replace state below, so it needs the same dtype

>>> skip_shared = function([inc, foo], fn_of_state, givens=[(state, foo)])            # foo is substituted for state here

>>> skip_shared(1, 3) # we're using 3 for the state, not state.value                  # 1 is bound to inc and 3 to foo; foo replaces state in the computation

array(7)                                                                              # state * 2 + inc becomes foo * 2 + inc = 3 * 2 + 1 = 7

>>> print(state.get_value()) # old state still there, but we didn't use it            # state is unchanged, so it is still 0

0
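Substitution with givens can be pictured as rebinding a name before the expression is evaluated. The following plain-Python sketch only imitates that idea; compile_expr and the dictionary-based environment are hypothetical and do not reflect how Theano rewrites its graph.

```python
def compile_expr(expr, inputs, givens=()):
    """Return a callable that takes `inputs` positionally.

    expr:   callable env -> value, reading symbols by name from the dict env
    givens: pairs (replaced_name, replacement), where replacement is a
            callable env -> value evaluated in place of replaced_name
    """
    def compiled(*args):
        env = dict(zip(inputs, args))
        # Evaluate every replacement from the caller's inputs first and bind
        # the results afterwards, so no substitution can see another one.
        substituted = {name: repl(env) for name, repl in givens}
        env.update(substituted)
        return expr(env)
    return compiled


# Same formula as fn_of_state above: state * 2 + inc.
fn_of_state = lambda env: env["state"] * 2 + env["inc"]

skip_shared = compile_expr(fn_of_state, ["inc", "foo"],
                           givens=[("state", lambda env: env["foo"])])
print(skip_shared(1, 3))   # 7
```

Because all replacements are computed before any of them is bound, the result does not depend on the order in which the givens pairs are listed.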

The copy function

>>> import theano

>>> import theano.tensor as T

>>> state = theano.shared(0)

>>> inc = T.iscalar('inc')

>>> accumulator = theano.function([inc], state, updates=[(state, state+inc)],on_unused_input='ignore')

>>> accumulator(10)

array(0)

>>> print(state.get_value())

10

>>> new_state = theano.shared(0)

>>> new_accumulator = accumulator.copy(swap={state:new_state})               # the swap argument replaces state in the original accumulator with new_state

>>> new_accumulator(100)

[array(0)]

>>> print(new_state.get_value())

100

>>> print(state.get_value())                                                 # state in the original function is unchanged

10

>>> null_accumulator = accumulator.copy(delete_updates=True)                 # another copy of accumulator, this time with the updates removed

>>> null_accumulator(9000)

[array(10)]

>>> print(state.get_value())                                                 # the new function has no updates and no longer uses the input inc

10                                                                           # with the updates kept the value would be 9010; without them state stays at 10

Random Numbers

from theano.tensor.shared_randomstreams import RandomStreams

from theano import function

srng = RandomStreams(seed=234)

rv_u = srng.uniform((2,2))                        # a 2x2 random matrix drawn from a uniform distribution

rv_n = srng.normal((2,2))                         # a 2x2 random matrix drawn from a normal distribution

f = function([], rv_u)

g = function([], rv_n, no_default_updates=True) #Not updating rv_n.rng   # the generator state is not updated, so every call returns the same values

nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)  # remark: a random variable is drawn at most once per function execution, so although rv_u appears three times the result is (nearly) zero

>>> f_val0 = f()

>>> f_val1 = f() #different numbers from f_val0      # two calls, two different results

>>> g_val0 = g() # different numbers from f_val0 and f_val1

>>> g_val1 = g() # same numbers as g_val0!           # two calls, identical results
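The three behaviours (a fresh draw per call, a frozen generator, and a single draw reused within one call) can be imitated with the standard random module. This is a sketch under simplifying assumptions: Stream is a hypothetical class, both streams draw uniform scalars rather than 2x2 matrices, and the second seed is arbitrary.

```python
import random


class Stream:
    """A seeded generator whose state is either advanced or restored after a draw."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def draw(self, update=True):
        before = self._rng.getstate()
        value = self._rng.random()
        if not update:                 # plays the role of no_default_updates=True
            self._rng.setstate(before)
        return value


rv_u, rv_n = Stream(234), Stream(235)


def f():
    return rv_u.draw()                 # state advances: a new number on every call


def g():
    return rv_n.draw(update=False)     # state restored: the same number on every call


def nearly_zeros():
    u = rv_u.draw()                    # drawn once, then reused three times
    return u + u - 2 * u


print(f() != f(), g() == g(), nearly_zeros())
```

In this scalar sketch u + u - 2 * u is exactly 0.0; with matrices on real hardware the Theano version is only guaranteed to be close to zero, hence the name.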

Supplement: random sampling (numpy.random)

rand(d0, d1, ..., dn)
>>> np.random.rand(a, b)        # a-by-b array of random values in [0, 1)

randn(d0, d1, ..., dn)
>>> np.random.randn()           # a single sample from the standard normal distribution

randint(low[, high, size])
>>> np.random.randint(2, size=10)               # 10 random integers in the half-open interval [0, 2)
>>> np.random.randint(size=10, low=0, high=3)   # 10 random integers; 0 can occur, 3 cannot

random_integers(low[, high, size])
>>> np.random.random_integers(5, size=(3, 2))   # used like randint but over the closed interval [1, 5]; deprecated in newer NumPy releases

random_sample([size]), random([size]), ranf([size]), sample([size])
return random floats in the half-open interval [0.0, 1.0)

choice(a[, size, replace, p])
>>> np.random.choice(5, 3)      # 3 values drawn uniformly from np.arange(5), so the largest possible value is 4
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])    # non-uniform random sample from np.arange(5) of size 3
>>> np.random.choice(5, 3, replace=False)              # uniform random sample from np.arange(5) of size 3 without replacement
array([3,1,0])
>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])   # non-uniform random sample from np.arange(5) of size 3 without replacement
array([2, 3, 0])

bytes(length): returns random bytes
>>> np.random.bytes(10)
' eh\x85\x022SZ\xbf\xa4' #random

About permutations:

shuffle(x): modifies the sequence in place, changing its own contents (like shuffling a deck of cards).

>>> arr = np.arange(10)

>>> np.random.shuffle(arr)

>>> arr

[1 7 5 2 9 4 3 6 0 8]

This function only shuffles the array along the first index of a multi-dimensional array:

>>> arr = np.arange(9).reshape((3, 3))

>>> np.random.shuffle(arr)

>>> arr

array([[3, 4, 5],

       [6, 7, 8],

       [0, 1, 2]])

permutation(x): returns a random permutation (a shuffled copy; the argument itself is left unchanged)

>>> np.random.permutation(10)

array([1, 7, 4, 3, 0, 9, 2, 5, 8, 6])

>>> np.random.permutation([1, 4, 9, 12, 15])

array([15,  1,  9,  4, 12])

>>> arr = np.arange(9).reshape((3, 3))

>>> np.random.permutation(arr)

array([[6, 7, 8],

       [0, 1, 2],

       [3, 4, 5]])

With the knowledge above, the classic logistic regression example from the Theano 0.8.2 documentation is easy to follow:

import numpy

import theano

import theano.tensor as T

rng = numpy.random

N = 400     # training sample size

feats = 784 # number of input variables

# generate a dataset: D = (input_values, target_class)

D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))

training_steps = 10000

# Declare Theano symbolic variables

x = T.dmatrix("x")

y = T.dvector("y")

# initialize the weight vector w randomly

# this and the following bias variable b

# are shared so they keep their values

# between training iterations (updates)

w = theano.shared(rng.randn(feats), name="w")

# initialize the bias term

b = theano.shared(0., name="b")

print("Initial model:")

print(w.get_value())

print(b.get_value())

# Construct Theano expression graph

p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b)) # Probability that target = 1

prediction = p_1 > 0.5 # The prediction thresholded

xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function

cost = xent.mean() + 0.01 * (w ** 2).sum()# The cost to minimize

gw, gb = T.grad(cost, [w, b]) # Compute the gradient of the cost

# w.r.t weight vector w and bias term b (we shall return to this in a following section of this tutorial)

# Compile

train = theano.function( inputs=[x,y], outputs=[prediction, xent], updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))

predict = theano.function(inputs=[x], outputs=prediction)

# Train

for i in range(training_steps):

    pred, err = train(D[0], D[1])

print("Final model:")

print(w.get_value())

print(b.get_value())

print("target values for D:")

print(D[1])

print("prediction on D:")

print(predict(D[0]))
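What T.grad derives automatically for this cost can be written out by hand. The sketch below is plain Python on a tiny made-up dataset, assuming the same cost (mean cross-entropy plus 0.01 times the sum of squared weights) and the same learning rate 0.1; unlike the example above it starts from w = 0, and all function names are chosen for illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def train_logistic(xs, ys, steps=1000, lr=0.1, l2=0.01):
    """Gradient descent on mean cross-entropy + l2 * sum(w ** 2)."""
    n, feats = len(xs), len(xs[0])
    w, b = [0.0] * feats, 0.0
    for _ in range(steps):
        # p_1 for every sample: 1 / (1 + exp(-dot(x, w) - b))
        p = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in xs]
        # d(cost)/dw_j = mean((p - y) * x_j) + 2 * l2 * w_j
        gw = [sum((pi - yi) * x[j] for pi, yi, x in zip(p, ys, xs)) / n
              + 2.0 * l2 * w[j] for j in range(feats)]
        # d(cost)/db = mean(p - y)
        gb = sum(pi - yi for pi, yi in zip(p, ys)) / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b = b - lr * gb
    return w, b


def predict(xs, w, b):
    """Threshold p_1 at 0.5, as the prediction expression does."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5) for x in xs]


xs = [[-2.0], [-1.0], [1.0], [2.0]]   # toy inputs, not the random data used above
ys = [0, 0, 1, 1]
w, b = train_logistic(xs, ys)
print(predict(xs, w, b))              # [0, 0, 1, 1]
```

The two gradient comments are the expressions that gw and gb stand for in the Theano version; the update lines mirror the updates pairs (w, w - 0.1 * gw) and (b, b - 0.1 * gb).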

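About scan (not easy to grasp at first)

Rough explanation of the parameters

A typical call to scan looks roughly like this:

results, updates = theano.scan(
    fn=lambda y, p, x_tm2, x_tm1, A: y + p + x_tm2 + x_tm1 + A,
    sequences=[Y, P[::-1]],
    outputs_info=[dict(initial=X, taps=[-2, -1])],
    non_sequences=A)

fn is the function to compute at each step, usually defined with a lambda. The order of its arguments is fixed: first the sequences arguments (y, p), then the outputs_info arguments (x_tm2, x_tm1), then the non_sequences arguments (A).
sequences are the sequences to iterate over; the size of their first (leading) dimension is the number of iterations. Y and P[::-1] should therefore have the same leading size; if they differ, the shortest one is used.
outputs_info describes which outputs of previous iterations are needed. dict(initial=X, taps=[-2, -1]) means the outputs of the previous two iterations are used: if the current output is x(t), the computation uses x(t-2) and x(t-1), and X supplies the initial values.
non_sequences describes non-sequence inputs: A is a fixed input, so the same A is added at every iteration. In this example, if Y is a vector then each step receives a scalar y and A is a constant; here A has one dimension fewer than Y.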
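The argument order and the meaning of the taps are easier to see in a plain-Python model of the loop. This is a sketch of the semantics only, restricted to a single recurrent output over Python lists; scan_sketch and the sample values of Y, P, X and A are made up for the illustration.

```python
def scan_sketch(fn, sequences, initial, taps, non_sequences):
    """Pure-Python model of scan with one recurrent output.

    sequences:     list of lists, iterated in lockstep (the shortest one decides)
    initial:       list holding the values the negative taps start from
    taps:          negative offsets into past outputs, e.g. [-2, -1]
    non_sequences: values passed unchanged to every step
    """
    history = list(initial)                 # past outputs, oldest first
    n_steps = min(len(s) for s in sequences)
    for t in range(n_steps):
        seq_args = [s[t] for s in sequences]
        tap_args = [history[tap] for tap in taps]   # history[-2], then history[-1]
        history.append(fn(*(seq_args + tap_args + list(non_sequences))))
    return history[len(initial):]           # drop the initial values


Y, P, X, A = [1, 2, 3], [10, 20, 30], [0, 1], 100
results = scan_sketch(lambda y, p, x_tm2, x_tm1, A: y + p + x_tm2 + x_tm1 + A,
                      sequences=[Y, P[::-1]], initial=X, taps=[-2, -1],
                      non_sequences=[A])
print(results)   # [132, 255, 500]
```

The first step computes 1 + 30 + 0 + 1 + 100 = 132: y comes from Y, p from the reversed P, the two taps from X, and A is the same at every step.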

The official site gives two examples when introducing scan: computing the Jacobian matrix and the Hessian matrix:

theano.gradient.jacobian()：

>>> import theano

>>> import theano.tensor as T

>>> x = T.dvector('x')

>>> y = x ** 2

>>> J, updates = theano.scan(lambda i, y,x : T.grad(y[i], x), sequences=T.arange(y.shape[0]), non_sequences=[y,x])

>>> f = theano.function([x], J, updates=updates)

>>> f([4, 4])

array([[ 8., 0.],
       [ 0., 8.]])

theano.gradient.hessian()

>>> x = T.dvector('x')

>>> y = x ** 2

>>> cost = y.sum()

>>> gy = T.grad(cost, x)

>>> H, updates = theano.scan(lambda i, gy,x : T.grad(gy[i], x), sequences=T.arange(gy.shape[0]), non_sequences=[gy, x])

>>> f = theano.function([x], H, updates=updates)

>>> f([4, 4])

array([[ 2., 0.],
       [ 0., 2.]])
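Both results can be cross-checked numerically without Theano. The sketch below uses central differences in plain Python; numerical_jacobian is a hypothetical helper, and the step size h is an arbitrary choice. The Hessian is obtained as the Jacobian of the gradient, the same construction the scan version uses.

```python
def numerical_jacobian(func, x, h=1e-5):
    """Central-difference Jacobian: J[i][j] = d func(x)[i] / d x[j]."""
    n_out = len(func(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        plus = list(x)
        plus[j] += h
        minus = list(x)
        minus[j] -= h
        f_plus, f_minus = func(plus), func(minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)
    return jac


def square(x):
    """y = x ** 2, element-wise."""
    return [v ** 2 for v in x]


def grad_of_sum(x):
    """Gradient of cost = sum(x ** 2), i.e. 2 * x."""
    return [2.0 * v for v in x]


J = numerical_jacobian(square, [4.0, 4.0])        # close to [[8, 0], [0, 8]]
H = numerical_jacobian(grad_of_sum, [4.0, 4.0])   # close to [[2, 0], [0, 2]]
print(J)
print(H)
```

Row i of each matrix corresponds to one call of T.grad(y[i], x) or T.grad(gy[i], x) inside the lambda passed to scan.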

Further random-number topics in the Theano documentation: Seeding Streams, Sharing Streams Between Functions, Copying Random State Between Theano Graphs

算數對象不需要聲明,可以直接使用, Math對象方法及作用: round()四捨五入: random()生成0到1的隨機數: max()選擇較大的數: min()返回較小的數: