import numpy

import theano.tensor as T

from theano import function

x = T.dscalar('x')

y = T.dscalar('y')

z = x + y

f = function([x, y], z)
numpy.allclose(f(16.3, 12.1), 28.4)                   # True
numpy.allclose(z.eval({x: 16.3, y: 12.1}), 28.4)      # True

tensor: a multi-dimensional array. The T module provides scalar (a single value), vector, matrix, tensor3 (a 3-D array) and tensor4 (a 4-D array); all of these count as tensors.

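dscalar: not a class. Variables created by calling it, such as x and y above, are TensorVariable instances whose type is T.dscalar. Specifically, T.dscalar is the type of 0-dimensional arrays (scalar) of doubles (d).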

pp: a function. After from theano import pp, calling print(pp(z)) pretty-prints the computation behind z; here the output is (x + y).
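The dimensionality behind the tensor type names above can be illustrated with NumPy arrays. This is only a sketch of the number of dimensions each name implies; it does not use Theano, and the `examples` dictionary is made up for illustration.

```python
import numpy as np

# One NumPy value per Theano tensor kind (illustrative shapes, chosen arbitrarily).
examples = {
    "scalar": np.float64(1.5),          # 0-D: a single value
    "vector": np.zeros(3),              # 1-D
    "matrix": np.zeros((2, 3)),         # 2-D
    "tensor3": np.zeros((2, 3, 4)),     # 3-D
    "tensor4": np.zeros((2, 3, 4, 5)),  # 4-D
}

# Map each name to its number of dimensions.
ndims = {name: np.ndim(value) for name, value in examples.items()}
print(ndims)
```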

Concrete examples follow (Theano 0.8.2):

import theano

a = theano.tensor.vector()   # declare a symbolic vector from theano.tensor

out = a + a**10

f = theano.function([a], out)

print(f([0,1,2]))            # prints [0.   2. 1026.]

Logistic function code:

import theano

import theano.tensor as T

x = T.dmatrix('x')

s = 1/(1 + T.exp(-x))

logistic = theano.function([x], s)

logistic([[0, 1],[-1, -2]])       # returns array([[0.5       , 0.73105858],
                                  #                [0.26894142, 0.11920292]])
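The values returned by the logistic function above can be cross-checked without Theano. The following is a NumPy sketch, not the Theano graph itself; it also checks the standard identity s(x) = (1 + tanh(x/2)) / 2 as an alternative way of writing the same function.

```python
import numpy as np

x = np.array([[0.0, 1.0], [-1.0, -2.0]])

# Direct definition of the logistic function, elementwise.
s = 1 / (1 + np.exp(-x))

# Equivalent formulation through tanh.
s_tanh = (1 + np.tanh(x / 2)) / 2

print(s)
print(np.allclose(s, s_tanh))
```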

Computing several outputs at once:

>>> a, b = T.dmatrices('a', 'b')             # dmatrices returns several variables at once; a shortcut for declaring multiple variables

>>> diff = a - b

>>> abs_diff = abs(diff)

>>> diff_squared = diff**2

>>> f = theano.function([a, b], [diff, abs_diff, diff_squared])

>>> f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
[array([[ 1., 0.],
[-1., -2.]]), array([[ 1., 0.],
[ 1., 2.]]), array([[ 1., 0.],
[ 1., 4.]])]

Setting default values for arguments with the In class

>>> from theano import In

>>> from theano import function

>>> x, y = T.dscalars('x', 'y')

>>> z = x + y

>>> f = function([x, In(y, value=1)], z)          # the In class lets you specify the properties of a function argument in more detail

>>> f(33)

array(34.0)

>>> f(33, 2)

array(35.0)

>>> x, y, w = T.dscalars('x', 'y', 'w')

>>> z = (x + y) * w

>>> f = function([x, In(y, value=1), In(w, value=2, name='w_by_name')], z)        # note the name given here

>>> f(33)

array(68.0)

>>> f(33, 2)

array(70.0)

>>> f(33, 0, 1)

array(33.0)

>>> f(33, w_by_name=1)

array(34.0)

>>> f(33, w_by_name=1, y=0)

array(33.0)

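Using shared variables

Suppose we want an accumulator that starts at 0 and grows by the function's argument each time the function is called. The shared function constructs so-called shared variables, whose value is shared between all the functions that use them. The value is read with .get_value() and modified with .set_value().

A further note: function takes an updates parameter. updates must be supplied as a list of pairs (shared variable, new expression), or as a dictionary whose keys are shared variables and whose values are the new expressions. As the name suggests, each time the function runs, the value of the shared variable is replaced by the result of the corresponding expression.

Code: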

>>> from theano import shared

>>> state = shared(0)

>>> inc = T.iscalar('inc')

>>> accumulator = function([inc], state, updates=[(state, state+inc)])

>>> print(state.get_value())

0

>>> accumulator(1)

array(0)

>>> print(state.get_value())

1

>>> accumulator(300)

array(1)

>>> print(state.get_value())

301

>>> state.set_value(-1)

>>> accumulator(3)

array(-1)

>>> print(state.get_value())

2                                            # the shared variable is now 2; see below

>>> decrementor = function([inc], state, updates=[(state, state-inc)]) # a second function that uses the same shared variable
>>> decrementor(2) # inc is set to 2
array(2) # the output is the value before the update, so it is still 2
>>> print(state.get_value()) # the update has set state to 0
0
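The session above shows that a function returns the value of the shared variable from before the update and only then applies the update. A plain-Python sketch of that behavior follows. `SharedSketch` and `make_function` are hypothetical helpers written for this illustration; they are not part of Theano.

```python
class SharedSketch:
    """Hypothetical stand-in for a Theano shared variable."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def make_function(state, update):
    """Build a callable that outputs the old state, then applies the update."""
    def call(inc):
        old = state.get_value()              # the output uses the pre-update value
        state.set_value(update(old, inc))    # the update is applied afterwards
        return old
    return call


state = SharedSketch(0)
accumulator = make_function(state, lambda s, inc: s + inc)
decrementor = make_function(state, lambda s, inc: s - inc)  # shares the same state

outputs = [accumulator(1), accumulator(300)]
state.set_value(-1)
outputs.append(accumulator(3))
outputs.append(decrementor(2))
print(outputs, state.get_value())
```

The printed outputs follow the same sequence as the interactive session: each call reports the old value, and the final state is 0.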

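Using the givens parameter of function

givens can replace any symbolic variable, not only shared variables; constants and expressions can be replaced as well. Take care that the substitutions do not depend on one another: their order is undefined, so they may be applied in any order. In practice, think of givens as a mechanism for replacing any part of a formula with a different expression that evaluates to a tensor of the same shape and dtype.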

>>> fn_of_state = state * 2 + inc

>>> # The type of foo must match the shared variable we are replacing

>>> # with the ``givens``

>>> foo = T.scalar(dtype=state.dtype)                                       # foo will replace state below, so it must have the same dtype

>>> skip_shared = function([inc, foo], fn_of_state, givens=[(state, foo)])  # foo is substituted for state here

>>> skip_shared(1, 3) # we're using 3 for the state, not state.value        # 1 is bound to inc and 3 to foo; foo stands in for state

array(7)                                                                    # state * 2 + inc becomes foo * 2 + inc, i.e. 3 * 2 + 1 = 7

>>> print(state.get_value()) # old state still there, but we didn't use it  # state is unchanged, so it is still 0

0

The copy function

>>> import theano

>>> import theano.tensor as T

>>> state = theano.shared(0)

>>> inc = T.iscalar('inc')

>>> accumulator = theano.function([inc], state, updates=[(state, state+inc)],on_unused_input='ignore')

>>> accumulator(10)

array(0)

>>> print(state.get_value())

10

>>> new_state = theano.shared(0)

>>> new_accumulator = accumulator.copy(swap={state:new_state})               # swap replaces state in the original accumulator with new_state

>>> new_accumulator(100)

[array(0)]

>>> print(new_state.get_value())

100

>>> print(state.get_value())                                                 # state in the original function is unchanged

10

>>> null_accumulator = accumulator.copy(delete_updates=True)                 # another copy of accumulator, this time with the updates removed

>>> null_accumulator(9000)
[array(10)]
>>> print(state.get_value()) # the copy has no updates and no longer uses the argument inc
10 # with the updates in place the value would be 9010; without them state keeps its value

Random Numbers

from theano.tensor.shared_randomstreams import RandomStreams

from theano import function

srng = RandomStreams(seed=234)

rv_u = srng.uniform((2,2))                        # 2x2 random matrix drawn from a uniform distribution

rv_n = srng.normal((2,2))                         # 2x2 random matrix drawn from a normal distribution

f = function([], rv_u)

g = function([], rv_n, no_default_updates=True) # Not updating rv_n.rng, so every call returns the same values

nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)  # a random variable is drawn at most once per function call, so although rv_u appears three times the result should be zero (up to rounding error)

>>> f_val0 = f()
>>> f_val1 = f() # different numbers from f_val0: two calls, two different results

>>> g_val0 = g() # different numbers from f_val0 and f_val1
>>> g_val1 = g() # same numbers as g_val0: two calls, identical results

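Supplement: random sampling with numpy.random

rand(d0,d1,...,dn) >>>np.random.rand(a,b) returns an a-by-b array of uniform random values in [0, 1)

randn(d0,d1,...,dn) >>>np.random.randn() returns one sample from the standard normal distribution

randint(low[,high,size]) >>>np.random.randint(2, size=10) returns an array of 10 integers drawn from the half-open interval [0, 2)

>>>np.random.randint(size=10, low=0, high=3) returns an array of 10 integers; 0 can occur, 3 cannot

random_integers(low[,high,size]) >>>np.random.random_integers(5, size=(3,2)) is used like randint but draws from the closed interval, here [1, 5]; it has been deprecated since NumPy 1.11 in favor of randint

random_sample([size]), random([size]), ranf([size]), sample([size]) return random floats in the half-open interval [0.0, 1.0)

choice(a[,size,replace,p]) >>>np.random.choice(5,3) returns 3 values drawn from np.arange(5), so the largest possible value is 4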

>>>np.random.choice(5,3,p=[0.1, 0, 0.3, 0.6, 0]) Generate a non-uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3, replace=False) array([3,1,0])

Generate a uniform random sample from np.arange(5) of size 3 without replacement

>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0]) array([2, 3, 0])

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement

bytes: returns random bytes >>> np.random.bytes(10) ' eh\x85\x022SZ\xbf\xa4' #random
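The ranges listed above can be checked with a short NumPy script. This is a sketch under one assumption that is not in the original: a seeded `np.random.RandomState` is used so that the run is repeatable. The variable names are illustrative.

```python
import numpy as np

rng = np.random.RandomState(0)  # seeded only so the sketch is repeatable

uniform = rng.rand(2, 3)                       # 2x3 array of floats in [0, 1)
ints = rng.randint(low=0, high=3, size=10)     # integers in [0, 3): 3 never occurs
floats = rng.random_sample(5)                  # floats in [0.0, 1.0)

# Non-uniform sampling: indices 1 and 4 have probability 0, so they never appear.
picks = rng.choice(5, 1000, p=[0.1, 0, 0.3, 0.6, 0])

# Sampling without replacement: all three values are distinct.
no_repeat = rng.choice(5, 3, replace=False)

print(uniform.shape, ints.min(), ints.max(), sorted(set(picks.tolist())))
```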

Permutations:

shuffle(x): modifies the sequence in place, changing its contents (like shuffling a deck of cards).

>>> arr = np.arange(10)

>>> np.random.shuffle(arr)

>>> arr

[1 7 5 2 9 4 3 6 0 8]

This function only shuffles the array along the first index of a multi-dimensional array:

>>> arr = np.arange(9).reshape((3, 3))

>>> np.random.shuffle(arr)

>>> arr

array([[3, 4, 5],

       [6, 7, 8],

       [0, 1, 2]])

permutation(x): returns a randomly permuted copy and leaves x unchanged

>>> np.random.permutation(10)

array([1, 7, 4, 3, 0, 9, 2, 5, 8, 6])

>>> np.random.permutation([1, 4, 9, 12, 15])

array([15,  1,  9,  4, 12])

>>> arr = np.arange(9).reshape((3, 3))

>>> np.random.permutation(arr)

array([[6, 7, 8],

       [0, 1, 2],

       [3, 4, 5]])
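The practical difference between shuffle and permutation shown above is that shuffle reorders its argument in place and returns None, while permutation returns a new array. A minimal NumPy sketch, assuming a seeded `RandomState` for repeatability (the seed is not part of the original examples):

```python
import numpy as np

rng = np.random.RandomState(0)
original = np.arange(9).reshape((3, 3))

arr = original.copy()
permuted = rng.permutation(arr)               # new array; arr is left untouched
unchanged = np.array_equal(arr, original)

result = rng.shuffle(arr)                     # reorders the rows of arr in place

# Only the order of the rows changes; each row keeps its contents.
rows_intact = sorted(map(tuple, arr.tolist())) == sorted(map(tuple, original.tolist()))

print(unchanged, result, rows_intact)
```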

With the material above, the classic logistic regression example from the Theano 0.8.2 documentation should be easy to follow:

import numpy

import theano

import theano.tensor as T

rng = numpy.random

N = 400     # training sample size

feats = 784 # number of input variables

# generate a dataset: D = (input_values, target_class)

D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))

training_steps = 10000

# Declare Theano symbolic variables

x = T.dmatrix("x")

y = T.dvector("y")

# initialize the weight vector w randomly

# this and the following bias variable b

# are shared so they keep their values

# between training iterations (updates)

w = theano.shared(rng.randn(feats), name="w")

# initialize the bias term

b = theano.shared(0., name="b")

print("Initial model:")

print(w.get_value())

print(b.get_value())

# Construct Theano expression graph

p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b)) # Probability that target = 1

prediction = p_1 > 0.5 # The prediction thresholded

xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function

cost = xent.mean() + 0.01 * (w ** 2).sum()# The cost to minimize

gw, gb = T.grad(cost, [w, b]) # Compute the gradient of the cost

# w.r.t weight vector w and bias term b (we shall return to this in a following section of this tutorial)

# Compile

train = theano.function( inputs=[x,y], outputs=[prediction, xent], updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))

predict = theano.function(inputs=[x], outputs=prediction)

# Train

for i in range(training_steps):

    pred, err = train(D[0], D[1])

print("Final model:")

print(w.get_value())

print(b.get_value())

print("target values for D:")

print(D[1])

print("prediction on D:")

print(predict(D[0]))
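To see what T.grad contributes in the example above, the same cost can be minimized in NumPy with gradients derived by hand. This is a sketch, not the Theano program: the dataset is made much smaller (N = 50, feats = 5) and only 200 steps are run so that it finishes quickly, and the function `cost_and_grads` is a name introduced here for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
N, feats = 50, 5                       # much smaller than the original, for speed
X = rng.randn(N, feats)
y = rng.randint(low=0, high=2, size=N).astype(float)
w = rng.randn(feats)
b = 0.0


def cost_and_grads(w, b):
    """Same cost as the Theano graph, with hand-derived gradients."""
    p_1 = 1.0 / (1.0 + np.exp(-X.dot(w) - b))           # probability that target = 1
    xent = -y * np.log(p_1) - (1 - y) * np.log(1 - p_1)  # cross-entropy per sample
    cost = xent.mean() + 0.01 * (w ** 2).sum()
    gw = X.T.dot(p_1 - y) / N + 0.02 * w                 # d cost / d w
    gb = (p_1 - y).mean()                                # d cost / d b
    return cost, gw, gb


initial_cost, _, _ = cost_and_grads(w, b)
for _ in range(200):
    _, gw, gb = cost_and_grads(w, b)
    w = w - 0.1 * gw                   # same update rule as in train()
    b = b - 0.1 * gb
final_cost, _, _ = cost_and_grads(w, b)

print(initial_cost, final_cost)
```

In the Theano version these two gradient lines are not written by hand; T.grad derives them from the expression graph.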

About scan (not easy to grasp at first)

Rough description of the parameters

A typical call to scan looks roughly like this:

results, updates = theano.scan(
    fn=lambda y, p, x_tm2, x_tm1, A: y + p + x_tm2 + x_tm1 + A,
    sequences=[Y, P[::-1]],
    outputs_info=[dict(initial=X, taps=[-2, -1])],
    non_sequences=A)

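fn is the function to compute at each step, usually defined with lambda. The order of its arguments matters: first the arguments from sequences (y, p), then those from outputs_info (x_tm2, x_tm1), then those from non_sequences (A).

sequences are the sequences to iterate over. The size of their first (leading) dimension is the number of iterations, so Y and P[::-1] should have the same leading size; if they differ, the smallest one is used.

outputs_info describes which earlier outputs each step needs. dict(initial=X, taps=[-2, -1]) means the outputs of the two preceding steps are used: if the current step produces x(t), its computation uses x(t-1) and x(t-2).

non_sequences are inputs that are not iterated over: A is a fixed input, and the same A is added at every step. In this example, if Y is a vector then each y is a scalar, so A would typically be a scalar too, that is, A has one dimension fewer than Y.

The official documentation gives two examples when introducing scan, computing the Jacobian and the Hessian: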
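The argument order and the meaning of taps=[-2, -1] described above can be imitated with an ordinary Python loop. `scan_sketch` is a hypothetical helper written for this illustration; it only mimics the calling convention and is not how Theano implements scan. The input values are made up.

```python
def scan_sketch(fn, sequences, initial, non_sequences):
    """Loop that calls fn(seq items..., x(t-2), x(t-1), non-sequence items...)."""
    n_steps = min(len(s) for s in sequences)   # the shortest sequence sets the step count
    history = list(initial)                    # [x(-2), x(-1)]
    for t in range(n_steps):
        seq_args = [s[t] for s in sequences]   # one slice of every sequence
        x_t = fn(*seq_args, history[-2], history[-1], *non_sequences)
        history.append(x_t)
    return history[2:]                         # drop the two initial values


Y = [1.0, 2.0, 3.0]
P = [10.0, 20.0, 30.0]
X = [0.0, 1.0]      # initial state: x(-2) = 0, x(-1) = 1
A = 100.0

results = scan_sketch(
    lambda y, p, x_tm2, x_tm1, A: y + p + x_tm2 + x_tm1 + A,
    sequences=[Y, P[::-1]],
    initial=X,
    non_sequences=[A],
)
print(results)  # [132.0, 255.0, 500.0]
```

Step 0 computes 1 + 30 + 0 + 1 + 100 = 132, step 1 computes 2 + 20 + 1 + 132 + 100 = 255, and step 2 computes 3 + 10 + 132 + 255 + 100 = 500.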

theano.gradient.jacobian()：

>>> import theano

>>> import theano.tensor as T

>>> x = T.dvector('x')

>>> y = x ** 2

>>> J, updates = theano.scan(lambda i, y,x : T.grad(y[i], x), sequences=T.arange(y.shape[0]), non_sequences=[y,x])

>>> f = theano.function([x], J, updates=updates)

>>> f([4, 4])

array([[ 8., 0.],

[ 0., 8.]])

theano.gradient.hessian()

>>> x = T.dvector('x')

>>> y = x ** 2

>>> cost = y.sum()

>>> gy = T.grad(cost, x)

>>> H, updates = theano.scan(lambda i, gy,x : T.grad(gy[i], x), sequences=T.arange(gy.shape[0]), non_sequences=[gy, x])

>>> f = theano.function([x], H, updates=updates)

>>> f([4, 4])

array([[ 2., 0.],

[ 0., 2.]])
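The two results above can be sanity-checked numerically. The sketch below approximates the Jacobian with central finite differences in NumPy; `numerical_jacobian` is a helper introduced here, not a Theano or NumPy function. For the Hessian it differentiates the known gradient 2*x of sum(x**2).

```python
import numpy as np


def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference approximation of d func(x)[i] / d x[j]."""
    x = np.asarray(x, dtype=float)
    n_out = np.asarray(func(x)).size
    J = np.zeros((n_out, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (func(x + step) - func(x - step)) / (2 * eps)
    return J


x0 = np.array([4.0, 4.0])

# Jacobian of y = x ** 2 is diag(2 * x).
J = numerical_jacobian(lambda v: v ** 2, x0)

# Hessian of cost = sum(x ** 2) is the Jacobian of its gradient 2 * x.
H = numerical_jacobian(lambda v: 2 * v, x0)

print(np.round(J, 4))
print(np.round(H, 4))
```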

Seeding Streams, Sharing Streams Between Functions, Copying Random State Between Theano Graphs

To be covered.
