TensorFlow Multi-layer LSTM Code Analysis

1.tf.Graph()

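Once your program starts, a default graph has already been created, and you can access it by calling tf.get_default_graph().
To add an operation to the default graph, simply call a function that defines a new operation, as the following example shows: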

import tensorflow as tf
import numpy as np

c=tf.constant(value=1)
print(c.graph)
print(tf.get_default_graph())

Output:
<tensorflow.python.framework.ops.Graph object at 0x107d38fd0>
<tensorflow.python.framework.ops.Graph object at 0x107d38fd0>

Another typical usage is the Graph.as_default() context manager, which overrides the default graph inside its context, as in the following example:

import tensorflow as tf
import numpy as np

c=tf.constant(value=1)
print(c.graph)
print(tf.get_default_graph())
g=tf.Graph()
print("g:",g)
with g.as_default():
    d=tf.constant(value=2)
    print(d.graph)
    print(g)

Output:
<tensorflow.python.framework.ops.Graph object at 0x10b0e56d8>
<tensorflow.python.framework.ops.Graph object at 0x10b0e56d8>
g: <tensorflow.python.framework.ops.Graph object at 0x10b0df2e8>
<tensorflow.python.framework.ops.Graph object at 0x10b0df2e8>
<tensorflow.python.framework.ops.Graph object at 0x10b0df2e8>

2.tf.variable_scope()

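TensorFlow provides a variable scope mechanism that makes it easy to share named variables when building a graph.

The variable scope mechanism in TensorFlow consists of two main parts:

tf.get_variable(<name>, <shape>, <initializer>): creates or returns a variable with the given name.
tf.variable_scope(<scope_name>): manages the namespace for the names passed to tf.get_variable().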
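To make the naming rule concrete, here is a toy pure-Python model; it is not TensorFlow's implementation, and `ToyVariableStore` with its methods is a hypothetical name. It only shows how nested scopes prefix the names that get_variable() creates; sharing (reuse) is deliberately left out, so a second request for the same name simply fails.

```python
from contextlib import contextmanager


class ToyVariableStore:
    """Hypothetical toy class, not TensorFlow's implementation: models scope naming only."""

    def __init__(self):
        self._scopes = []      # stack of currently open scope names
        self.variables = {}    # full variable name -> shape

    @contextmanager
    def variable_scope(self, name):
        # Entering a scope pushes its name; leaving pops it again.
        self._scopes.append(name)
        try:
            yield
        finally:
            self._scopes.pop()

    def get_variable(self, name, shape):
        # The full name joins all open scopes with "/", e.g. "model/RNN/w".
        full_name = "/".join(self._scopes + [name])
        if full_name in self.variables:
            raise ValueError("Variable %s already exists" % full_name)
        self.variables[full_name] = tuple(shape)
        return full_name


store = ToyVariableStore()
with store.variable_scope("model"):
    emb = store.get_variable("embedding", [10000, 200])
    with store.variable_scope("RNN"):
        w = store.get_variable("w", [400, 800])
print(emb, w)  # model/embedding model/RNN/w
```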

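3.tf.nn.rnn_cell.BasicLSTMCell(num_units, forget_bias, state_is_tuple)

# Subclass of RNNCell. The official recommendation is state_is_tuple=True, in which case the input and output states are a 2-tuple (c, h) of the cell state c and the output h.

# c and h each have shape batch_size*num_units; in this model the input (the word embedding) is also num_units wide, so input, output and cell dimensions all match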
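A NumPy sketch of one step, written from the standard LSTM equations with the gate order (i, j, f, o) that BasicLSTMCell uses; the function name, the random weights and the sizes are illustrative assumptions, and this is not TensorFlow's code. It shows that with state_is_tuple=True the state is the pair (c, h), both [batch_size, num_units], and that the step's output is h itself.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def basic_lstm_step(x, state, W, b, forget_bias=1.0):
    """One BasicLSTMCell-style step (NumPy sketch, not TensorFlow's code).

    x: [batch, input_size]; state: (c, h), each [batch, num_units];
    W: [input_size + num_units, 4 * num_units]; b: [4 * num_units].
    """
    c, h = state
    # A single matmul over [x, h] yields all four gate pre-activations.
    z = np.concatenate([x, h], axis=1) @ W + b
    i, j, f, o = np.split(z, 4, axis=1)  # input gate, candidate, forget gate, output gate
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, (new_c, new_h)  # output is new_h; the new state is the tuple (c, h)


rng = np.random.RandomState(0)
batch_size, num_units = 20, 200
x = rng.standard_normal((batch_size, num_units))  # input width = num_units, as in this model
W = rng.uniform(-0.1, 0.1, (2 * num_units, 4 * num_units))
b = np.zeros(4 * num_units)
zero_state = (np.zeros((batch_size, num_units)), np.zeros((batch_size, num_units)))
output, (c, h) = basic_lstm_step(x, zero_state, W, b, forget_bias=0.0)
print(output.shape, c.shape, h.shape)  # (20, 200) (20, 200) (20, 200)
```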

4.tf.nn.rnn_cell.DropoutWrapper & tf.nn.rnn_cell.MultiRNNCell

lstm_cell=tf.nn.rnn_cell.DropoutWrapper(lstm_cell,output_keep_prob=config.keep_prob)

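# No dropout is applied on the recurrent path: the state (memory) passed from time t-1 to time t
# is not dropped out; dropout is applied only when information passes between stacked cells
# within the same time step t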
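The placement described in these comments can be sketched in NumPy. `dropout_wrapped_step` and `toy_cell` are hypothetical helpers (the toy cell is a stand-in, not an LSTM); `dropout` follows the inverted-dropout convention of tf.nn.dropout, zeroing with probability 1 - keep_prob and scaling survivors by 1/keep_prob. Only the output sent up to the next layer is dropped; the (c, h) state carried to t+1 comes back unchanged.

```python
import numpy as np


def dropout(x, keep_prob, rng):
    # Inverted dropout: keep each entry with probability keep_prob and
    # scale the kept entries by 1 / keep_prob so the expected value is unchanged.
    mask = rng.uniform(size=x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


def toy_cell(x, state):
    # Hypothetical stand-in cell returning (output, (c, h)); not a real LSTM.
    c, h = state
    new_c = c + x
    new_h = np.tanh(new_c)
    return new_h, (new_c, new_h)


def dropout_wrapped_step(cell, x, state, keep_prob, rng):
    # Hypothetical helper mimicking DropoutWrapper(output_keep_prob=keep_prob):
    # only the output is dropped, the returned state is left intact.
    output, new_state = cell(x, state)
    return dropout(output, keep_prob, rng), new_state


rng = np.random.RandomState(0)
x = np.ones((4, 5))
state = (np.zeros((4, 5)), np.zeros((4, 5)))
out, (c, h) = dropout_wrapped_step(toy_cell, x, state, keep_prob=0.5, rng=rng)
# h (carried to the next time step) is intact; every entry of out (sent to the
# next layer) is either 0 or h / 0.5.
```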

cell=tf.nn.rnn_cell.MultiRNNCell([lstm_cell]*config.num_layers,state_is_tuple=True)

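# stack num_layers LSTM cells (repeating one cell object relies on pre-1.1 TensorFlow; in TF 1.1 and later, build a separate cell object per layer)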

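TensorFlow does not simply stack several single cells; it treats the stacked cells as one complete, independent cell. The intermediate state of every sub-cell is still kept, stored as an n-tuple, but the output only uses the output of the last (top) cell.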

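This defines the overall cell for each time step t. Next, feed a different input at each time step and unroll the cell over time to obtain the unrolled graph across multiple time steps.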
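A NumPy sketch of this stacking and unrolling, reusing the standard LSTM update with the gate order (i, j, f, o) as in BasicLSTMCell; `lstm_step`, `multi_cell_step` and `unroll` are illustrative names, not TensorFlow functions, and the sizes are made up. Each layer has its own weights, layer k's output is layer k+1's input, all (c, h) pairs are kept in an n-tuple, and only the top layer's output is returned and collected once per time step.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, state, W, b, forget_bias=0.0):
    # Illustrative LSTM update: gates i, j, f, o from one matmul over [x, h].
    c, h = state
    i, j, f, o = np.split(np.concatenate([x, h], axis=1) @ W + b, 4, axis=1)
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, (new_c, new_h)


def multi_cell_step(x, states, params):
    # Hypothetical MultiRNNCell-like step: one (c, h) per layer is kept,
    # and only the top layer's output is returned.
    new_states = []
    inp = x
    for state, (W, b) in zip(states, params):
        inp, new_state = lstm_step(inp, state, W, b)
        new_states.append(new_state)
    return inp, tuple(new_states)


def unroll(inputs, states, params):
    # inputs: [batch, num_steps, size]; feed one time step at a time.
    outputs = []
    for t in range(inputs.shape[1]):
        out, states = multi_cell_step(inputs[:, t, :], states, params)
        outputs.append(out)
    return outputs, states


rng = np.random.RandomState(0)
batch_size, num_steps, size, num_layers = 3, 5, 8, 2
params = [(rng.uniform(-0.1, 0.1, (2 * size, 4 * size)), np.zeros(4 * size))
          for _ in range(num_layers)]  # separate weights for every layer
zero = tuple((np.zeros((batch_size, size)), np.zeros((batch_size, size)))
             for _ in range(num_layers))
inputs = rng.standard_normal((batch_size, num_steps, size))
outputs, final_state = unroll(inputs, zero, params)
print(len(outputs), outputs[0].shape, len(final_state))  # 5 (3, 8) 2
```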

6.cell.zero_state()

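self._initial_state=cell.zero_state(batch_size,data_type())

# the cell defined above receives num_steps inputs in turn and produces the final state (an n-tuple, where n is the number of stacked layers)

# each element of the tuple is a (c, h) pair, and c and h both have shape [batch_size, num_units]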
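To make the structure of the initial state concrete, a small NumPy sketch assuming num_layers=2, batch_size=20 and num_units=200 as in SmallConfig; `LSTMState` is a hypothetical stand-in for LSTMStateTuple and `zero_state` here is not the TensorFlow method. The result is a tuple with one (c, h) pair per layer, which is why run_epoch can read state[i].c and state[i].h.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for tf.nn.rnn_cell.LSTMStateTuple.
LSTMState = namedtuple("LSTMState", ["c", "h"])


def zero_state(num_layers, batch_size, num_units, dtype=np.float32):
    # Sketch only: one all-zero (c, h) pair per stacked layer,
    # each tensor of shape [batch_size, num_units].
    return tuple(LSTMState(np.zeros((batch_size, num_units), dtype),
                           np.zeros((batch_size, num_units), dtype))
                 for _ in range(num_layers))


state = zero_state(num_layers=2, batch_size=20, num_units=200)
print(len(state), state[0].c.shape, state[1].h.shape)  # 2 (20, 200) (20, 200)
```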

7.tf.Variable & tf.get_variable()

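When tf.Variable() detects a name conflict, TensorFlow resolves it automatically by making the name unique (for example v becomes v_1). tf.get_variable() does not resolve conflicts; it raises an error instead.

Therefore, use tf.get_variable() when variables need to be shared.

Because tf.Variable() always creates a new object, reuse=True has no effect on it. For tf.get_variable(), the behavior depends on the reuse flag of the current variable scope: with reuse=True it returns the existing variable with that name (and raises an error if there is none); otherwise it creates a new variable (and raises an error if one with that name already exists).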
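These rules can be modeled with a toy pure-Python class; `ToyScope` and its methods are hypothetical, not TensorFlow's API, and only the naming and reuse decisions are modeled. `variable()` plays the role of tf.Variable and `get_variable()` that of tf.get_variable with a reuse flag.

```python
class ToyScope:
    """Hypothetical model of the TF1 rules described above (not the real API).

    variable():     like tf.Variable, always creates; a clashing name is made unique.
    get_variable(): like tf.get_variable, follows the reuse flag and raises on conflict.
    """

    def __init__(self):
        self.names = set()   # names taken by variable()
        self.store = {}      # name -> object created by get_variable()
        self.reuse = False

    def variable(self, name):
        # Append _1, _2, ... until the name is unused.
        unique, k = name, 0
        while unique in self.names:
            k += 1
            unique = "%s_%d" % (name, k)
        self.names.add(unique)
        return unique

    def get_variable(self, name):
        if self.reuse:
            # reuse=True: return the existing object, error if it does not exist.
            if name not in self.store:
                raise ValueError("Variable %s does not exist" % name)
            return self.store[name]
        # reuse off: create, error if the name is already taken.
        if name in self.store:
            raise ValueError("Variable %s already exists" % name)
        self.store[name] = object()
        return self.store[name]


s = ToyScope()
print(s.variable("v"), s.variable("v"))  # v v_1
w1 = s.get_variable("w")
s.reuse = True
w2 = s.get_variable("w")
print(w1 is w2)  # True
```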

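8.Gradient clipping is introduced to deal with exploding gradients: it keeps weight updates within a reasonable range. (It does not help with vanishing gradients.)

Implementation details:

In the solver, first set a clip_gradient value.
After forward and backward propagation we have the gradient (diff) of every weight. Instead of using these gradients to update the weights directly as usual, first take the square root of the sum of squares of all weight gradients, sumsq_diff. If sumsq_diff > clip_gradient, compute the scaling factor scale_factor = clip_gradient / sumsq_diff, which lies in (0,1).
Finally, multiply all weight gradients by this scaling factor; the result is the gradient actually used for the update.
This guarantees that in each update iteration the L2 norm of all weight gradients (the square root of their sum of squares) does not exceed clip_gradient.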

tf.clip_by_global_norm(t_list, clip_norm, use_norm=None, name=None)

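t_list is the list of gradient tensors and clip_norm is the clipping threshold, the same quantity as clip_gradient above. The function returns the clipped gradient tensors and the global norm of all the original tensors.

Each tensor in t_list is updated as: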

t_list[i] * clip_norm / max(global_norm, clip_norm)

global_norm is the square root of the sum of squares of all gradients (over every entry of every tensor).
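A NumPy sketch of this formula, written for illustration rather than taken from TensorFlow; the function name mirrors tf.clip_by_global_norm but is a local helper, and the example gradients are made up. When global_norm exceeds clip_norm the factor clip_norm / global_norm is the same scale_factor as in the procedure above; otherwise the factor is 1 and the gradients pass through unchanged.

```python
import numpy as np


def clip_by_global_norm(t_list, clip_norm):
    # Local sketch of the formula, not TensorFlow's implementation.
    # global_norm: square root of the sum of squares of every entry of every tensor.
    global_norm = np.sqrt(sum(np.sum(t ** 2) for t in t_list))
    # The factor is 1 when global_norm <= clip_norm, else clip_norm / global_norm.
    scale = clip_norm / max(global_norm, clip_norm)
    return [t * scale for t in t_list], global_norm


# Made-up gradients: global norm = sqrt(3^2 + 4^2) = 5.
grads = [np.array([3.0, 0.0]), np.array([[0.0, 4.0]])]
clipped, norm = clip_by_global_norm(grads, clip_norm=1.0)   # every tensor scaled by 1/5
unchanged, _ = clip_by_global_norm(grads, clip_norm=10.0)   # already within the bound
```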

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import time

import numpy as np
import tensorflow as tf

import reader  # reader.py from TensorFlow's PTB tutorial (provides ptb_raw_data and ptb_iterator)

flags = tf.app.flags
logging = tf.logging

flags.DEFINE_string(  # the default model type defined here is small
    'model', 'small', 'A type of model. Possible options are: small, medium, large.')
flags.DEFINE_string('data_path', '/Users/guoym/Desktop/modles-master', 'data_path')
flags.DEFINE_bool('use_fp16', False, 'Train using 16-bit floats instead of 32-bit floats')

FLAGS = flags.FLAGS


def data_type():
    return tf.float16 if FLAGS.use_fp16 else tf.float32

class PTBModel(object):
    def __init__(self, is_training, config):
        '''
        is_training: whether to train; if False, the parameters are not updated
        '''
        self.batch_size = batch_size = config.batch_size
        self.num_steps = num_steps = config.num_steps
        size = config.hidden_size
        vocab_size = config.vocab_size
        self._input_data = tf.placeholder(tf.int32, [batch_size, num_steps])  # input
        self._targets = tf.placeholder(tf.int32, [batch_size, num_steps])     # expected output; both are word-index sequences of length num_steps

        # num_units (here size) is the number of units in the LSTM cell
        lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(size, forget_bias=0.0, state_is_tuple=True)
        if is_training and config.keep_prob < 1:  # wrap the cell in a dropout layer
            # No dropout on the recurrent path: the state (memory) passed from t-1 to t is not
            # dropped; dropout is applied only between stacked cells within the same time step t
            lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=config.keep_prob)
        # stack num_layers LSTM cells (repeating one cell object relies on pre-1.1 TensorFlow;
        # in TF 1.1 and later, build a separate cell object per layer)
        cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers, state_is_tuple=True)

        # the cell consumes num_steps inputs in turn and produces the final state: a tuple of
        # num_layers (c, h) pairs, with c and h each of shape [batch_size, size]
        self._initial_state = cell.zero_state(batch_size, data_type())

        with tf.device("/cpu:0"):
            embedding = tf.get_variable("embedding", [vocab_size, size], dtype=data_type())
            # represent the input sequence by embeddings: shape=[batch, steps, hidden_size]
            inputs = tf.nn.embedding_lookup(embedding, self._input_data)
        if is_training and config.keep_prob < 1:
            inputs = tf.nn.dropout(inputs, config.keep_prob)

        outputs = []
        state = self._initial_state  # state holds the state of every sequence in the batch
        with tf.variable_scope("RNN"):
            for time_step in range(num_steps):
                if time_step > 0:
                    # tf.get_variable_scope() retrieves the current scope; reuse_variables()
                    # sets its reuse flag to True so all time steps share the same weights
                    tf.get_variable_scope().reuse_variables()
                (cell_output, state) = cell(inputs[:, time_step, :], state)
                # cell_output has shape [batch, hidden_size]
                outputs.append(cell_output)

        # concat the list into [batch, hidden_size*num_steps], then reshape to
        # [batch*num_steps, hidden_size] (pre-1.0 signature; in TF >= 1.0 use tf.concat(outputs, 1))
        output = tf.reshape(tf.concat(1, outputs), [-1, size])
        softmax_w = tf.get_variable('softmax_w', [size, vocab_size], dtype=data_type())
        softmax_b = tf.get_variable('softmax_b', [vocab_size], dtype=data_type())
        logits = tf.matmul(output, softmax_w) + softmax_b

        # logits is a 2-D tensor of shape [a, b] (a = batch_size*num_steps, b = vocab_size);
        # targets is a 1-D tensor of length a whose integer entries are smaller than b; weights is
        # a 1-D tensor of length a. For each row, the loss is the softmax cross-entropy between the
        # b logits and the target word index, multiplied by that row's weight.
        # (tf.nn.seq2seq is the pre-1.0 location; later it became tf.contrib.legacy_seq2seq)
        loss = tf.nn.seq2seq.sequence_loss_by_example(
            [logits],
            [tf.reshape(self._targets, [-1])],
            [tf.ones([batch_size * num_steps], dtype=data_type())])  # flattened into 1-D lists
        self._cost = cost = tf.reduce_sum(loss) / batch_size  # summed loss per sequence in the batch
        self._final_state = state

        if not is_training:
            return

        self._lr = tf.Variable(0.0, trainable=False)  # learning rate, set from outside via assign_lr
        tvars = tf.trainable_variables()  # list of variables to be trained
        grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), config.max_grad_norm)
        optimizer = tf.train.GradientDescentOptimizer(self._lr)
        self._train_op = optimizer.apply_gradients(zip(grads, tvars))  # apply the gradients to the variables
        # placeholder used to feed a new learning-rate value into the graph from outside
        self._new_lr = tf.placeholder(tf.float32, shape=[], name='new_learning_rate')
        self._lr_update = tf.assign(self._lr, self._new_lr)

    def assign_lr(self, session, lr_value):
        # run the lr_update op through the session
        session.run(self._lr_update, feed_dict={self._new_lr: lr_value})

    @property
    def input_data(self):
        return self._input_data

    @property
    def targets(self):
        return self._targets

    @property
    def initial_state(self):
        return self._initial_state

    @property
    def cost(self):
        return self._cost

    @property
    def final_state(self):
        return self._final_state

    @property
    def lr(self):
        return self._lr

    @property
    def train_op(self):
        return self._train_op

def run_epoch(session, model, data, eval_op, verbose=False):
    # epoch_size is the total number of batches, i.e. how many times data is fed to the session per epoch
    epoch_size = ((len(data) // model.batch_size - 1) // model.num_steps)  # // is integer division
    start_time = time.time()
    costs = 0.0
    iters = 0
    state = session.run(model.initial_state)
    for step, (x, y) in enumerate(reader.ptb_iterator(data, model.batch_size, model.num_steps)):
        # ops to run; eval_op is train_op during training and tf.no_op() otherwise
        fetches = [model.cost, model.final_state, eval_op]
        feed_dict = {}
        feed_dict[model.input_data] = x
        feed_dict[model.targets] = y
        # feed the final state of the previous batch as this batch's initial state
        for i, (c, h) in enumerate(model.initial_state):
            feed_dict[c] = state[i].c
            feed_dict[h] = state[i].h
        cost, state, _ = session.run(fetches, feed_dict)
        costs += cost
        iters += model.num_steps
        if verbose and step % (epoch_size // 10) == 10:  # print about 10 perplexity values per epoch
            print("%.3f perplexity: %.3f speed: %.0f wps" %
                  (step * 1.0 / epoch_size, np.exp(costs / iters),
                   iters * model.batch_size / (time.time() - start_time)))
    return np.exp(costs / iters)

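class SmallConfig(object):
    init_scale = 0.1        # parameters are initialized uniformly in [-init_scale, init_scale]
    learning_rate = 1.0     # learning rate
    max_grad_norm = 5       # global-norm clipping threshold, limits exploding gradients
    num_layers = 2          # number of LSTM layers
    num_steps = 20          # sequence length of a single sample (number of unrolled time steps)
    hidden_size = 200       # hidden layer size
    max_epoch = 4           # the decay factor stays 1 until epoch index i exceeds max_epoch, then shrinks
    max_max_epoch = 13      # number of passes over the whole training text (13)
    keep_prob = 1.0         # dropout keep probability (1.0 means no dropout)
    lr_decay = 0.5          # learning-rate decay factor
    batch_size = 20         # number of sequences per batch (20)
    vocab_size = 10000      # vocabulary size, 10K words in total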

if __name__ == '__main__':
    raw_data = reader.ptb_raw_data(FLAGS.data_path)
    train_data, valid_data, test_data, _ = raw_data
    config = SmallConfig()
    eval_config = SmallConfig()
    eval_config.batch_size = 1
    eval_config.num_steps = 1
    with tf.Graph().as_default(), tf.Session() as session:
        # random_uniform_initializer(minval, maxval) draws uniformly distributed random values
        initializer = tf.random_uniform_initializer(-config.init_scale, config.init_scale)
        with tf.variable_scope('model', reuse=None, initializer=initializer):
            m = PTBModel(is_training=True, config=config)  # training model
        # validation and test models reuse the training model's variables
        with tf.variable_scope('model', reuse=True, initializer=initializer):
            mvalid = PTBModel(is_training=False, config=config)
            mtest = PTBModel(is_training=False, config=eval_config)
        summary_writer = tf.summary.FileWriter('/tmp/lstm_logs', session.graph)
        tf.global_variables_initializer().run()  # initialize the parameter variables
        for i in range(config.max_max_epoch):
            # learning-rate decay: lr_decay = 1 while i <= max_epoch,
            # afterwards lr_decay = 0.5^(i - max_epoch)
            lr_decay = config.lr_decay ** max(i - config.max_epoch, 0.0)
            m.assign_lr(session, config.learning_rate * lr_decay)  # set the learning rate
            print("Epoch: %d Learning rate: %.3f" % (i + 1, session.run(m.lr)))
            train_perplexity = run_epoch(session, m, train_data, m.train_op, verbose=True)  # training perplexity
            print("Epoch: %d Train Perplexity: %.3f" % (i + 1, train_perplexity))
            valid_perplexity = run_epoch(session, mvalid, valid_data, tf.no_op())  # validation perplexity
            print("Epoch: %d Valid Perplexity: %.3f" % (i + 1, valid_perplexity))
        test_perplexity = run_epoch(session, mtest, test_data, tf.no_op())  # test perplexity
        print("Test Perplexity: %.3f" % test_perplexity)
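To check the shape bookkeeping in PTBModel and the perplexity in run_epoch, here is a NumPy sketch with made-up sizes and random numbers; `sequence_loss` is a hypothetical helper computing per-row softmax cross-entropy with all weights equal to 1, which is what the model passes to sequence_loss_by_example. It shows that the rows produced by concatenating and reshaping `outputs` line up with the flattened targets, and that exp(costs / iters) equals exp of the mean per-word loss.

```python
import numpy as np


def sequence_loss(logits, targets):
    # Hypothetical helper: softmax cross-entropy per row, -log softmax(logits)[row, target].
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]


rng = np.random.RandomState(0)
batch_size, num_steps, size, vocab_size = 2, 3, 4, 5
# One [batch, size] cell output per time step, as collected in `outputs`.
outputs = [rng.standard_normal((batch_size, size)) for _ in range(num_steps)]
# Concatenate along axis 1, then reshape: row b * num_steps + t is (batch b, step t),
# the same order as targets.reshape(-1).
output = np.concatenate(outputs, axis=1).reshape(-1, size)
targets = rng.randint(0, vocab_size, size=(batch_size, num_steps))
logits = output @ rng.standard_normal((size, vocab_size))

loss = sequence_loss(logits, targets.reshape(-1))  # shape [batch_size * num_steps]
cost = loss.sum() / batch_size                      # like self._cost
perplexity = np.exp(cost / num_steps)               # run_epoch after one batch: exp(costs / iters)

# With uniform logits every word has probability 1 / vocab_size, so perplexity = vocab_size.
uniform = sequence_loss(np.zeros((batch_size * num_steps, vocab_size)), targets.reshape(-1))
```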
