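NLP (21): Generating Text Automatically with an LSTM Trained on Existing Text

Original link: http://www.one2know.cn/nlp21/

Generating text automatically with an LSTM trained on existing text

Principle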

As in stock-price prediction, the model uses the preceding n characters to predict the next character.

https://www.cnblogs.com/peng8098/p/keras_5.html
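
To make the idea concrete, here is a minimal sketch of the sliding-window split on a toy string. The helper name `make_windows` and the toy values (`maxlen=4`, `step=3`) are illustrative assumptions; the post itself uses a window of 40 characters and a step of 3 on the Shakespeare corpus.

```python
def make_windows(text, maxlen, step):
    """Pair every window of `maxlen` characters with the character that follows it."""
    windows, next_chars = [], []
    # Move the window start forward by `step` characters each time
    for i in range(0, len(text) - maxlen, step):
        windows.append(text[i:i + maxlen])
        next_chars.append(text[i + maxlen])
    return windows, next_chars


windows, next_chars = make_windows("hello world", maxlen=4, step=3)
for window, target in zip(windows, next_chars):
    print(repr(window), "->", repr(target))
# 'hell' -> 'o'
# 'lo w' -> 'o'
# 'worl' -> 'd'
```

Each window becomes one training input and the character after it becomes the label, so a larger step yields fewer, less overlapping training samples.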

Code

from __future__ import print_function  # no-op on Python 3; kept for Python 2 compatibility
import random
import sys

import numpy as np

path = r'shakespeare_final.txt'
with open(path) as f:
    text = f.read().lower()  # read the whole file as one string and lowercase it
characters = sorted(set(text))  # unique characters, sorted so the encoding is reproducible
print('corpus length:', len(text))
print('total chars:', len(characters))
char2indices = {c: i for i, c in enumerate(characters)}  # character -> integer index
indices2char = {i: c for i, c in enumerate(characters)}  # integer index -> character

maxlen = 40  # use a window of 40 characters to predict the next character
step = 3  # slide the window forward 3 characters at a time
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i:i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sentences:', len(sentences))  # number of 40-character windows, i.e. the training set size

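## Build the dataset as one-hot tensors
# np.bool was removed in NumPy 1.24; the built-in bool is equivalent here
X = np.zeros((len(sentences), maxlen, len(characters)), dtype=bool)
y = np.zeros((len(sentences), len(characters)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char2indices[char]] = 1  # mark the character at position t of window i
    y[i, char2indices[next_chars[i]]] = 1  # mark the character that follows window i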

# Build the neural network
from keras.models import Sequential
from keras.layers import Dense, LSTM, Activation, Dropout
from keras.optimizers import RMSprop

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(characters))))
model.add(Dense(len(characters)))
model.add(Activation('softmax'))
# Newer Keras releases name this argument learning_rate instead of lr
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(lr=0.01))
# summary() prints the table itself and returns None, hence the trailing "None" in the output
print(model.summary())

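def pred_indices(preds, metric=1.0):
    # Sample the index of the next character from the predicted distribution.
    # metric acts as a sampling temperature: values below 1 sharpen the
    # distribution, values above 1 flatten it.
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / metric
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)  # renormalize so the probabilities sum to 1
    probs = np.random.multinomial(1, preds, 1)  # draw one sample as a one-hot vector
    return np.argmax(probs)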

# 29 rounds of one epoch each, so the generated text can be inspected after every round
for iteration in range(1, 30):
    print('-' * 40)
    print('Iteration', iteration)
    model.fit(X, y, batch_size=128, epochs=1)
    start_index = random.randint(0, len(text) - maxlen - 1)  # random seed position in the corpus
    for diversity in [0.2, 0.7, 1.2]:
        print('\n----- diversity:', diversity)
        generated = ''
        sentence = text[start_index:start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)
        for i in range(400):  # generate 400 characters per diversity value
            x = np.zeros((1, maxlen, len(characters)))
            for t, char in enumerate(sentence):  # one-hot encode the current 40-character window
                x[0, t, char2indices[char]] = 1
            preds = model.predict(x, verbose=0)[0]
            next_index = pred_indices(preds, diversity)
            pred_char = indices2char[next_index]
            generated += pred_char
            sentence = sentence[1:] + pred_char  # drop the oldest character, append the new one
            sys.stdout.write(pred_char)
            sys.stdout.flush()
        print('\nOne combination completed \n')
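
The effect of the `diversity` values passed to `pred_indices` can be seen on a small example. The sketch below repeats the same arithmetic (divide the log-probabilities by a temperature, exponentiate, renormalize) in plain Python, without the final sampling step. The function name `rescale` and the three-element distribution are assumptions made for illustration, and all probabilities are assumed to be strictly positive.

```python
import math


def rescale(preds, temperature=1.0):
    """Divide log-probabilities by `temperature`, then renormalize to sum to 1."""
    scaled = [math.exp(math.log(p) / temperature) for p in preds]
    total = sum(scaled)
    return [s / total for s in scaled]


base = [0.5, 0.3, 0.2]  # hypothetical softmax output over three characters
for temperature in [0.2, 0.7, 1.2]:
    print(temperature, [round(p, 3) for p in rescale(base, temperature)])
```

A low temperature such as 0.2 concentrates almost all probability on the most likely character, which gives conservative and repetitive text; a temperature above 1 flattens the distribution and makes unlikely characters more probable.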

Output:

corpus length: 581432
total chars: 61
nb sentences: 193798
Using TensorFlow backend.
WARNING:tensorflow:From D:\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 128)               97280
_________________________________________________________________
dense_1 (Dense)              (None, 61)                7869
_________________________________________________________________
activation_1 (Activation)    (None, 61)                0
=================================================================
Total params: 105,149
Trainable params: 105,149
Non-trainable params: 0
_________________________________________________________________
None
----------------------------------------
Iteration 1
WARNING:tensorflow:From D:\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/1
2019-07-15 17:04:03.721908: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-07-15 17:04:04.438003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 950M major: 5 minor: 0 memoryClockRate(GHz): 1.124
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 1.64GiB
2019-07-15 17:04:04.438676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-07-15 17:04:07.352274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-15 17:04:07.352543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-07-15 17:04:07.352701: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-07-15 17:04:07.357455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1386 MB memory) -> physical GPU (device: 0, name: GeForce GTX 950M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2019-07-15 17:04:08.415227: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library cublas64_100.dll locally
   128/193798 [..............................] - ETA: 2:16:56 - loss: 4.1095
   256/193798 [..............................] - ETA: 1:09:23 - loss: 3.6938
   384/193798 [..............................] - ETA: 46:52 - loss: 3.8312
   ...
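
The numbers in the output can be cross-checked by hand. The sketch below assumes a standard LSTM layer with bias terms (four gates, each with input weights, recurrent weights, and a bias vector) and a fully connected output layer with bias; the helper names `lstm_params` and `dense_params` are made up for this check.

```python
def lstm_params(units, input_dim):
    """Parameters of a standard LSTM layer: 4 gates x (input + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return input_dim * units + units


vocab_size, hidden_units = 61, 128  # "total chars" and the LSTM width from the post
corpus_length, maxlen, step = 581432, 40, 3

lstm_count = lstm_params(hidden_units, vocab_size)
dense_count = dense_params(vocab_size, hidden_units)
print(lstm_count)                # 97280
print(dense_count)               # 7869
print(lstm_count + dense_count)  # 105149

# Number of training windows produced by range(0, len(text) - maxlen, step)
print(len(range(0, corpus_length - maxlen, step)))  # 193798
```

These values match the `Param #` column of the model summary and the `nb sentences` line printed by the script.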
