NLP (19): Bidirectional LSTM Sentiment Classification Model

Original link: http://www.one2know.cn/nlp19/

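This post uses the IMDB sentiment dataset to compare the CNN and RNN approaches; the preprocessing is the same as in the previous post.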

from __future__ import print_function

import numpy as np

import pandas as pd

from keras.preprocessing import sequence

from keras.models import Sequential

from keras.layers import Dense,Dropout,Embedding,LSTM,Bidirectional

from keras.datasets import imdb

from sklearn.metrics import accuracy_score,classification_report

# Cap the number of features (the vocabulary size)

max_features = 15000

max_len = 300

batch_size = 64

# Load the data

(x_train,y_train),(x_test,y_test) = imdb.load_data(num_words=max_features)

print(len(x_train),'train observations')

print(len(x_test),'test observations')

Output:

Using TensorFlow backend.

25000 train observations

25000 test observations
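To make the effect of `num_words=max_features` more concrete, here is a minimal pure-Python sketch of the filtering it is understood to perform: word indices at or above the cap are replaced by an out-of-vocabulary marker. The helper name `cap_vocabulary` is hypothetical (it is not a Keras function), and the marker value 2 is an assumption based on the default `oov_char` of `imdb.load_data`.

```python
def cap_vocabulary(sequences, num_words, oov_char=2):
    """Replace every word index >= num_words with oov_char.

    Hypothetical helper for illustration; oov_char=2 is assumed to
    match the default used by keras.datasets.imdb.load_data.
    """
    return [[w if w < num_words else oov_char for w in seq] for seq in sequences]


# Two toy "reviews" encoded as word indices (made-up values).
reviews = [[1, 14, 22, 16001, 43], [1, 20000, 7]]
capped = cap_vocabulary(reviews, num_words=15000)
print(capped)  # [[1, 14, 22, 2, 43], [1, 2, 7]]
```

Rare words therefore all collapse onto a single index, which keeps the embedding table at `max_features` rows.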

How to implement it

1. Preprocessing

2. Building and validating the LSTM model

3. Model evaluation
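The preprocessing step relies on `sequence.pad_sequences` to bring every review to the same length. As a rough illustration of what that call does with its default settings (assumed here to be zero-padding at the front and truncation from the front), a pure-Python sketch might look like this; `pad_to_length` is a hypothetical name, not the Keras implementation.

```python
def pad_to_length(sequences, maxlen, value=0):
    """Left-pad short sequences with `value`; keep only the last `maxlen`
    items of long ones. Hypothetical stand-in for pad_sequences defaults."""
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # drop items from the front if too long
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


print(pad_to_length([[5, 6], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0, 0, 5, 6], [3, 4, 5, 6]]
```

With `maxlen=300`, every review becomes a row of exactly 300 indices, which is why the padded arrays have shape `(25000, 300)`.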
Code

from __future__ import print_function

import numpy as np

import pandas as pd

from keras.preprocessing import sequence

from keras.models import Sequential

from keras.layers import Dense,Dropout,Embedding,LSTM,Bidirectional

from keras.datasets import imdb

from sklearn.metrics import accuracy_score,classification_report

# Cap the number of features (the vocabulary size)

max_features = 15000

max_len = 300

batch_size = 64

# Load the data

(x_train,y_train),(x_test,y_test) = imdb.load_data(num_words=max_features)

# print(len(x_train),'train observations')

# print(len(x_test),'test observations')

# Pad (or truncate) every sequence to one fixed length so the data forms a fixed-shape array, which makes training more efficient

x_train_2 = sequence.pad_sequences(x_train,maxlen=max_len)

x_test_2 = sequence.pad_sequences(x_test,maxlen=max_len)

print('x_train_2.shape:',x_train_2.shape)

print('x_test_2.shape:',x_test_2.shape)

y_train = np.array(y_train)

y_test = np.array(y_test)

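# Keras => bidirectional LSTM model

# A bidirectional LSTM has forward and backward connections, so each word in a sentence can draw on context from both its left and its right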

model = Sequential()

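model.add(Embedding(max_features,128,input_length=max_len)) # embedding layer: maps each word index to a 128-dimensional vector

model.add(Bidirectional(LSTM(64))) # bidirectional LSTM layer

model.add(Dropout(0.5)) # dropout

model.add(Dense(1,activation='sigmoid')) # dense layer: outputs the probability used to classify sentiment as 0 or 1

model.compile('adam','binary_crossentropy',metrics=['accuracy']) # binary cross-entropy loss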

print(model.summary())

model.fit(x_train_2,y_train,batch_size=batch_size,epochs=4,validation_split=0.2)

# Predictions and results

y_train_predclass = model.predict_classes(x_train_2,batch_size=1000)

y_test_predclass = model.predict_classes(x_test_2,batch_size=1000)

y_train_predclass.shape = y_train.shape

y_test_predclass.shape = y_test.shape

print('\n\nLSTM Bidirectional Sentiment Classification - Train accuracy:',

      round(accuracy_score(y_train,y_train_predclass),3))

print('\nLSTM Bidirectional Sentiment Classification of Training data\n',

      classification_report(y_train,y_train_predclass))

print('\nLSTM Bidirectional Sentiment Classification - Train Confusion Matrix\n\n',

      pd.crosstab(y_train,y_train_predclass,rownames=['Actuall'],colnames=['Predicted']))

print('\nLSTM Bidirectional Sentiment Classification - Test accuracy:',

      round(accuracy_score(y_test,y_test_predclass),3))

print('\nLSTM Bidirectional Sentiment Classification of Test data\n',

      classification_report(y_test,y_test_predclass))

print('\nLSTM Bidirectional Sentiment Classification - Test Confusion Matrix\n\n',

      pd.crosstab(y_test,y_test_predclass,rownames=['Actuall'],colnames=['Predicted']))
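A note on `model.predict_classes`: it existed in the Keras version used for this run, but it has been removed from later TensorFlow/Keras releases. For a model with a single sigmoid output, thresholding the predicted probabilities at 0.5 should give the same labels. The sketch below shows the idea with NumPy only; the probabilities are made-up stand-ins for the result of `model.predict(...)`, and `probabilities_to_classes` is a hypothetical helper name.

```python
import numpy as np


def probabilities_to_classes(probabilities, threshold=0.5):
    """Turn sigmoid outputs of shape (n, 1) into a flat vector of 0/1 labels."""
    probabilities = np.asarray(probabilities)
    return (probabilities > threshold).astype("int32").reshape(-1)


# Stand-in for: model.predict(x_test_2, batch_size=1000)
fake_probabilities = np.array([[0.91], [0.12], [0.50], [0.73]])
print(probabilities_to_classes(fake_probabilities))  # [1 0 0 1]
```

Because the result is already flattened, the later `y_test_predclass.shape = y_test.shape` assignment would no longer be needed with this variant.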

Output:

Using TensorFlow backend.

x_train_2.shape: (25000, 300)

x_test_2.shape: (25000, 300)

WARNING:tensorflow:From D:\Python37\Lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.

Instructions for updating:

Colocations handled automatically by placer.

WARNING:tensorflow:From D:\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.

Instructions for updating:

Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.

_________________________________________________________________

Layer (type)                 Output Shape              Param #

=================================================================

embedding_1 (Embedding)      (None, 300, 128)          1920000

_________________________________________________________________

bidirectional_1 (Bidirection (None, 128)               98816

_________________________________________________________________

dropout_1 (Dropout)          (None, 128)               0

_________________________________________________________________

dense_1 (Dense)              (None, 1)                 129

=================================================================

Total params: 2,018,945

Trainable params: 2,018,945

Non-trainable params: 0

_________________________________________________________________

None
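The parameter counts in the summary above can be checked by hand. The sketch below uses the standard formulas for embedding, LSTM and dense layers (an LSTM has four weight blocks: the input, forget and output gates plus the cell candidate, each with input weights, recurrent weights and a bias). The function names are hypothetical; the layer sizes are the ones from the listing.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry.
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # Four weight blocks, each with input weights, recurrent weights and a bias.
    return 4 * ((input_dim + units + 1) * units)


def dense_params(input_dim, units):
    # Weights plus one bias per output unit.
    return (input_dim + 1) * units


embedding = embedding_params(15000, 128)
bilstm = 2 * lstm_params(128, 64)   # forward and backward LSTM
dense = dense_params(2 * 64, 1)     # the two directions are concatenated: 128 inputs
print(embedding, bilstm, dense, embedding + bilstm + dense)
# 1920000 98816 129 2018945
```

The factor of 2 for the bidirectional layer, and the 128-wide output it feeds into the dense layer, reflect the forward and backward outputs being concatenated.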

WARNING:tensorflow:From D:\Python37\Lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.

Instructions for updating:

Use tf.cast instead.

Train on 20000 samples, validate on 5000 samples

Epoch 1/4

2019-07-07 20:03:45.649853: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2

   64/20000 [..............................] - ETA: 18:21 - loss: 0.6915 - acc: 0.5781

  128/20000 [..............................] - ETA: 13:04 - loss: 0.6918 - acc: 0.5938

  192/20000 [..............................] - ETA: 11:14 - loss: 0.6915 - acc: 0.5729

  256/20000 [..............................] - ETA: 10:19 - loss: 0.6917 - acc: 0.5469

  320/20000 [..............................] - ETA: 9:45 - loss: 0.6915 - acc: 0.5469

  (the rest of the per-batch and per-epoch progress output is omitted here)

LSTM Bidirectional Sentiment Classification - Train accuracy: 0.955

LSTM Bidirectional Sentiment Classification of Training data

               precision    recall  f1-score   support

           0       0.96      0.95      0.95     12500

           1       0.95      0.96      0.95     12500

    accuracy                           0.95     25000

   macro avg       0.95      0.95      0.95     25000

weighted avg       0.95      0.95      0.95     25000

LSTM Bidirectional Sentiment Classification - Train Confusion Matrix

 Predicted      0      1

Actuall

0          11928    572

1            561  11939

LSTM Bidirectional Sentiment Classification - Test accuracy: 0.859

LSTM Bidirectional Sentiment Classification of Test data

               precision    recall  f1-score   support

           0       0.86      0.86      0.86     12500

           1       0.86      0.85      0.86     12500

    accuracy                           0.86     25000

   macro avg       0.86      0.86      0.86     25000

weighted avg       0.86      0.86      0.86     25000

LSTM Bidirectional Sentiment Classification - Test Confusion Matrix

 Predicted      0      1

Actuall

0          10809   1691

1           1829  10671

time============== 2080.618681907654
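As a sanity check, the headline numbers in the reports can be recomputed from the confusion matrices alone. In the crosstab, rows are the actual labels and columns the predicted ones, so for class 1 the cells read as true negatives, false positives, false negatives and true positives. A minimal sketch, with a hypothetical helper name:

```python
def metrics_from_confusion(tn, fp, fn, tp):
    """Accuracy plus precision/recall for the positive class (label 1)."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision_1": tp / (tp + fp),
        "recall_1": tp / (tp + fn),
    }


# Cell values copied from the confusion matrices printed above.
train = metrics_from_confusion(tn=11928, fp=572, fn=561, tp=11939)
test = metrics_from_confusion(tn=10809, fp=1691, fn=1829, tp=10671)

print(round(train["accuracy"], 3), round(test["accuracy"], 3))  # 0.955 0.859
print(round(test["precision_1"], 2), round(test["recall_1"], 2))  # 0.86 0.85
```

The gap between training accuracy (0.955) and test accuracy (0.859) suggests the model fits the training reviews noticeably better than unseen ones, which may point to some overfitting after 4 epochs.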
