LSTM Visualization

Visualizing Layer Representations in Neural Networks

Visualizing and interpreting the representations learned by machine learning and deep learning algorithms is pretty interesting! As the saying goes, "a picture is worth a thousand words", and the same holds true for visualizations: a lot can be interpreted with the right visualization tools. In this post, I will cover some details on visualizing intermediate (hidden) layer features using dimensionality reduction techniques.

We will work with the IMDB sentiment classification task (25000 training and 25000 test examples). The script to create a simple Bidirectional LSTM model with dropout, which predicts the sentiment (1 for positive and 0 for negative) using a sigmoid activation, is already provided in the Keras examples here.

Note: If you are not familiar with LSTMs, please read this excellent blog by Colah.

OK, let’s get started!!

The first step is to build the model and train it. We will use the example code as-is with a minor modification. We will keep the test data aside and use 20% of the training data itself as the validation set. The following part of the code will retrieve the IMDB dataset (from keras.datasets), create the LSTM model and train the model with the training data.

'''
This code snippet is copied from https://github.com/fchollet/keras/blob/master/examples/imdb_bidirectional_lstm.py.
A minor modification done to change the validation data.
'''
from __future__ import print_function
import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional
from keras.datasets import imdb

max_features = 20000
# cut texts after this number of words
# (among top max_features most common words)
maxlen = 100
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
y_train = np.array(y_train)
y_test = np.array(y_test)

model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(Bidirectional(LSTM(64)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=4,
          validation_split=0.2)
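The padding step above decides which 100 tokens of each review the LSTM actually sees. As a rough illustration, here is a NumPy-only sketch of what pad_sequences does with its default settings (padding='pre' and truncating='pre'). The helper name pad_pre and the toy sequences are made up for this example and are not part of Keras.

```python
import numpy as np

def pad_pre(seqs, maxlen, value=0):
    """Hypothetical helper: left-pad short sequences and keep only the
    last `maxlen` tokens of long ones (the 'pre' / 'pre' behaviour)."""
    out = np.full((len(seqs), maxlen), value, dtype="int32")
    for row, seq in enumerate(seqs):
        if len(seq) == 0:
            continue  # nothing to copy; the row stays all padding
        trunc = seq[-maxlen:]            # drop tokens from the front
        out[row, -len(trunc):] = trunc   # right-align, zeros on the left
    return out

demo = pad_pre([[1, 2, 3], [4, 5, 6, 7, 8, 9]], maxlen=4)
print(demo)
```

With these defaults, a long review keeps its final words rather than its opening ones, which is worth remembering when interpreting the learned features.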

Now comes the interesting part! We want to see how well the LSTM has learned representations that differentiate positive IMDB reviews from negative ones. Obviously, we can get an idea from the precision, recall and F1-score measures. However, being able to see the differences in a low-dimensional space is much more fun!

To obtain the hidden-layer representation, we will first truncate the model at the LSTM layer and then load it with the weights that the trained model has learnt. A convenient way to do this is to create a new model with the same steps (up to the layer you want) and copy the weights over from the trained model. Layers in Keras models are iterable. The code below shows how you can iterate through the model layers and see their configuration.

for layer in model.layers:
    print(layer.name, layer.trainable)
    print('Layer Configuration:')
    print(layer.get_config(), end='\n{}\n'.format('----'*10))

For example, the bidirectional LSTM layer configuration is the following:

bidirectional_2 True
Layer Configuration:
{'name': 'bidirectional_2', 'trainable': True, 'layer': {'class_name': 'LSTM', 'config': {'name': 'lstm_2', 'trainable': True, 'return_sequences': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'implementation': 0, 'units': 64, 'activation': 'tanh', 'recurrent_activation': 'hard_sigmoid', 'use_bias': True, 'kernel_initializer': {'class_name': 'VarianceScaling', 'config': {'scale': 1.0, 'mode': 'fan_avg', 'distribution': 'uniform', 'seed': None}}, 'recurrent_initializer': {'class_name': 'Orthogonal', 'config': {'gain': 1.0, 'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'unit_forget_bias': True, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0}}, 'merge_mode': 'concat'}

The weights of each layer can be obtained using:

trained_model.layers[i].get_weights()
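To get a feel for what those weight arrays contain, the sizes can be worked out by hand. The sketch below is plain arithmetic and assumes the standard LSTM parameterisation with four gates and a bias term (use_bias is True in the configuration shown above). The variable names are chosen for this example only.

```python
# Sizes taken from the model definition in this post.
max_features, embed_dim, units = 20000, 128, 64

# Embedding: one 128-dimensional vector per vocabulary entry.
embedding_params = max_features * embed_dim

# One LSTM direction: 4 gates, each with an input kernel,
# a recurrent kernel and a bias vector.
lstm_params = 4 * (embed_dim * units + units * units + units)

# Bidirectional wraps two independent LSTMs (forward and backward).
bidirectional_params = 2 * lstm_params

# Dense(1) sees the concatenated 2 * 64 features, plus one bias.
dense_params = 2 * units * 1 + 1

total_params = embedding_params + bidirectional_params + dense_params
print(embedding_params, bidirectional_params, dense_params, total_params)
```

Under these assumptions, get_weights() on the bidirectional layer should return six arrays: a kernel of shape (128, 256), a recurrent kernel of shape (64, 256) and a bias of shape (256,) for each direction.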

The code to create the truncated model is given below. Note that we call model.add(..) only up to the Bidirectional LSTM layer. Then we set the weights from the trained model (model) and predict the features for the test instances (x_test).

def create_truncated_model(trained_model):
    # Rebuild the network only up to the Bidirectional LSTM layer.
    model = Sequential()
    model.add(Embedding(max_features, 128, input_length=maxlen))
    model.add(Bidirectional(LSTM(64)))
    # Copy the learned weights layer by layer from the trained model.
    for i, layer in enumerate(model.layers):
        layer.set_weights(trained_model.layers[i].get_weights())
    # The model is only used for predict(), so this loss is never evaluated.
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

truncated_model = create_truncated_model(model)
hidden_features = truncated_model.predict(x_test)

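The hidden_features array has a shape of (25000, 128): 25000 instances with 128 dimensions each. We get 128 because each LSTM has 64 units and the Bidirectional wrapper concatenates the outputs of the forward and backward LSTMs (merge_mode is 'concat' in the layer configuration above). Hence, 64 x 2 = 128.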
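A minimal sketch of where the 128 comes from: one recurrent pass reads the sequence left to right, a second pass reads it right to left, and the two final states are concatenated. To keep it short, the sketch uses a plain tanh RNN with random weights instead of a real LSTM, so only the shapes are meaningful. The function last_state and all sizes other than 64 and 128 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, timesteps, embed_dim, units = 5, 100, 128, 64

def last_state(x, w_in, w_rec):
    """Run a plain tanh RNN over x of shape (n, timesteps, embed_dim)
    and return the final hidden state of shape (n, units)."""
    h = np.zeros((x.shape[0], w_rec.shape[0]))
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
    return h

def random_weights():
    # Separate random weights for each direction, as in a bidirectional layer.
    return (rng.normal(scale=0.1, size=(embed_dim, units)),
            rng.normal(scale=0.1, size=(units, units)))

x = rng.normal(size=(n, timesteps, embed_dim))   # stand-in for embedded reviews
fwd = last_state(x, *random_weights())           # reads tokens first to last
bwd = last_state(x[:, ::-1, :], *random_weights())  # reads tokens last to first
features = np.concatenate([fwd, bwd], axis=1)    # 'concat' merge
print(fwd.shape, bwd.shape, features.shape)
```

The number of classes plays no role here; a model with ten output classes would still produce 128 hidden features per review.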

Next, we will apply dimensionality reduction to reduce the 128 features to a lower dimension. For visualization, T-SNE (Maaten and Hinton, 2008) has become really popular. However, in my experience, T-SNE does not scale very well to many features and more than a few thousand instances. Therefore, I decided to first reduce the dimensions using Principal Component Analysis (PCA), followed by T-SNE to 2D space.

If you are interested in the details of T-SNE, please read this amazing blog.

Combining PCA (from 128 to 20) and T-SNE (from 20 to 2) for dimensionality reduction, here is the code. In this code, we used the PCA results for the first 5000 test instances. You can increase it.

The 20 principal components retain ~0.99 of the variance (on a scale of 0 to 1), which implies that the reduced dimensions represent the hidden features well. Please note that running T-SNE will take some time. (So maybe you can go grab a cup of coffee.)

I am not aware of a faster T-SNE implementation than the one that ships with the scikit-learn package. If you are, please let me know by commenting below.

from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

pca = PCA(n_components=20)
pca_result = pca.fit_transform(hidden_features)
print('Variance PCA: {}'.format(np.sum(pca.explained_variance_ratio_)))
## Variance PCA: 0.993621154832802

# Run T-SNE on the PCA features.
tsne = TSNE(n_components=2, verbose=1)
tsne_results = tsne.fit_transform(pca_result[:5000])
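The explained-variance figure printed by the PCA step can be reproduced by hand, which helps in reading it. The sketch below does PCA with a plain SVD in NumPy on synthetic data, because the real hidden_features require the trained model. The synthetic matrix, its low-rank structure and the variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for hidden_features: 1000 rows, 128 columns,
# built from 10 latent factors plus a little noise.
latent = rng.normal(size=(1000, 10))
mixing = rng.normal(size=(10, 128))
features = latent @ mixing + 0.01 * rng.normal(size=(1000, 128))

# PCA via SVD: centre the data, then decompose.
centered = features - features.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Variance captured by each component, and its share of the total.
explained_variance = s ** 2 / (len(features) - 1)
ratio = explained_variance / explained_variance.sum()

k = 20
reduced = centered @ vt[:k].T   # project onto the first k components
print(reduced.shape, ratio[:k].sum())
```

Summing the first 20 entries of ratio plays the same role as np.sum(pca.explained_variance_ratio_) in the scikit-learn code: a value close to 1 means little information is lost before T-SNE is applied.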

Now that we have the dimensionality reduced features, we will plot. We will label them with their actual classes (0 and 1). Here is the code for visualization.

from keras.utils import np_utils
import matplotlib.pyplot as plt
%matplotlib inline

y_test_cat = np_utils.to_categorical(y_test[:5000], num_classes = 2)
color_map = np.argmax(y_test_cat, axis=1)
plt.figure(figsize=(10,10))
for cl in range(2):
    indices = np.where(color_map==cl)
    indices = indices[0]
    plt.scatter(tsne_results[indices,0], tsne_results[indices, 1], label=cl)
plt.legend()
plt.show()
'''
from sklearn.metrics import classification_report
print(classification_report(y_test, y_preds))
             precision    recall  f1-score   support

          0       0.83      0.85      0.84     12500
          1       0.84      0.83      0.84     12500

avg / total       0.84      0.84      0.84     25000
'''

We convert the test class array (y_test) to make it one-hot using the to_categorical function. Then, we create a color map and based on the values of y, plot the reduced dimensions (tsne_results) on the scatter plot.
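As a side note, one-hot encoding followed by argmax returns the original integer labels, so the colour map carries the same information as the label array itself. The NumPy sketch below shows the round trip on a made-up label vector; np.eye is used here as a stand-in for to_categorical.

```python
import numpy as np

labels = np.array([0, 1, 1, 0, 1])       # toy labels, not the IMDB data
one_hot = np.eye(2, dtype=int)[labels]   # row i is the one-hot vector of labels[i]
recovered = np.argmax(one_hot, axis=1)   # back to integer class ids

print(one_hot)
print(recovered)
```

So, assuming y_test already holds 0/1 integers as it does here, using y_test[:5000] directly as the colour map should give the same plot.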

 

T-SNE visualization of hidden features for LSTM model trained on IMDB sentiment classification dataset

Please note that we reduced y_test_cat to 5000 instances too just like the tsne_results. You can change it and allow it to run longer.

Also, the classification report is shown for all the 25000 test instances. About 84% F1-score with a model trained for just 4 epochs. Cool! Here is the scatter plot we obtained.

As can be seen from the plot, the blue points (0, the negative class) are fairly separable from the orange points (1, the positive class). There is some overlap, of course, which is why our F-score is around 84 and not closer to 100 :). Understanding and visualizing the outputs at different layers can help identify which layer is causing major errors in learning representations.

I hope you find this article useful. I would love to hear your comments and thoughts. Also, do share your experiences with visualization.

Also, feel free to get in touch with me via LinkedIn.

 
 
 
 
 

Source: https://becominghuman.ai/visualizing-representations-bd9b62447e38
