Tutorial on training Skip-thoughts vectors for sentence feature extraction.

1. Request access by email and download the training dataset.

　　The dataset used for Skip-Thought vectors is [BookCorpus]: http://yknzhu.wixsite.com/mbweb

　　First, send an email to the authors of the paper and ask for the download link to this dataset. Once you receive the link, download the dataset files.

　　Unzip these files into the current folder.
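　　The unzip step can be scripted. The sketch below is only an illustration: it assumes the downloaded files are ordinary .zip archives sitting in the current folder (the post does not say which archive format the dataset uses), and the helper name unzip_all is made up for this example.

```python
import zipfile
from pathlib import Path


def unzip_all(folder="."):
    """Extract every .zip archive found in `folder` into that same folder.

    Assumption: the dataset archives are plain .zip files. Other formats
    (for example .tar.gz) would need the tarfile module instead.
    """
    folder = Path(folder)
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(folder)
            extracted.extend(zf.namelist())
    return extracted


# Usage (run from the folder that holds the downloaded archives):
#     for name in unzip_all("."):
#         print("extracted:", name)
```

　　The function returns the names of the extracted members, which makes it easy to confirm that the expected text files are in place before preprocessing.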

2. Download the TensorFlow version of the code.

　　Follow the instructions at: https://github.com/tensorflow/models/tree/master/research/skip_thoughts

　　The setup then proceeds through the steps described there.

　　[Attention] When you install Bazel, install it but do not update it afterwards; otherwise, you may run into errors in the following steps.

3. Install the packages needed.

- Bazel
- TensorFlow
- NumPy
- scikit-learn
- Natural Language Toolkit (NLTK)
  - First install NLTK
  - Then install the NLTK data
- gensim
  - Only required if you will be expanding your vocabulary with the word2vec model.
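
　　Before moving on, it can help to confirm that the Python packages above are visible to the interpreter you will run the model with. The following is a minimal sketch using only the standard library; the helper names is_installed and report are invented for this example, and Bazel is not covered because it is a command-line tool rather than a Python package.

```python
import importlib.util

# Import names for the packages listed above
# (scikit-learn is imported as "sklearn").
REQUIRED = ["tensorflow", "numpy", "sklearn", "nltk"]
OPTIONAL = ["gensim"]  # only needed for word2vec vocabulary expansion


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def report(modules):
    """Map each module name to whether it was found."""
    return {name: is_installed(name) for name in modules}


for name, found in report(REQUIRED + OPTIONAL).items():
    print("%-12s %s" % (name, "found" if found else "MISSING"))
```

　　This only checks that a package is present, not that its version is compatible with the others.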

4. Encoding sentences:

　　Run the following Python script (saved here as run_sentence_feature_extraction.py).

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import os.path
import scipy.spatial.distance as sd
from skip_thoughts import configuration
from skip_thoughts import encoder_manager
import pdb

print("==>> Skip-Thought Vector ")

# Set paths to the model files.
VOCAB_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/vocab.txt"
EMBEDDING_MATRIX_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/embeddings.npy"
CHECKPOINT_PATH = "./skip_thoughts/model/train/model.ckpt-78581"

# Set up the encoder. Here we are using a single unidirectional model.
# To use a bidirectional model as well, call load_model() again with
# configuration.model_config(bidirectional_encoder=True) and paths to the
# bidirectional model's files. The encoder will use the concatenation of
# all loaded models.
print("==>> loading the pre-trained models ... ")
encoder = encoder_manager.EncoderManager()
encoder.load_model(configuration.model_config(),
                   vocabulary_file=VOCAB_FILE,
                   embedding_matrix_file=EMBEDDING_MATRIX_FILE,
                   checkpoint_path=CHECKPOINT_PATH)
print("==>> Done !")

# The sentences to encode (a single example sentence here).
data = [' This is my second attempt  to the tensorflow version skip_thought_vectors ... ']
print("==>> the given sentence is: ", data)

# Generate a Skip-Thought vector for each sentence in the list.
encodings = encoder.encode(data)
print("==>> the sentence feature is: ", encodings)    # The output feature is 2400-dimensional.

wangxiao@AHU:/media/wangxiao/49cd8079-e619-4e4b-89b1-15c86afb5102/skip_thought_vector_onlineModels$ python run_sentence_feature_extraction.py
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
==>> Skip-Thought Vector
==>> loading the pre-trained models ...
WARNING:tensorflow:From ./skip_thoughts/skip_thoughts_model.py:360: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
2018-05-13 21:36:27.670186: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
==>> Done !
==>> the given sentence is: [' This is my second attempt to the tensorflow version skip_thought_vectors ... ']
==>> the sentence feature is: [[-0.00676637 0.01928637 -0.01759908 ..., 0.00851333 0.00875245 -0.0040213 ]]
> ./run_sentence_feature_extraction.py(48)<module>()
-> print("==>> the encodings[0] is: ", encodings[0])
(Pdb) x = encodings[0]
(Pdb) x.size
2400
(Pdb)

As the terminal output above shows, the output feature vector is 2400-D.
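
A common next step with such sentence features is to compare them, for example by cosine similarity. The sketch below is not part of the skip_thoughts package: the function names are hypothetical, and small 3-D toy vectors stand in for the 2400-D encodings so that the example runs without the trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query, candidates):
    """Index of the candidate vector closest to `query` by cosine similarity."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 3-D stand-ins for sentence encodings (assumption: real inputs would be
# rows of the array returned by encoder.encode()).
toy = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print("closest to toy[0]:", most_similar(toy[0], toy[1:]))
```

With real encodings, each row of the array returned by encoder.encode() can be passed in place of the toy vectors, since the functions only need sequences of numbers.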

...
