Tutorial on training Skip-Thought vectors for sentence feature extraction.

1. Send an email and download the training dataset.

  The dataset used to train skip-thought vectors is BookCorpus: http://yknzhu.wixsite.com/mbweb

  First, send an email to the dataset's authors and ask for the download link. You will then be able to download the following files:

  

  Unzip these files into the current folder.
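The archive names depend on the link you receive, so the sketch below simply extracts every zip or tar archive it finds in one folder. It assumes the downloads are `.zip` or `.tar(.gz/.bz2)` files; the function name `extract_archives` is hypothetical and not part of any tool mentioned above.

```python
import os
import tarfile
import zipfile


def extract_archives(folder="."):
    """Extract every zip or tar archive found in `folder` into that same folder.

    Returns the (sorted) names of the archives that were extracted.
    Assumption: the BookCorpus downloads are zip or tar archives.
    """
    extracted = []
    # List the folder once up front, so files produced by extraction
    # are not themselves re-examined in this pass.
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        if zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as zf:
                zf.extractall(folder)
            extracted.append(name)
        elif tarfile.is_tarfile(path):
            with tarfile.open(path) as tf:
                tf.extractall(folder)
            extracted.append(name)
    return extracted
```

Calling `extract_archives(".")` from the download folder unpacks everything there; non-archive files (for example plain `.txt` files) are skipped.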

2. Download the TensorFlow implementation.

  Follow the instructions at this link: https://github.com/tensorflow/models/tree/master/research/skip_thoughts

  There you will see the processing steps, as follows:

         

  [Attention] When you install Bazel, install it but do not update it afterwards; otherwise it may give you errors in the later steps.

3. Install the required packages.
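Before running anything, it can help to check which Python packages are importable and at what version. The list below is our reading of the requirements in the skip_thoughts README linked in step 2 (TensorFlow, NumPy, scikit-learn, NLTK, and gensim for vocabulary expansion); please double-check it against the README. Bazel is a build tool, not a Python package, so it is not checked. Printing the NumPy version is also a useful first step for the "module compiled against API version ... but this version of numpy is ..." message in the log below, which usually means a compiled extension was built against a newer NumPy than the one installed. `check_packages` is a hypothetical helper.

```python
import importlib
import importlib.util

# Display name -> importable module name (assumed from the skip_thoughts README).
REQUIRED_MODULES = {
    "tensorflow": "tensorflow",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "nltk": "nltk",
    "gensim": "gensim",  # only needed for vocabulary expansion
}


def check_packages(required=REQUIRED_MODULES):
    """Return {package: version string, "unknown", an error message, or None if missing}."""
    status = {}
    for package, module_name in required.items():
        if importlib.util.find_spec(module_name) is None:
            status[package] = None  # not installed
            continue
        try:
            module = importlib.import_module(module_name)
        except Exception as exc:  # e.g. a broken install or binary mismatch
            status[package] = "import failed: {}".format(exc)
            continue
        status[package] = getattr(module, "__version__", "unknown")
    return status


if __name__ == "__main__":
    for package, version in check_packages().items():
        print("{:12s} {}".format(package, version if version else "MISSING"))
```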

4. Encode sentences:

  Run the following Python file (run_sentence_feature_extraction.py in the terminal session below).

  

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from skip_thoughts import configuration
from skip_thoughts import encoder_manager

print("==>> Skip-Thought Vector ")

# Set paths to the model.
VOCAB_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/vocab.txt"
EMBEDDING_MATRIX_FILE = "./skip_thoughts/pretrained/skip_thoughts_bi_2017_02_16/embeddings.npy"
CHECKPOINT_PATH = "./skip_thoughts/model/train/model.ckpt-78581"

# Set up the encoder. Here we are using a single unidirectional model.
# To use a bidirectional model as well, call load_model() again with
# configuration.model_config(bidirectional_encoder=True) and paths to the
# bidirectional model's files. The encoder will use the concatenation of
# all loaded models.
print("==>> loading the pre-trained models ... ")
encoder = encoder_manager.EncoderManager()
encoder.load_model(configuration.model_config(),
                   vocabulary_file=VOCAB_FILE,
                   embedding_matrix_file=EMBEDDING_MATRIX_FILE,
                   checkpoint_path=CHECKPOINT_PATH)
print("==>> Done !")

# The sentence(s) to encode.
data = [' This is my second attempt to the tensorflow version skip_thought_vectors ... ']
print("==>> the given sentence is: ", data)

# Generate a Skip-Thought vector for each sentence.
encodings = encoder.encode(data)
print("==>> the sentence feature is: ", encodings)  # The output feature is 2400-dimensional.

wangxiao@AHU:/media/wangxiao/49cd8079-e619-4e4b-89b1-15c86afb5102/skip_thought_vector_onlineModels$ python run_sentence_feature_extraction.py
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
==>> Skip-Thought Vector
==>> loading the pre-trained models ...
WARNING:tensorflow:From ./skip_thoughts/skip_thoughts_model.py:360: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
2018-05-13 21:36:27.670186: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
==>> Done !
==>> the given sentence is: [' This is my second attempt to the tensorflow version skip_thought_vectors ... ']
==>> the sentence feature is: [[-0.00676637 0.01928637 -0.01759908 ..., 0.00851333 0.00875245 -0.0040213 ]]
> ./run_sentence_feature_extraction.py(48)<module>()
-> print("==>> the encodings[0] is: ", encodings[0])
(Pdb) x = encodings[0]
(Pdb) x.size
2400
(Pdb)

As we can see from the terminal output above, the output feature vector is 2400-D.
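A common next step with these features is to compare sentences by the cosine similarity of their vectors. The NumPy sketch below assumes `encodings` is the 2-D array returned by `encoder.encode(data)`, with shape (N, 2400) for N sentences (the log above shows N = 1, so you would pass a longer `data` list). The function name `nearest_neighbors` is hypothetical and not part of skip_thoughts.

```python
import numpy as np


def nearest_neighbors(encodings, query_index, num=5):
    """Return (index, cosine similarity) pairs for the sentences closest to
    encodings[query_index], excluding the query sentence itself.

    encodings: array of shape (num_sentences, feature_dim), e.g. (N, 2400).
    """
    encodings = np.asarray(encodings, dtype=np.float64)
    # L2-normalise each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(encodings, axis=1, keepdims=True)
    unit = encodings / np.maximum(norms, 1e-12)
    scores = unit @ unit[query_index]
    # Sort by descending similarity and drop the query sentence itself.
    order = [i for i in np.argsort(-scores) if i != query_index]
    return [(int(i), float(scores[i])) for i in order[:num]]


if __name__ == "__main__":
    # Random stand-in for real skip-thought features: 3 "sentences", 2400-D each.
    rng = np.random.default_rng(0)
    fake = rng.normal(size=(3, 2400))
    fake[2] = fake[0] + 0.01 * rng.normal(size=2400)  # make row 2 close to row 0
    print(nearest_neighbors(fake, query_index=0, num=2))
```

Normalising once and using a single matrix-vector product keeps the comparison cheap even for many sentences; a library distance function would work equally well.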

...
