1. First bug

Running

echo "Please tokenize this text." | java edu.stanford.nlp.process.PTBTokenizer

printed this error:

-bash: java: command not found

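So I figured Java was probably not installed, and went to the official website to download it.

I put the download in the /data directory and extracted it there.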

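After extracting, I opened ~/.bashrc with vim ~/.bashrc and added the Java environment settings.

Save and exit, then run source ~/.bashrc to apply the changes.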

After that, echo "Please tokenize this text." | java edu.stanford.nlp.process.PTBTokenizer ran without errors.
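The same check can be scripted before launching the long tokenization job. Below is a minimal Python sketch, not part of the original setup, that reports whether a java executable is reachable on PATH and whether CLASSPATH mentions a Stanford CoreNLP jar. The helper names and the "stanford-corenlp" substring test are assumptions made for illustration.

```python
import os
import shutil


def java_on_path(which=shutil.which):
    """Return True if an executable named 'java' can be found on PATH."""
    return which("java") is not None


def classpath_mentions_corenlp(classpath):
    """Return True if any CLASSPATH entry looks like a CoreNLP jar.

    The 'stanford-corenlp' substring test is a heuristic (assumption),
    not an official check.
    """
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    return any("stanford-corenlp" in entry for entry in entries)


if __name__ == "__main__":
    print("java found on PATH:", java_on_path())
    print("CoreNLP jar on CLASSPATH:",
          classpath_mentions_corenlp(os.environ.get("CLASSPATH", "")))
```

If the first line prints False, the shell that runs make_datafiles.py would hit the same "command not found" error.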

2. Second bug: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 858: ordinal not in range(128)

    (jjenv_pytorch) root@032ba38f2b6e:/data/rl_abs_other/cnn-dailymail# ls
    README.md make_datafiles.py url_lists
    (jjenv_pytorch) root@032ba38f2b6e:/data/rl_abs_other/cnn-dailymail#
    (jjenv_pytorch) root@032ba38f2b6e:/data/rl_abs_other/cnn-dailymail# python make_datafiles.py /data/rl_abs_other/data/cnn/stories /data/rl_abs_other/data/dailymail/stories
    Preparing to tokenize /data/rl_abs_other/data/cnn/stories to cnn_stories_tokenized...
    Making list of files to tokenize...
    Tokenizing 92579 files in /data/rl_abs_other/data/cnn/stories and saving in cnn_stories_tokenized...
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202F, decimal: 8239)
    Untokenizable: ? (U+202F, decimal: 8239)
    Untokenizable: ? (U+202F, decimal: 8239)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+202F, decimal: 8239)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202F, decimal: 8239)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202F, decimal: 8239)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+20A9, decimal: 8361)
    Untokenizable: ? (U+F06E, decimal: 61550)
    Untokenizable: ? (U+202F, decimal: 8239)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+F022, decimal: 61474)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    PTBTokenizer tokenized 80043350 tokens at 42671.94 tokens per second.
    Stanford CoreNLP Tokenizer has finished.
    Successfully finished tokenizing /data/rl_abs_other/data/cnn/stories to cnn_stories_tokenized.

    Preparing to tokenize /data/rl_abs_other/data/dailymail/stories to dm_stories_tokenized...
    Making list of files to tokenize...
    Tokenizing 219506 files in /data/rl_abs_other/data/dailymail/stories and saving in dm_stories_tokenized...
    Untokenizable: ? (U+FFFC, decimal: 65532)
    Untokenizable: ? (U+2010, decimal: 8208)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202D, decimal: 8237)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+2012, decimal: 8210)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+2010, decimal: 8208)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202D, decimal: 8237)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+2010, decimal: 8208)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+2010, decimal: 8208)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+2010, decimal: 8208)
    Untokenizable: ? (U+202F, decimal: 8239)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202B, decimal: 8235)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202D, decimal: 8237)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+2010, decimal: 8208)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+F001, decimal: 61441)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+F001, decimal: 61441)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+70E, decimal: 1806)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202F, decimal: 8239)
    Untokenizable: ? (U+2010, decimal: 8208)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202C, decimal: 8236)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+206E, decimal: 8302)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+200D, decimal: 8205)
    Untokenizable: ? (U+202A, decimal: 8234)
    Untokenizable: ? (U+FFFD, decimal: 65533)
    Untokenizable: ? (U+202C, decimal: 8236)
    PTBTokenizer tokenized 203118231 tokens at 32507.27 tokens per second.
    Stanford CoreNLP Tokenizer has finished.
    Successfully finished tokenizing /data/rl_abs_other/data/dailymail/stories to dm_stories_tokenized.

    Making bin file for URLs listed in url_lists/all_test.txt...
    Writing story 0 of 11490; 0.00 percent done
    Traceback (most recent call last):
      File "make_datafiles.py", line 253, in <module>
        write_to_tar(all_test_urls, os.path.join(finished_files_dir, "test.tar"))
      File "make_datafiles.py", line 182, in write_to_tar
        article_sents, abstract_sents = get_art_abs(story_file)
      File "make_datafiles.py", line 106, in get_art_abs
        lines = read_story_file(story_file)
      File "make_datafiles.py", line 78, in read_story_file
        lines = f.read().split('\n\n')
      File "/root/anaconda3/envs/jjenv_pytorch/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 858: ordinal not in range(128)
    (jjenv_pytorch) root@032ba38f2b6e:/data/rl_abs_other/cnn-dailymail#
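The Untokenizable lines in the log are warnings from the tokenizer, not the cause of the crash: both tokenization passes report that they finished successfully. To see what those code points are, a small Python sketch (added here for illustration, with a hypothetical helper name) can look them up with the standard unicodedata module:

```python
import unicodedata


def describe(code_point):
    """Return the Unicode name of a code point, or a placeholder if unnamed."""
    return unicodedata.name(chr(code_point), "<no name>")


if __name__ == "__main__":
    # A few of the code points reported in the log above.
    for cp in (0x202A, 0x202C, 0x202F, 0x20A9, 0x2010, 0xFFFD, 0xF06E):
        print("U+{:04X}: {}".format(cp, describe(cp)))
```

Several of them are invisible formatting characters (U+202A is LEFT-TO-RIGHT EMBEDDING, for example), and U+F06E lies in the Private Use Area, so it has no name.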

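I assumed it was an encoding problem, so I added # coding: utf-8 at the top of make_datafiles.py, but that did not fix it: that declaration only tells Python how to decode the source file itself, not files opened at runtime. I then consulted this post: https://blog.csdn.net/qq_36847641/article/details/78414718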
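To confirm the diagnosis, it can help to check which encoding open() falls back to. In the Python 3.6 shown in the traceback, open() without an encoding argument uses locale.getpreferredencoding(False), which reports an ASCII codec when the shell has no UTF-8 locale configured. The snippet below is a diagnostic sketch; the function name is made up for illustration.

```python
import codecs
import locale


def default_open_encoding():
    """Return the canonical name of the locale-derived default text codec."""
    # open() without an encoding argument falls back to this value.
    name = locale.getpreferredencoding(False)
    # Normalize aliases such as 'ANSI_X3.4-1968' to 'ascii'.
    return codecs.lookup(name).name


if __name__ == "__main__":
    print("default encoding for open():", default_open_encoding())
```

If this prints ascii, any story file containing a non-ASCII byte such as 0xe2 will raise the UnicodeDecodeError above unless an encoding is passed explicitly.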

So I changed my own code accordingly, and the error went away.
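The exact change is not reproduced here. As a hedged illustration, one common way to fix this error is to pass encoding="utf-8" explicitly when opening the story files. The function below reuses the read_story_file name from the traceback, but its body is a reconstruction under that assumption, not the project's actual code.

```python
import os
import tempfile


def read_story_file(path):
    """Read a story file as UTF-8 and split it into blank-line-separated blocks."""
    # Passing the encoding explicitly avoids depending on the locale default,
    # which may be ASCII in some environments.
    with open(path, "r", encoding="utf-8") as f:
        return f.read().split("\n\n")


if __name__ == "__main__":
    # U+2019 encodes to the bytes e2 80 99 in UTF-8, so this sample contains
    # the same 0xe2 lead byte that appears in the error message.
    sample = "It\u2019s a test.\n\n@highlight\n\nA summary line"
    with tempfile.NamedTemporaryFile("wb", suffix=".story", delete=False) as tmp:
        tmp.write(sample.encode("utf-8"))
    try:
        print(read_story_file(tmp.name))
    finally:
        os.remove(tmp.name)
```

Setting the encoding at the call site keeps the script independent of the shell's locale, which is useful when the same code runs on machines configured differently.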

I then reran make_datafiles.py, and it ran smoothly all the way to completion.

    (jjenv_pytorch) root@032ba38f2b6e:/data/rl_abs_other/cnn-dailymail# python make_datafiles.py /data/rl/rl_abs_other/data/dailymail/stories
    Making bin file for URLs listed in url_lists/all_test.txt...
    Writing story 0 of 11490; 0.00 percent done
    Writing story 1000 of 11490; 8.70 percent done
    Writing story 2000 of 11490; 17.41 percent done
    Writing story 3000 of 11490; 26.11 percent done
    Writing story 4000 of 11490; 34.81 percent done
    Writing story 5000 of 11490; 43.52 percent done
    Writing story 6000 of 11490; 52.22 percent done
    Writing story 7000 of 11490; 60.92 percent done
    Writing story 8000 of 11490; 69.63 percent done
    Writing story 9000 of 11490; 78.33 percent done
    Writing story 10000 of 11490; 87.03 percent done
    Writing story 11000 of 11490; 95.74 percent done
    Finished writing file finished_files/test.tar

    Making bin file for URLs listed in url_lists/all_val.txt...
    Writing story 0 of 13368; 0.00 percent done
    Writing story 1000 of 13368; 7.48 percent done
    Writing story 2000 of 13368; 14.96 percent done
    Writing story 3000 of 13368; 22.44 percent done
    Writing story 4000 of 13368; 29.92 percent done
    Writing story 5000 of 13368; 37.40 percent done
    Writing story 6000 of 13368; 44.88 percent done
    Writing story 7000 of 13368; 52.36 percent done
    Writing story 8000 of 13368; 59.84 percent done
    Writing story 9000 of 13368; 67.32 percent done
    Writing story 10000 of 13368; 74.81 percent done
    Writing story 11000 of 13368; 82.29 percent done
    Writing story 12000 of 13368; 89.77 percent done
    Writing story 13000 of 13368; 97.25 percent done
    Finished writing file finished_files/val.tar

    Making bin file for URLs listed in url_lists/all_train.txt...
    Writing story 0 of 287227; 0.00 percent done
    Writing story 1000 of 287227; 0.35 percent done
    Writing story 2000 of 287227; 0.70 percent done
    Writing story 3000 of 287227; 1.04 percent done
    Writing story 4000 of 287227; 1.39 percent done
    Writing story 5000 of 287227; 1.74 percent done
    Writing story 6000 of 287227; 2.09 percent done
    Writing story 7000 of 287227; 2.44 percent done
    Writing story 8000 of 287227; 2.79 percent done
    Writing story 9000 of 287227; 3.13 percent done
    Writing story 10000 of 287227; 3.48 percent done
    Writing story 11000 of 287227; 3.83 percent done
    Writing story 12000 of 287227; 4.18 percent done
    Writing story 13000 of 287227; 4.53 percent done
    Writing story 14000 of 287227; 4.87 percent done
    Writing story 15000 of 287227; 5.22 percent done
    Writing story 16000 of 287227; 5.57 percent done
    Writing story 17000 of 287227; 5.92 percent done
    Writing story 18000 of 287227; 6.27 percent done
    Writing story 19000 of 287227; 6.61 percent done
    Writing story 20000 of 287227; 6.96 percent done
    Writing story 21000 of 287227; 7.31 percent done
    Writing story 22000 of 287227; 7.66 percent done
    Writing story 23000 of 287227; 8.01 percent done
    Writing story 24000 of 287227; 8.36 percent done
    Writing story 25000 of 287227; 8.70 percent done
    Writing story 26000 of 287227; 9.05 percent done
    Writing story 27000 of 287227; 9.40 percent done
    Writing story 28000 of 287227; 9.75 percent done
    Writing story 29000 of 287227; 10.10 percent done
    Writing story 30000 of 287227; 10.44 percent done
    Writing story 31000 of 287227; 10.79 percent done
    Writing story 32000 of 287227; 11.14 percent done
    Writing story 33000 of 287227; 11.49 percent done
    Writing story 34000 of 287227; 11.84 percent done
    Writing story 35000 of 287227; 12.19 percent done
    Writing story 36000 of 287227; 12.53 percent done
    Writing story 37000 of 287227; 12.88 percent done
    Writing story 38000 of 287227; 13.23 percent done
    Writing story 39000 of 287227; 13.58 percent done
    Writing story 40000 of 287227; 13.93 percent done
    Writing story 41000 of 287227; 14.27 percent done
    Writing story 42000 of 287227; 14.62 percent done
    Writing story 43000 of 287227; 14.97 percent done
    Writing story 44000 of 287227; 15.32 percent done
    Writing story 45000 of 287227; 15.67 percent done
    Writing story 46000 of 287227; 16.02 percent done
    Writing story 47000 of 287227; 16.36 percent done
    Writing story 48000 of 287227; 16.71 percent done
    Writing story 49000 of 287227; 17.06 percent done
    Writing story 50000 of 287227; 17.41 percent done
    Writing story 51000 of 287227; 17.76 percent done
    Writing story 52000 of 287227; 18.10 percent done
    Writing story 53000 of 287227; 18.45 percent done
    Writing story 54000 of 287227; 18.80 percent done
    Writing story 55000 of 287227; 19.15 percent done
    Writing story 56000 of 287227; 19.50 percent done
    Writing story 57000 of 287227; 19.84 percent done
    Writing story 58000 of 287227; 20.19 percent done
    Writing story 59000 of 287227; 20.54 percent done
    Writing story 60000 of 287227; 20.89 percent done
    Writing story 61000 of 287227; 21.24 percent done
    Writing story 62000 of 287227; 21.59 percent done
    Writing story 63000 of 287227; 21.93 percent done
    Writing story 64000 of 287227; 22.28 percent done
    Writing story 65000 of 287227; 22.63 percent done
    Writing story 66000 of 287227; 22.98 percent done
    Writing story 67000 of 287227; 23.33 percent done
    Writing story 68000 of 287227; 23.67 percent done
    Writing story 69000 of 287227; 24.02 percent done
    Writing story 70000 of 287227; 24.37 percent done
    Writing story 71000 of 287227; 24.72 percent done
    Writing story 72000 of 287227; 25.07 percent done
    Writing story 73000 of 287227; 25.42 percent done
    Writing story 74000 of 287227; 25.76 percent done
    Writing story 75000 of 287227; 26.11 percent done
    Writing story 76000 of 287227; 26.46 percent done
    Writing story 77000 of 287227; 26.81 percent done
    Writing story 78000 of 287227; 27.16 percent done
    Writing story 79000 of 287227; 27.50 percent done
    Writing story 80000 of 287227; 27.85 percent done
    Writing story 81000 of 287227; 28.20 percent done
    Writing story 82000 of 287227; 28.55 percent done
    Writing story 83000 of 287227; 28.90 percent done
    Writing story 84000 of 287227; 29.25 percent done
    Writing story 85000 of 287227; 29.59 percent done
    Writing story 86000 of 287227; 29.94 percent done
    Writing story 87000 of 287227; 30.29 percent done
    Writing story 88000 of 287227; 30.64 percent done
    Writing story 89000 of 287227; 30.99 percent done
    Writing story 90000 of 287227; 31.33 percent done
    Writing story 91000 of 287227; 31.68 percent done
    Writing story 92000 of 287227; 32.03 percent done
    Writing story 93000 of 287227; 32.38 percent done
    Writing story 94000 of 287227; 32.73 percent done
    Writing story 95000 of 287227; 33.07 percent done
    Writing story 96000 of 287227; 33.42 percent done
    Writing story 97000 of 287227; 33.77 percent done
    Writing story 98000 of 287227; 34.12 percent done
    Writing story 99000 of 287227; 34.47 percent done
    Writing story 100000 of 287227; 34.82 percent done
    Writing story 101000 of 287227; 35.16 percent done
    Writing story 102000 of 287227; 35.51 percent done
    Writing story 103000 of 287227; 35.86 percent done
    Writing story 104000 of 287227; 36.21 percent done
    Writing story 105000 of 287227; 36.56 percent done
    Writing story 106000 of 287227; 36.90 percent done
    Writing story 107000 of 287227; 37.25 percent done
    Writing story 108000 of 287227; 37.60 percent done
    Writing story 109000 of 287227; 37.95 percent done
    Writing story 110000 of 287227; 38.30 percent done
    Writing story 111000 of 287227; 38.65 percent done
    Writing story 112000 of 287227; 38.99 percent done
    Writing story 113000 of 287227; 39.34 percent done
    Writing story 114000 of 287227; 39.69 percent done
    Writing story 115000 of 287227; 40.04 percent done
    Writing story 116000 of 287227; 40.39 percent done
    Writing story 117000 of 287227; 40.73 percent done
    Writing story 118000 of 287227; 41.08 percent done
    Writing story 119000 of 287227; 41.43 percent done
    Writing story 120000 of 287227; 41.78 percent done
    Writing story 121000 of 287227; 42.13 percent done
    Writing story 122000 of 287227; 42.48 percent done
    Writing story 123000 of 287227; 42.82 percent done
    Writing story 124000 of 287227; 43.17 percent done
    Writing story 125000 of 287227; 43.52 percent done
    Writing story 126000 of 287227; 43.87 percent done
    Writing story 127000 of 287227; 44.22 percent done
    Writing story 128000 of 287227; 44.56 percent done
    Writing story 129000 of 287227; 44.91 percent done
    Writing story 130000 of 287227; 45.26 percent done
    Writing story 131000 of 287227; 45.61 percent done
    Writing story 132000 of 287227; 45.96 percent done
    Writing story 133000 of 287227; 46.30 percent done
    Writing story 134000 of 287227; 46.65 percent done
    Writing story 135000 of 287227; 47.00 percent done
    Writing story 136000 of 287227; 47.35 percent done
    Writing story 137000 of 287227; 47.70 percent done
    Writing story 138000 of 287227; 48.05 percent done
    Writing story 139000 of 287227; 48.39 percent done
    Writing story 140000 of 287227; 48.74 percent done
    Writing story 141000 of 287227; 49.09 percent done
    Writing story 142000 of 287227; 49.44 percent done
    Writing story 143000 of 287227; 49.79 percent done
    Writing story 144000 of 287227; 50.13 percent done
    Writing story 145000 of 287227; 50.48 percent done
    Writing story 146000 of 287227; 50.83 percent done
    Writing story 147000 of 287227; 51.18 percent done
    Writing story 148000 of 287227; 51.53 percent done
    Writing story 149000 of 287227; 51.88 percent done
    Writing story 150000 of 287227; 52.22 percent done
    Writing story 151000 of 287227; 52.57 percent done
    Writing story 152000 of 287227; 52.92 percent done
    Writing story 153000 of 287227; 53.27 percent done
    Writing story 154000 of 287227; 53.62 percent done
    Writing story 155000 of 287227; 53.96 percent done
    Writing story 156000 of 287227; 54.31 percent done
    Writing story 157000 of 287227; 54.66 percent done
    Writing story 158000 of 287227; 55.01 percent done
    Writing story 159000 of 287227; 55.36 percent done
    Writing story 160000 of 287227; 55.71 percent done
    Writing story 161000 of 287227; 56.05 percent done
    Writing story 162000 of 287227; 56.40 percent done
    Writing story 163000 of 287227; 56.75 percent done
    Writing story 164000 of 287227; 57.10 percent done
    Writing story 165000 of 287227; 57.45 percent done
    Writing story 166000 of 287227; 57.79 percent done
    Writing story 167000 of 287227; 58.14 percent done
    Writing story 168000 of 287227; 58.49 percent done
    Writing story 169000 of 287227; 58.84 percent done
    Writing story 170000 of 287227; 59.19 percent done
    Writing story 171000 of 287227; 59.53 percent done
    Writing story 172000 of 287227; 59.88 percent done
    Writing story 173000 of 287227; 60.23 percent done
    Writing story 174000 of 287227; 60.58 percent done
    Writing story 175000 of 287227; 60.93 percent done
    Writing story 176000 of 287227; 61.28 percent done
    Writing story 177000 of 287227; 61.62 percent done
    Writing story 178000 of 287227; 61.97 percent done
    Writing story 179000 of 287227; 62.32 percent done
    Writing story 180000 of 287227; 62.67 percent done
    Writing story 181000 of 287227; 63.02 percent done
    Writing story 182000 of 287227; 63.36 percent done
    Writing story 183000 of 287227; 63.71 percent done
    Writing story 184000 of 287227; 64.06 percent done
    Writing story 185000 of 287227; 64.41 percent done
    Writing story 186000 of 287227; 64.76 percent done
    Writing story 187000 of 287227; 65.11 percent done
    Writing story 188000 of 287227; 65.45 percent done
    Writing story 189000 of 287227; 65.80 percent done
    Writing story 190000 of 287227; 66.15 percent done
    Writing story 191000 of 287227; 66.50 percent done
    Writing story 192000 of 287227; 66.85 percent done
    Writing story 193000 of 287227; 67.19 percent done
    Writing story 194000 of 287227; 67.54 percent done
    Writing story 195000 of 287227; 67.89 percent done
    Writing story 196000 of 287227; 68.24 percent done
    Writing story 197000 of 287227; 68.59 percent done
    Writing story 198000 of 287227; 68.94 percent done
    Writing story 199000 of 287227; 69.28 percent done
    Writing story 200000 of 287227; 69.63 percent done
    Writing story 201000 of 287227; 69.98 percent done
    Writing story 202000 of 287227; 70.33 percent done
    Writing story 203000 of 287227; 70.68 percent done
    Writing story 204000 of 287227; 71.02 percent done
    Writing story 205000 of 287227; 71.37 percent done
    Writing story 206000 of 287227; 71.72 percent done
    Writing story 207000 of 287227; 72.07 percent done
    Writing story 208000 of 287227; 72.42 percent done
    Writing story 209000 of 287227; 72.76 percent done
    Writing story 210000 of 287227; 73.11 percent done
    Writing story 211000 of 287227; 73.46 percent done
    Writing story 212000 of 287227; 73.81 percent done
    Writing story 213000 of 287227; 74.16 percent done
    Writing story 214000 of 287227; 74.51 percent done
    Writing story 215000 of 287227; 74.85 percent done
    Writing story 216000 of 287227; 75.20 percent done
    Writing story 217000 of 287227; 75.55 percent done
    Writing story 218000 of 287227; 75.90 percent done
    Writing story 219000 of 287227; 76.25 percent done
    Writing story 220000 of 287227; 76.59 percent done
    Writing story 221000 of 287227; 76.94 percent done
    Writing story 222000 of 287227; 77.29 percent done
    Writing story 223000 of 287227; 77.64 percent done
    Writing story 224000 of 287227; 77.99 percent done
    Writing story 225000 of 287227; 78.34 percent done
    Writing story 226000 of 287227; 78.68 percent done
    Writing story 227000 of 287227; 79.03 percent done
    Writing story 228000 of 287227; 79.38 percent done
    Writing story 229000 of 287227; 79.73 percent done
    Writing story 230000 of 287227; 80.08 percent done
    Writing story 231000 of 287227; 80.42 percent done
    Writing story 232000 of 287227; 80.77 percent done
    Writing story 233000 of 287227; 81.12 percent done
    Writing story 234000 of 287227; 81.47 percent done
    Writing story 235000 of 287227; 81.82 percent done
    Writing story 236000 of 287227; 82.16 percent done
    Writing story 237000 of 287227; 82.51 percent done
    Writing story 238000 of 287227; 82.86 percent done
    Writing story 239000 of 287227; 83.21 percent done
    Writing story 240000 of 287227; 83.56 percent done
    Writing story 241000 of 287227; 83.91 percent done
    Writing story 242000 of 287227; 84.25 percent done
    Writing story 243000 of 287227; 84.60 percent done
    Writing story 244000 of 287227; 84.95 percent done
    Writing story 245000 of 287227; 85.30 percent done
    Writing story 246000 of 287227; 85.65 percent done
    Writing story 247000 of 287227; 85.99 percent done
    Writing story 248000 of 287227; 86.34 percent done
    Writing story 249000 of 287227; 86.69 percent done
    Writing story 250000 of 287227; 87.04 percent done
    Writing story 251000 of 287227; 87.39 percent done
    Writing story 252000 of 287227; 87.74 percent done
    Writing story 253000 of 287227; 88.08 percent done
    Writing story 254000 of 287227; 88.43 percent done
    Writing story 255000 of 287227; 88.78 percent done
    Writing story 256000 of 287227; 89.13 percent done
    Writing story 257000 of 287227; 89.48 percent done
    Writing story 258000 of 287227; 89.82 percent done
    Writing story 259000 of 287227; 90.17 percent done
    Writing story 260000 of 287227; 90.52 percent done
    Writing story 261000 of 287227; 90.87 percent done
    Writing story 262000 of 287227; 91.22 percent done
    Writing story 263000 of 287227; 91.57 percent done
    Writing story 264000 of 287227; 91.91 percent done
    Writing story 265000 of 287227; 92.26 percent done
    Writing story 266000 of 287227; 92.61 percent done
    Writing story 267000 of 287227; 92.96 percent done
    Writing story 268000 of 287227; 93.31 percent done
    Writing story 269000 of 287227; 93.65 percent done
    Writing story 270000 of 287227; 94.00 percent done
    Writing story 271000 of 287227; 94.35 percent done
    Writing story 272000 of 287227; 94.70 percent done
    Writing story 273000 of 287227; 95.05 percent done
    Writing story 274000 of 287227; 95.39 percent done
    Writing story 275000 of 287227; 95.74 percent done
    Writing story 276000 of 287227; 96.09 percent done
    Writing story 277000 of 287227; 96.44 percent done
    Writing story 278000 of 287227; 96.79 percent done
    Writing story 279000 of 287227; 97.14 percent done
    Writing story 280000 of 287227; 97.48 percent done
    Writing story 281000 of 287227; 97.83 percent done
    Writing story 282000 of 287227; 98.18 percent done
    Writing story 283000 of 287227; 98.53 percent done
    Writing story 284000 of 287227; 98.88 percent done
    Writing story 285000 of 287227; 99.22 percent done
    Writing story 286000 of 287227; 99.57 percent done
    Writing story 287000 of 287227; 99.92 percent done
    Finished writing file finished_files/train.tar

    Writing vocab file...
    Finished writing vocab file
