Getting Started with TensorFlow: Building with Bazel (GPU Support)

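This series basically follows my own progress: I write up each step as I get to it.

I recently installed a GPU (GTX 750 Ti) along with CUDA 9.0 and cuDNN 7.1, so I wanted TensorFlow to run on the GPU. This also makes good on a promise from an earlier post.

That is the motivation. The prebuilt TensorFlow packages do not support the newer CUDA versions well, so the only option is to build from source.

The process breaks down roughly into the following steps: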

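1. Install the dependencies (I already did this, so I won't cover it again here; see the dependency section of the earlier post, which is basically the same)

sudo apt-get install openjdk-8-jdk

The JDK is required by Bazel.

2. Install Git (skip this step if you already have it)

3. Install Bazel, the build tool for TensorFlow

4. Configure and build the TensorFlow source

5. Install the package and configure environment variables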

1. Install the dependencies

2. Install Git

Run:

sudo apt-get install git
git clone --recursive https://github.com/tensorflow/tensorflow

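3. Install Bazel, the build tool for TensorFlow

This step is a bit more involved, because bazel is not available through apt-get.

So you need to download it from GitHub first and then install it. The download page is https://github.com/bazelbuild/bazel/releases

Choose the right version to download: it has to match what your TensorFlow version requires. The exact Bazel requirements are listed in configure.py. In my version, for example, it contains this snippet: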

_TF_BAZELRC_FILENAME = '.tf_configure.bazelrc'
_TF_WORKSPACE_ROOT = ''
_TF_BAZELRC = ''
_TF_CURRENT_BAZEL_VERSION = None
_TF_MIN_BAZEL_VERSION = '0.27.1'
_TF_MAX_BAZEL_VERSION = '1.1.0'

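The meaning of each field is clear from its name. _TF_BAZELRC_FILENAME is the configuration file Bazel uses during the build (I haven't studied it in detail; https://www.cnblogs.com/shouhuxianjian/p/9416934.html explains it), _TF_MIN_BAZEL_VERSION = '0.27.1' is the minimum required Bazel version, and _TF_MAX_BAZEL_VERSION = '1.1.0' is the highest supported one.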
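To check a Bazel version against these bounds without eyeballing them, the Python sketch below pulls the two constants out of configure.py with a regular expression and compares dotted versions as integer tuples. It is a simplified stand-in, not TensorFlow's own check. It assumes the constants are written exactly as in the snippet above, and the function names are made up for illustration. In practice you would pass it the contents of your checkout's configure.py, e.g. open("configure.py").read().

```python
import re


def read_bazel_bounds(configure_py_text):
    """Return (min, max) Bazel versions declared in configure.py text.

    Assumes module-level lines such as _TF_MIN_BAZEL_VERSION = '0.27.1';
    a bound that is not found comes back as None.
    """
    def find(name):
        pattern = r"^%s\s*=\s*'([^']+)'" % re.escape(name)
        match = re.search(pattern, configure_py_text, re.MULTILINE)
        return match.group(1) if match else None

    return find("_TF_MIN_BAZEL_VERSION"), find("_TF_MAX_BAZEL_VERSION")


def version_tuple(version, width=3):
    """'0.27.1' -> (0, 27, 1); '1.1' -> (1, 1, 0); suffixes like '-rc1' are ignored."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break
        parts.append(int(digits.group()))
    parts += [0] * (width - len(parts))  # pad so '1.1' compares equal to '1.1.0'
    return tuple(parts)


def bazel_version_ok(installed, min_version, max_version):
    """True if installed lies within [min_version, max_version]; None means no bound."""
    current = version_tuple(installed)
    if min_version is not None and current < version_tuple(min_version):
        return False
    if max_version is not None and current > version_tuple(max_version):
        return False
    return True


sample = "_TF_MIN_BAZEL_VERSION = '0.27.1'\n_TF_MAX_BAZEL_VERSION = '1.1.0'\n"
low, high = read_bazel_bounds(sample)
print(low, high)                              # 0.27.1 1.1.0
print(bazel_version_ok("0.29.1", low, high))  # True
print(bazel_version_ok("0.14.1", low, high))  # False
```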

Then run the .sh installer with sudo:

sudo chmod +x ./bazel*.sh

sudo ./bazel-0.*.sh
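After installing, you can confirm which version actually ended up on your PATH. A minimal sketch, assuming `bazel version` prints a line of the form `Build label: 0.27.1`; the helper names are hypothetical. The parsing is split from the subprocess call so it can be tried on sample text.

```python
import re
import subprocess


def parse_build_label(version_output):
    """Return the version on the 'Build label:' line of `bazel version` output, or None."""
    match = re.search(r"^Build label:\s*(\S+)", version_output, re.MULTILINE)
    return match.group(1) if match else None


def installed_bazel_version():
    """Run `bazel version` (assumes bazel is on PATH) and return its build label."""
    result = subprocess.run(["bazel", "version"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_build_label(result.stdout)


print(parse_build_label("Build label: 0.27.1\n"))  # 0.27.1
```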

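4. Configure and build the TensorFlow source

First comes configuration, where you can pick and trim features to suit your needs. This step is quite tedious, with many options to choose from. My choices were as follows: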

 jourluohua@jour:~/tools/tensorflow$ ./configure
 WARNING: Running Bazel server needs to be killed, because the startup options are different.
 You have bazel 0.14. installed.
 Please specify the location of python. [Default is /usr/bin/python]: 
 Found possible Python library paths:
   /usr/local/lib/python2.7/dist-packages
   /usr/lib/python2.7/dist-packages
 Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/dist-packages]
 Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: Y
 jemalloc as malloc support will be enabled for TensorFlow.
 Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
 No Google Cloud Platform support will be enabled for TensorFlow.
 Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
 No Hadoop File System support will be enabled for TensorFlow.
 Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
 No Amazon S3 File System support will be enabled for TensorFlow.
 Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
 No Apache Kafka Platform support will be enabled for TensorFlow.
 Do you wish to build TensorFlow with XLA JIT support? [y/N]: y
 XLA JIT support will be enabled for TensorFlow.
 Do you wish to build TensorFlow with GDR support? [y/N]: y
 GDR support will be enabled for TensorFlow.
 Do you wish to build TensorFlow with VERBS support? [y/N]: y
 VERBS support will be enabled for TensorFlow.
 Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
 No OpenCL SYCL support will be enabled for TensorFlow.
 Do you wish to build TensorFlow with CUDA support? [y/N]: y
 CUDA support will be enabled for TensorFlow.
 Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 
 Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
 Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 
 Please specify the location where cuDNN  library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
 Do you wish to build TensorFlow with TensorRT support? [y/N]: N
 No TensorRT support will be enabled for TensorFlow.
 Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]: 
 Please specify a list of comma-separated Cuda compute capabilities you want to build with.
 You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
 Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 5.0]
 Do you want to use clang as CUDA compiler? [y/N]: N
 nvcc will be used as CUDA compiler.
 Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
 Do you wish to build TensorFlow with MPI support? [y/N]: N
 No MPI support will be enabled for TensorFlow.
 Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 
 Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
 Not configuring the WORKSPACE for Android builds.
 Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
     --config=mkl             # Build with MKL support.
     --config=monolithic      # Config for mostly static monolithic build.
 Configuration finished
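If you have to repeat this configuration (for example after a failed build), the prompts can in principle be answered through environment variables instead of typing. The sketch below records the answers from the session above as a dictionary and runs ./configure with them. The variable names are an assumption based on the TF_NEED_* / TF_*_VERSION names read by configure.py in TensorFlow 1.x; they change between releases, so check each one against your own configure.py (search for environ_cp) first. Any question the dictionary does not cover is still asked interactively.

```python
import os
import subprocess

# Answers from the ./configure session above, keyed by environment variables that
# TensorFlow 1.x's configure.py is ASSUMED to read. Names vary between releases;
# verify each one in your configure.py before use.
CONFIGURE_ANSWERS = {
    "PYTHON_BIN_PATH": "/usr/bin/python",
    "PYTHON_LIB_PATH": "/usr/local/lib/python2.7/dist-packages",
    "TF_NEED_JEMALLOC": "1",
    "TF_NEED_GCP": "0",
    "TF_NEED_HDFS": "0",
    "TF_NEED_S3": "0",
    "TF_NEED_KAFKA": "0",
    "TF_ENABLE_XLA": "1",
    "TF_NEED_GDR": "1",
    "TF_NEED_VERBS": "1",
    "TF_NEED_OPENCL_SYCL": "0",
    "TF_NEED_CUDA": "1",
    "TF_CUDA_VERSION": "9.0",
    "CUDA_TOOLKIT_PATH": "/usr/local/cuda",
    "TF_CUDNN_VERSION": "7",            # the default accepted above (shown as cuDNN 7.0)
    "CUDNN_INSTALL_PATH": "/usr/local/cuda",
    "TF_NEED_TENSORRT": "0",
    "TF_NCCL_VERSION": "1.3",
    "TF_CUDA_COMPUTE_CAPABILITIES": "5.0",  # GTX 750 Ti
    "TF_CUDA_CLANG": "0",
    "GCC_HOST_COMPILER_PATH": "/usr/bin/gcc",
    "TF_NEED_MPI": "0",
    "CC_OPT_FLAGS": "-march=native",
    "TF_SET_ANDROID_WORKSPACE": "0",
}


def configure_env(answers=CONFIGURE_ANSWERS, base=None):
    """Copy of the current (or given) environment with the configure answers added."""
    env = dict(os.environ if base is None else base)
    env.update(answers)
    return env


def run_configure(tf_src_dir, answers=CONFIGURE_ANSWERS):
    """Run ./configure in tf_src_dir; prompts not covered by answers still appear."""
    subprocess.run(["./configure"], cwd=tf_src_dir, env=configure_env(answers), check=True)
```

Usage would be something like run_configure(os.path.expanduser("~/tools/tensorflow")). Keeping the answers in one dictionary also doubles as a record of how a particular build was configured.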

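Then build with bazel (this step fails easily and takes a very long time). In Bazel, -c opt builds a release (optimized) version and -c dbg builds a debug version; the --config=opt in the command below additionally applies the optimization flags chosen during configure (-march=native above).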

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

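You will run into many problems along the way; here are some errors that are hard to search for.

1) For example, you may get a CXX error whose cause is hard to track down (the output only shows which line of which configuration file failed, not the actual error). When you need the detailed error message, add the --verbose_failures option.

2) If you hit a CXX error, it may be a gcc version problem. (Anyone who works on compilers knows that mature C++ code is usually stable, compatible and easy to port, so compiler and environment problems are rare.) In that case, try adding --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0".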
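The fixes above, and the -j option from item 3 below, all end up as extra flags on the same bazel build command, so it can help to assemble that command in one place. A small Python sketch; the function and its parameter names are hypothetical. It only builds and prints the argument list, leaving it to you to run it (e.g. with subprocess.run) from the TensorFlow source directory. --jobs=2 is the long form of -j 2.

```python
import shlex


def bazel_build_command(target="//tensorflow/tools/pip_package:build_pip_package",
                        verbose_failures=True, old_cxx11_abi=False, jobs=None):
    """Return the bazel build argument list used in this post, plus optional fixes."""
    cmd = ["bazel", "build", "--config=opt", "--config=cuda"]
    if verbose_failures:
        cmd.append("--verbose_failures")  # show the full failing command, item 1)
    if old_cxx11_abi:
        # same as --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" once the shell strips the quotes, item 2)
        cmd.append("--cxxopt=-D_GLIBCXX_USE_CXX11_ABI=0")
    if jobs is not None:
        cmd.append("--jobs=%d" % jobs)  # fewer parallel jobs means less memory, item 3)
    cmd.append(target)
    return cmd


print(" ".join(shlex.quote(arg) for arg in bazel_build_command(old_cxx11_abi=True, jobs=2)))
```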

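3) If you hit virtual memory exhausted: Cannot allocate memory, the cause is that no swap space is set up or the swap space is too small; free -m will show this. The fix is to enlarge swap, for example with a swap file. The commands are roughly: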

mkdir /home/jourluohua/swap
rm -rf /home/jourluohua/swap
dd if=/dev/zero of=/home/jourluohua/swap bs=1024 count=4096000
mkswap /home/jourluohua/swap
sudo swapon /home/jourluohua/swap

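This creates 4,096,000 blocks of 1024 bytes each, about 4 GB in total. If the problem persists: bazel compiles with multiple parallel jobs by default, so you can add the -j 2 option manually to limit it to 2 jobs.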
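To double-check the arithmetic and see whether the new swap space is active, here is a small Python sketch. It computes the size of the file dd writes (4,096,000 × 1024 bytes = 4,194,304,000 bytes, exactly 4000 MiB, a little under 4 GiB) and reads SwapTotal from /proc/meminfo, which only exists on Linux. The helper names are hypothetical.

```python
import os


def swap_file_size(block_size=1024, count=4096000):
    """Size in bytes of the file created by `dd bs=<block_size> count=<count>`."""
    return block_size * count


def swap_total_kib(meminfo_text):
    """Return SwapTotal (in kB) from the text of /proc/meminfo, or 0 if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return 0


if __name__ == "__main__":
    print(swap_file_size() // (1024 * 1024), "MiB")  # 4000 MiB
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as fh:
            print("SwapTotal:", swap_total_kib(fh.read()), "kB")
```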

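4) If you hit AttributeError: 'module' object has no attribute 'IntEnum', the cause is fairly obscure: python -c "import enum" runs without error, yet the module really has no IntEnum attribute. After some searching, I found that installing the enum34 package fixes it. One drawback of Python is how messy its packages are: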

pip install enum34 --user
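To see which enum module is actually picked up, before and after installing enum34, you can print where it was loaded from and whether it has IntEnum. A minimal sketch; my assumption is that under Python 2 a different third-party package also named enum (not enum34) can shadow it, in which case the printed path points into that package. The helper name is hypothetical, and the print calls take a single string so they behave the same under Python 2 and 3.

```python
import enum


def enum_module_info(module=enum):
    """Return (file the module was loaded from, whether it defines IntEnum)."""
    return getattr(module, "__file__", None), hasattr(module, "IntEnum")


if __name__ == "__main__":
    path, has_int_enum = enum_module_info()
    print("enum loaded from: %s" % (path,))
    print("has IntEnum: %s" % (has_int_enum,))
```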

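5) If you hit AttributeError: attribute '__doc__' of 'type' objects is not writable: this one is actually quite tricky. My own field is computer architecture and I mostly use C++, so I'm not very familiar with Python. Maybe something was wrong with my build environment? Looking it up, __doc__ is where Python stores a docstring.

I first wrote a small program that reproduces the problem: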

#!/usr/bin/python
from functools import wraps
#from https://stackoverflow.com/questions/39010366/functools-wrapper-attributeerror-attribute-doc-of-type-objects-is-not

def memoize(f):
    """ Memoization decorator for functions taking one or more arguments.
        Saves repeated api calls for a given value, by caching it.
    """
    @wraps(f)
    class memodict(dict):
        """memodict"""
        def __init__(self, f):
            self.f = f
        def __call__(self, *args):
            return self[args]
        def __missing__(self, key):
            ret = self[key] = self.f(*key)
            return ret
    return memodict(f)

@memoize
def a():
    """blah"""
    pass

It produced the same error:

Traceback (most recent call last):
  File "ipy.py", line , in <module>
    @memoize
  File "ipy.py", line , in memoize
    class memodict(dict):
  File "/usr/lib/python2.7/functools.py", line , in update_wrapper
    setattr(wrapper, attr, getattr(wrapped, attr))
AttributeError: attribute '__doc__' of 'type' objects is not writable
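The traceback shows what goes wrong: @wraps(f) calls functools.update_wrapper, which copies f's __doc__ onto the decorated object, and here that object is the memodict class itself. In Python 2.7 the __doc__ of a class is read-only (Python 3.3 made it writable for classes defined in Python code). One way around it, sketched below rather than taken from this post, is to copy the metadata onto the memodict instance instead of the class, since instance attributes are writable.

```python
from functools import update_wrapper


def memoize(f):
    """Memoization decorator for functions taking one or more positional arguments."""

    class memodict(dict):
        """Cache keyed by the tuple of arguments."""

        def __init__(self, f):
            super(memodict, self).__init__()
            self.f = f

        def __call__(self, *args):
            return self[args]

        def __missing__(self, key):
            # Only called on a cache miss: compute, store, return.
            ret = self[key] = self.f(*key)
            return ret

    # Copy __doc__, __name__, etc. onto the *instance* (writable attributes)
    # instead of onto the class, which is what @wraps(f) on the class tried to do.
    return update_wrapper(memodict(f), f)


@memoize
def a():
    """blah"""
    pass


print(a.__doc__)  # blah
```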

I opened the Python code where the error occurred. The original looked like this:

@tf_export(v1=["VariableAggregation"])
class VariableAggregation(enum.Enum):
  NONE = 0
  SUM = 1
  MEAN = 2
  ONLY_FIRST_REPLICA = 3
  ONLY_FIRST_TOWER = 3  # DEPRECATED

  def __hash__(self):
    return hash(self.value)

# LINT.ThenChange(//tensorflow/core/framework/variable.proto)
#
# Note that we are currently relying on the integer values of the Python enums
# matching the integer values of the proto enums.
VariableAggregation.__doc__ = (
    VariableAggregationV2.__doc__ +
    "* `ONLY_FIRST_TOWER`: Deprecated alias for `ONLY_FIRST_REPLICA`.\n  ")

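In short, it sets VariableAggregation's docstring to VariableAggregationV2's docstring plus the extra line "* `ONLY_FIRST_TOWER`: Deprecated alias for `ONLY_FIRST_REPLICA`.\n ". Since this is not allowed outside the class definition, I wondered whether setting it directly inside the class body would work.

The modified code is as follows: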

@tf_export(v1=["VariableAggregation"])
class VariableAggregation(enum.Enum):
  NONE = 0
  SUM = 1
  MEAN = 2
  ONLY_FIRST_REPLICA = 3
  ONLY_FIRST_TOWER = 3  # DEPRECATED
  __doc__ = (VariableAggregationV2.__doc__ + "* `ONLY_FIRST_TOWER`: Deprecated alias for `ONLY_FIRST_REPLICA`.\n  ")

  def __hash__(self):
    return hash(self.value)

# LINT.ThenChange(//tensorflow/core/framework/variable.proto)
#
# Note that we are currently relying on the integer values of the Python enums
# matching the integer values of the proto enums.
#VariableAggregation.__doc__ = (
#    VariableAggregationV2.__doc__ +
#    "* `ONLY_FIRST_TOWER`: Deprecated alias for `ONLY_FIRST_REPLICA`.\n  ")
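The same idea can be tried in isolation. In the sketch below, the class and member names are made up and stand in for VariableAggregationV2 and VariableAggregation. __doc__ is assigned inside the Enum body; Enum treats dunder names as ordinary class attributes rather than members, so the assignment only sets the docstring and leaves the member list unchanged. It is written for Python 3's built-in enum module; the fix in this post relies on the same behavior in enum34.

```python
import enum


class AggregationV2(enum.Enum):
    """How values are combined."""
    NONE = 0
    SUM = 1


class Aggregation(enum.Enum):
    NONE = 0
    SUM = 1
    LEGACY_SUM = 1  # alias of SUM, like ONLY_FIRST_TOWER for ONLY_FIRST_REPLICA
    # Dunder names in an Enum body are plain class attributes, not members,
    # so this only sets the docstring (assigning it after the class statement
    # is what failed under Python 2 in this post).
    __doc__ = AggregationV2.__doc__ + "\n* `LEGACY_SUM`: Deprecated alias for `SUM`.\n"

    def __hash__(self):
        return hash(self.value)


print([member.name for member in Aggregation])  # ['NONE', 'SUM']
print(Aggregation.LEGACY_SUM is Aggregation.SUM)  # True
```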

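6) If you hit LargeZipFile: Zipfile size would require ZIP64 extensions, the cause is fairly obvious: the file is too large, and compressing it requires the ZIP64 option, which is not enabled by default here. Edit /usr/lib/python2.7/dist-packages/wheel/archive.py:

change zip = zipfile.ZipFile(open(zip_filename, "wb+"), "w",compression=zipfile.ZIP_DEFLATED) to zip = zipfile.ZipFile(open(zip_filename, "wb+"), "w",compression=zipfile.ZIP_DEFLATED, allowZip64=True) and it works.

To be honest, though, the debug build is still too large, beyond what zip can compress; it mainly fails at the CRC32 check. Since I didn't urgently need it, I didn't change anything there: Python 2.7 no longer receives updates, so it isn't worth the effort, and Python 3.5 and later have no problem here.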
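For reference, here is the same flag in a self-contained form. In Python 2.7 allowZip64 defaults to False, while since Python 3.4 it defaults to True, which fits the observation that newer Pythons don't hit this error. The sketch below (function name hypothetical) writes an archive with ZIP_DEFLATED and allowZip64=True passed explicitly, as in the patched archive.py line.

```python
import os
import tempfile
import zipfile


def write_zip(zip_path, members):
    """Write {archive name: bytes} into a deflate-compressed zip, allowing ZIP64."""
    # allowZip64=True lets zipfile write archives that exceed the size and
    # entry-count limits of the classic zip format instead of raising LargeZipFile.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         allowZip64=True) as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return zip_path


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "demo.zip")
    write_zip(out, {"hello.txt": b"hello " * 1000})
    with zipfile.ZipFile(out) as archive:
        print(archive.namelist())  # ['hello.txt']
```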

There are also some missing-library problems, but those are usually easy to search for, so I won't list them all here.

5. Install the package and configure environment variables

Install it with pip:

$ pip install /tmp/tensorflow_pkg/tensorflow --user

# Don't type a space after "tensorflow"; press Tab before Enter so the shell fills in the rest of the wheel file name
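If tab completion isn't available (for example in a script), the wheel can be located with a glob instead. A minimal sketch, assuming the wheel file name starts with tensorflow and sits in /tmp/tensorflow_pkg as produced by the build_pip_package command above. The helper names are hypothetical, and the command uses python -m pip for whichever interpreter runs the script.

```python
import glob
import os
import sys


def find_wheel(pkg_dir="/tmp/tensorflow_pkg"):
    """Return the most recently written tensorflow*.whl in pkg_dir."""
    wheels = glob.glob(os.path.join(pkg_dir, "tensorflow*.whl"))
    if not wheels:
        raise FileNotFoundError("no tensorflow*.whl found in %s" % pkg_dir)
    return max(wheels, key=os.path.getmtime)


def pip_install_command(wheel):
    """Same as `pip install <wheel> --user`, but for the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", "--user", wheel]


if __name__ == "__main__":
    try:
        print(" ".join(pip_install_command(find_wheel())))
    except FileNotFoundError as err:
        print(err)
```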

Finally, test it:

import tensorflow as tf
sess = tf.InteractiveSession()
sess.close()
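Creating a session only shows that the package imports. To check that this build actually uses the GPU, you can query the TensorFlow 1.x helpers tf.test.is_built_with_cuda() and tf.test.is_gpu_available(). The sketch below wraps them in a hypothetical gpu_report function that takes the tf module as a parameter; note that is_gpu_available() is deprecated in TensorFlow 2.x, so this targets the 1.x build described here.

```python
def gpu_report(tf):
    """Summarize whether this TensorFlow build has CUDA and can see a GPU (TF 1.x API)."""
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpu_available": tf.test.is_gpu_available(),
    }


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not importable from this interpreter")
    else:
        print(gpu_report(tf))
```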

If every step runs without errors, TensorFlow has been built and installed successfully.
