1. After installing the GPU in the host, check that the device is detected:

$ nvidia-smi
Tue Dec ::
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| GeForce GTX Off | ::00.0 On | N/A |
| % 34C P8 8W / 200W | 284MiB / 8112MiB | % Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| G /usr/lib/xorg/Xorg 117MiB |
| G compiz 155MiB |
| G fcitx-qimpanel 9MiB |
+-----------------------------------------------------------------------------+

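The system has detected the GeForce GTX 1080.

Note that this machine previously had a GTX 1060 installed, and the output above shows that its driver, NVIDIA 375.66, is still present. For the GTX 1080, the driver to install is NVIDIA 367.27: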

$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt-get update

When prompted with Y/n along the way, just press Enter to continue.

Then install the nvidia-367 driver:

$ sudo apt-get install nvidia-367

This step fails because of a conflict with the previously installed nvidia-375 driver:

Building initial module for 4.10.0-32-generic
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-384.0.crash'
Error! Bad return status for module build on kernel: 4.10.0-32-generic (x86_64)
Consult /var/lib/dkms/nvidia-384/384.98/build/make.log for more information.
dpkg: error processing package nvidia-384 (--configure):
subprocess installed post-installation script returned error exit status 10
dpkg: dependency problems prevent configuration of libcuda1-384:
libcuda1-384 depends on nvidia-384 (>= 384.98); however:
Package nvidia-384 is not configured yet.
dpkg: error processing package libcuda1-384 (--configure):
dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of nvidia-367:
nvidia-367 depends on nvidia-384; however:
Package nvidia-384 is not configured yet.
dpkg: error processing package nvidia-367 (--configure):
dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of nvidia-opencl-icd-384:
nvidia-opencl-icd-384 depends on nvidia-384 (>= 384.98); however:
Package nvidia-384 is not configured yet.
dpkg: error processing package nvidia-opencl-icd-384 (--configure):
dependency problems - leaving unconfigured
Setting up nvidia-prime (0.8.2) ...
No apport report written because the error message indicates its a followup error from a previous failure.
No apport report written because the error message indicates its a followup error from a previous failure.
No apport report written because MaxReports is reached already
Processing triggers for libc-bin (2.23-0ubuntu9) ...
Processing triggers for initramfs-tools (0.122ubuntu8.8) ...
update-initramfs: Generating /boot/initrd.img-4.10.0-32-generic
Errors were encountered while processing:
nvidia-384
nvidia-375
libcuda1-384
libcuda1-375
nvidia-367
nvidia-opencl-icd-384
nvidia-opencl-icd-375
E: Sub-process /usr/bin/dpkg returned an error code (1)

To deal with this, first remove the old driver:

$ sudo apt-get remove --purge nvidia-375
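To confirm what is actually installed before and after the purge, the dpkg package list can be filtered for driver packages. This is only a minimal sketch: the helper name list_nvidia_drivers is made up for illustration, and it assumes the usual "dpkg -l" column layout (status, package name, version).

```shell
#!/bin/sh
# Hypothetical helper (not part of any NVIDIA or Ubuntu tool).
# Reads `dpkg -l` style output on stdin and prints "package version"
# for every fully installed (status "ii") nvidia-NNN driver package.
list_nvidia_drivers() {
    awk '$1 == "ii" && $2 ~ /^nvidia-[0-9]+(:|$)/ { print $2, $3 }'
}

# Typical use on a real machine:
#   dpkg -l | list_nvidia_drivers
```

Packages left in state "rc" (removed, config files remaining) are deliberately skipped, so the list should come back empty for nvidia-375 once the purge has worked.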

Then check the log file to see why the kernel module build failed:

$ vim /var/lib/dkms/nvidia-384/384.98/build/make.log
......
CONFTEST: drm_atomic_available
CONFTEST: drm_atomic_modeset_nonblocking_commit_available
CONFTEST: is_export_symbol_gpl_refcount_inc
CONFTEST: is_export_symbol_gpl_refcount_dec_and_test
CC [M] /var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-instance.o
CC [M] /var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-gpu-numa.o
cc: error: unrecognized command line option ‘-fstack-protector-strong’
scripts/Makefile.build:: recipe for target '/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-instance.o' failed
make[]: *** [/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-instance.o] Error
make[]: *** Waiting for unfinished jobs....
CC [M] /var/lib/dkms/nvidia-384/384.98/build/nvidia/nv.o
CC [M] /var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-frontend.o
cc: error: unrecognized command line option ‘-fstack-protector-strong’
scripts/Makefile.build:: recipe for target '/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-gpu-numa.o' failed
make[]: *** [/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-gpu-numa.o] Error
cc: error: unrecognized command line option ‘-fstack-protector-strong’
cc: error: unrecognized command line option ‘-fstack-protector-strong’
scripts/Makefile.build:: recipe for target '/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-frontend.o' failed
make[]: *** [/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv-frontend.o] Error
scripts/Makefile.build:: recipe for target '/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv.o' failed
make[]: *** [/var/lib/dkms/nvidia-384/384.98/build/nvidia/nv.o] Error
Makefile:: recipe for target '_module_/var/lib/dkms/nvidia-384/384.98/build' failed
make[]: *** [_module_/var/lib/dkms/nvidia-384/384.98/build] Error
make[]: Leaving directory '/usr/src/linux-headers-4.10.0-32-generic'
Makefile:: recipe for target 'modules' failed
make: *** [modules] Error
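Because make runs several jobs in parallel, the same compiler error is repeated many times and interleaved with other output. A rough sketch for collapsing the log down to its distinct error lines follows; summarize_build_errors is a name I made up, and matching on the literal text "error:" is an assumption that happens to fit the log above.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct line containing "error:"
# from the given log file once, prefixed by how often it occurs,
# most frequent first.
summarize_build_errors() {
    grep -F 'error:' "$1" | sort | uniq -c | sort -rn
}

# Typical use on a real machine:
#   summarize_build_errors /var/lib/dkms/nvidia-384/384.98/build/make.log
```

For a log like the one above this should reduce the noise to a single line about the unrecognized '-fstack-protector-strong' option, which is the real cause.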

A quick search showed that the '-fstack-protector-strong' option was only added in GCC 4.9, so GCC 4.9 or later is needed for the build to succeed.
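One way to check this requirement on a given machine is to compare the compiler version against 4.9. The sketch below is not from the original write-up: version_ge is a hypothetical helper, and it relies on the version sort ("sort -V") from GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or
# equal to version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Typical use on a real machine:
#   if version_ge "$(gcc -dumpversion)" 4.9; then
#       echo "gcc understands -fstack-protector-strong"
#   else
#       echo "gcc is too old for this kernel's build flags"
#   fi
```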

Running gcc -v shows that the GCC on this machine is version 4.8, which confirms that the compiler version is the problem, so upgrade GCC to 4.9:

$ sudo apt-get install gcc-4.9
$ sudo ln -sf /usr/bin/gcc-4.9 /usr/bin/gcc
$ gcc -v
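Since /usr/bin/gcc is just a symlink, it is easy to double-check where it ends up after relinking. A small sketch, with points_to being a made-up helper name; it uses readlink -f from GNU coreutils and resolves both sides, because gcc-4.9 may itself be a symlink.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when path $1 and path $2 resolve to
# the same final file after following all symlinks.
points_to() {
    [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}

# Typical use on a real machine:
#   points_to /usr/bin/gcc /usr/bin/gcc-4.9 && echo "gcc now resolves to gcc-4.9"
```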

Then continue with the driver installation:

$ sudo apt-get install nvidia-367
$ sudo apt-get install mesa-common-dev
$ sudo apt-get install freeglut3-dev

Then reboot so that the GTX 1080 driver takes effect.

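2. Downloading and installing CUDA 8 (which supports the GTX 1080)

(CUDA was already installed on this machine, so I go straight to testing here. I will update this section once I have time to set up a machine from scratch and work through the pitfalls.)

3. Testing

nvidia-smi now reports driver 384 (some people see 367 instead). The version differs from the one requested, but the install log shows that nvidia-367 depends on nvidia-384, and the tests and later use all work, so this makes no difference.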

$ nvidia-smi
Tue Dec ::
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.98 Driver Version: 384.98 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| GeForce GTX Off | ::00.0 On | N/A |
| % 62C P2 139W / 200W | 7898MiB / 8112MiB | % Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| G /usr/lib/xorg/Xorg 188MiB |
| G compiz 110MiB |
| C python 7587MiB |
+-----------------------------------------------------------------------------+

Sample test 1:

$ cd NVIDIA_CUDA-.0_Samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected CUDA Capable device(s)
Device : "GeForce GTX 1080"
CUDA Driver Version / Runtime Version 9.0 / 8.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: MBytes ( bytes)
() Multiprocessors, () CUDA Cores/MP: CUDA Cores
GPU Max Clock rate: MHz (1.85 GHz)
Memory Clock rate: Mhz
Memory Bus Width: -bit
L2 Cache Size: bytes
Maximum Texture Dimension Size (x,y,z) 1D=(), 2D=(, ), 3D=(, , )
Maximum Layered 1D Texture Size, (num) layers 1D=(), layers
Maximum Layered 2D Texture Size, (num) layers 2D=(, ), layers
Total amount of constant memory: bytes
Total amount of shared memory per block: bytes
Total number of registers available per block:
Warp size:
Maximum number of threads per multiprocessor:
Maximum number of threads per block:
Max dimension size of a thread block (x,y,z): (, , )
Max dimension size of a grid size (x,y,z): (, , )
Maximum memory pitch: bytes
Texture alignment: bytes
Concurrent copy and kernel execution: Yes with copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: / /
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 8.0, NumDevs = , Device0 = GeForce GTX
Result = PASS
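The line "CUDA Driver Version / Runtime Version 9.0 / 8.0" is worth a second look: the CUDA 8.0 runtime works here because the installed driver supports a CUDA version at least as new. Below is a small sketch that checks this from deviceQuery output. The function name is hypothetical, and it assumes the line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads deviceQuery output on stdin.
# Exit status 0: driver CUDA version >= runtime CUDA version.
# Exit status 1: driver is older than the runtime.
# Exit status 2: the version line was not found.
driver_covers_runtime() {
    awk '/CUDA Driver Version \/ Runtime Version/ {
             drv = $(NF - 2)      # e.g. 9.0
             rt  = $NF            # e.g. 8.0
             found = 1
             exit (drv + 0 >= rt + 0) ? 0 : 1
         }
         END { if (!found) exit 2 }'
}

# Typical use on a real machine:
#   ./deviceQuery | driver_covers_runtime && echo "driver is new enough"
```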

Sample test 2:

$ cd NVIDIA_CUDA-.0_Samples/5_Simulations/nbody
$ make
$ ./nbody -benchmark -numbodies= -device=
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= ) to run in simulation)
-device=<d> (where d=,,.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > ) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> Devices used for simulation
gpuDeviceInit() CUDA Device []: "GeForce GTX 1080
> Compute 6.1 CUDA device: [GeForce GTX ]
number of bodies =
bodies, total time for iterations: 2981.761 ms
= 219.790 billion interactions per second
= 4395.792 single-precision GFLOP/s at flops per interaction

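4. Checking GPU status

Just use the nvidia-smi command.

To refresh the display periodically, for example every 10 s:

$ watch -n 10 nvidia-smi

The values that matter most are temperature, memory usage, and GPU utilization.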
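If only those three values are of interest, nvidia-smi can print them as CSV instead of the full table, which is easier to log or post-process. The sketch below is an illustration and not part of the original setup: format_gpu_status is a made-up helper, and the field order is simply the order requested in --query-gpu.

```shell
#!/bin/sh
# Hypothetical helper: turns one CSV line
#   temperature.gpu, memory.used, memory.total, utilization.gpu
# (as printed with --format=csv,noheader,nounits) into a short summary.
format_gpu_status() {
    awk -F', *' '{ printf "temp=%sC mem=%s/%sMiB util=%s%%\n", $1, $2, $3, $4 }'
}

# Typical use on a real machine (prints one line every 10 seconds):
#   nvidia-smi --query-gpu=temperature.gpu,memory.used,memory.total,utilization.gpu \
#              --format=csv,noheader,nounits -l 10 | format_gpu_status
```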

See also: a guide to reading nvidia-smi output

======================================================================================

Update, 2018.2.3

A problem I ran into recently after installing a GTX 1060 on another server:

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

Cause: the machine has CUDA 8.0 and the cuDNN build for CUDA 8.0 installed, while tensorflow-gpu 1.5 requires CUDA 9.0.
Fix: roll tensorflow-gpu back to version 1.4:

$ pip install tensorflow-gpu==1.4 -i https://pypi.tuna.tsinghua.edu.cn/simple
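The ImportError names the exact library version TensorFlow is looking for (libcublas.so.9.0), so it helps to see which cuBLAS versions are actually on disk before choosing a TensorFlow release. This is a rough sketch: cublas_versions is a hypothetical helper, and searching under /usr/local is an assumption about where the CUDA toolkit was installed.

```shell
#!/bin/sh
# Hypothetical helper: list the version suffixes of every
# libcublas.so.* file found below the given directory
# (default: /usr/local, an assumed install location).
cublas_versions() {
    find "${1:-/usr/local}" -name 'libcublas.so.*' 2>/dev/null |
        sed 's/.*libcublas\.so\.//' |
        sort -u
}

# Typical use on a real machine:
#   cublas_versions            # e.g. shows 8.0 but no 9.0 on a CUDA 8.0 box
```

If 9.0 is missing from the list, either CUDA 9.0 has to be installed or TensorFlow has to be pinned to a release built against the CUDA version that is present, which is what the rollback above does.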

References:

Deep learning host setup: Ubuntu 16.04 + Nvidia GTX 1080 + CUDA 8.0

Updating gcc/g++ to 4.9.2 on Ubuntu 16.04

Monitoring NVIDIA GPU usage on Linux

Commands for viewing real-time GPU status

http://blog.csdn.net/w5688414/article/details/79187499
