This article was first published on my personal blog at https://kezunlin.me/post/6580691f/.

compile OpenCV with CUDA support on Windows 10

Guide

requirements:

  • Windows: 10
  • OpenCV: 3.1.0
  • GPU / NVIDIA driver: GTX 1060 (GTX 970M), driver 382.05
  • GPU arch(s): sm_61 (sm_52)
  • CUDA: 8.0
  • cuDNN: 5.0.5
  • CMake: 3.10.0
  • Visual Studio: VS2015 (64-bit)

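NVIDIA CUDA compute capability

See the CUDA compute capability table.

Laptop GPUs and desktop GPUs can differ in compute capability, so check the exact model (here, the GTX 1060 is sm_61 and the GTX 970M is sm_52).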

CPU vs GPU

for OpenCV functions

get source

Get OpenCV 3.1.0 from git and fix some bugs by cherry-picking three commits:

    git clone https://github.com/opencv/opencv.git
    cd opencv
    git checkout -b v3.1.0 3.1.0
    # fix bugs for 3.1.0
    git cherry-pick 10896
    git cherry-pick cdb9c
    git cherry-pick 24dbb
    git branch
      master
    * v3.1.0

compile

    mkdir build && cd build && cmake-gui ..

config

Configure with the "Visual Studio 14 2015 Win64" generator and the following options:

    BUILD_SHARED_LIBS ON
    CMAKE_CONFIGURATION_TYPES Release # Release only
    CMAKE_CXX_FLAGS_RELEASE /MD /O2 /Ob2 /DNDEBUG /MP # /MP: compile with multiple processes
    WITH_VTK OFF
    BUILD_PERF_TESTS OFF # if ON, build errors occur
    WITH_CUDA ON
    CUDA_TOOLKIT_ROOT_DIR C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0
    #CUDA_ARCH_BIN 3.0 3.5 5.0 5.2 6.0 6.1 # very time-consuming
    CUDA_ARCH_PTX 3.0

for opencv

CUDA_ARCH_BIN 3.0 3.5 5.0 5.2 6.0 6.1 corresponds to the nvcc flags:

    -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;

CUDA_ARCH_PTX 3.0 corresponds to:

    -gencode;arch=compute_30,code=compute_30;
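The mapping above is mechanical: each CUDA_ARCH_BIN entry X.Y becomes one `-gencode;arch=compute_XY,code=sm_XY` pair (SASS), and each CUDA_ARCH_PTX entry becomes `-gencode;arch=compute_XY,code=compute_XY` (PTX). The sketch below spells that mapping out in C++. `make_nvcc_gencode` and `arch_digits` are hypothetical helpers for illustration, not part of OpenCV's CMake scripts, and they assume space-separated `X.Y` entries exactly as in the values above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: turns "6.1" into "61" (compute capability without the dot).
std::string arch_digits(const std::string& cc) {
    std::string digits;
    for (char ch : cc) {
        if (ch != '.') digits += ch;
    }
    assert(!digits.empty());
    return digits;
}

// Hypothetical helper: builds the semicolon-separated nvcc flag list that
// corresponds to CUDA_ARCH_BIN (SASS targets) and CUDA_ARCH_PTX (PTX targets).
std::string make_nvcc_gencode(const std::string& arch_bin, const std::string& arch_ptx) {
    std::string flags;
    std::istringstream bins(arch_bin);
    for (std::string cc; bins >> cc;) {
        const std::string d = arch_digits(cc);
        flags += "-gencode;arch=compute_" + d + ",code=sm_" + d + ";";       // real arch: SASS
    }
    std::istringstream ptxs(arch_ptx);
    for (std::string cc; ptxs >> cc;) {
        const std::string d = arch_digits(cc);
        flags += "-gencode;arch=compute_" + d + ",code=compute_" + d + ";";  // virtual arch: PTX
    }
    return flags;
}
```

For example, `make_nvcc_gencode("3.0 3.5 5.0 5.2 6.0 6.1", "3.0")` yields the same flags as the "CUDA NVCC target flags" line printed by CMake, plus a trailing semicolon as in the listings above.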

for caffe

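The CUDA_ARCH_BIN parameter specifies multiple architectures so that the build supports a variety of GPU boards; otherwise, the CUDA programs will not run on other types of GPU boards.

To run the executable on GPUs with different compute capabilities, OpenCV/Caffe must be compiled for several architectures, e.g. CUDA_ARCH_BIN 3.0 3.5 5.0 5.2 6.0 6.1, which makes compilation very time-consuming. Whenever possible, configure the build only for the GPU architectures that the release build will be deployed on.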
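Following that advice, CUDA_ARCH_BIN can be derived from the GPUs you actually ship to. A minimal sketch, assuming you already know each target's compute capability (e.g. GTX 1060 = 6.1 and GTX 970M = 5.2, from the requirements above): `arch_bin_for_targets` is a hypothetical helper that sorts and deduplicates them into the space-separated form used for CUDA_ARCH_BIN.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A compute capability as (major, minor), e.g. {6, 1} for sm_61.
using ComputeCapability = std::pair<int, int>;

// Hypothetical helper: returns a CUDA_ARCH_BIN value covering exactly the
// given deployment GPUs, sorted ascending with duplicates removed.
std::string arch_bin_for_targets(const std::vector<ComputeCapability>& targets) {
    assert(!targets.empty());
    // std::set orders pairs lexicographically: by major, then by minor.
    const std::set<ComputeCapability> unique(targets.begin(), targets.end());
    std::string value;
    for (const auto& cc : unique) {
        if (!value.empty()) value += ' ';
        value += std::to_string(cc.first) + "." + std::to_string(cc.second);
    }
    return value;
}
```

For a GTX 1060 plus a GTX 970M this gives `5.2 6.1`: two SASS variants per kernel instead of six. Keeping a CUDA_ARCH_PTX entry still lets the driver JIT-compile for a GPU that is not in the list, provided its compute capability is at least the PTX version.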

configure output:

    Selecting Windows SDK version 10.0.14393.0 to target Windows 10.0.17134.
    found IPP (ICV version): 9.0.1 [9.0.1]
    at: C:/compile/opencv/3rdparty/ippicv/unpack/ippicv_win
    CUDA detected: 8.0
    CUDA NVCC target flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_30,code=compute_30
    Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
    To enable PlantUML support, set PLANTUML_JAR environment variable or pass -DPLANTUML_JAR=<filepath> option to cmake
    Could NOT find PythonInterp: Found unsuitable version "1.4", but required is at least "3.4" (found C:/Users/zunli/.babun/cygwin/bin/python)
    Could NOT find PythonInterp: Found unsuitable version "1.4", but required is at least "3.2" (found C:/Users/zunli/.babun/cygwin/bin/python)
    Could NOT find Matlab (missing: MATLAB_MEX_SCRIPT MATLAB_INCLUDE_DIRS MATLAB_ROOT_DIR MATLAB_LIBRARIES MATLAB_LIBRARY_DIRS MATLAB_MEXEXT MATLAB_ARCH MATLAB_BIN)
    General configuration for OpenCV 3.1.0 =====================================
    Version control: 3.1.0-3-g5e9beb8
    Platform:
    Host: Windows 10.0.17134 AMD64
    CMake: 3.10.0
    CMake generator: Visual Studio 14 2015 Win64
    CMake build tool: C:/Program Files (x86)/MSBuild/14.0/bin/MSBuild.exe
    MSVC: 1900
    C/C++:
    Built as dynamic libs?: YES
    C++ Compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe (ver 19.0.24215.1)
    C++ flags (Release): /DWIN32 /D_WINDOWS /W4 /GR /EHa /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /wd4251 /wd4324 /wd4275 /wd4589 /MP8 /MD /O2 /Ob2 /DNDEBUG /MP /Zi
    C++ flags (Debug): /DWIN32 /D_WINDOWS /W4 /GR /EHa /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /wd4251 /wd4324 /wd4275 /wd4589 /MP8 /MDd /Zi /Ob0 /Od /RTC1
    C Compiler: C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
    C flags (Release): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /MP8 /MD /O2 /Ob2 /DNDEBUG /Zi
    C flags (Debug): /DWIN32 /D_WINDOWS /W3 /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi /MP8 /MDd /Zi /Ob0 /Od /RTC1
    Linker flags (Release): /machine:x64 /INCREMENTAL:NO /debug
    Linker flags (Debug): /machine:x64 /debug /INCREMENTAL
    Precompiled headers: YES
    Extra dependencies: comctl32 gdi32 ole32 setupapi ws2_32 vfw32 cudart nppc nppi npps cufft -LC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64
    3rdparty dependencies: zlib libjpeg libwebp libpng libtiff libjasper IlmImf
    OpenCV modules:
    To be built: cudev core cudaarithm flann imgproc ml video cudabgsegm cudafilters cudaimgproc cudawarping imgcodecs photo shape videoio cudacodec highgui objdetect ts features2d calib3d cudafeatures2d cudalegacy cudaobjdetect cudaoptflow cudastereo stitching superres videostab python2
    Disabled: world
    Disabled by dependency: -
    Unavailable: java python3 viz
    Windows RT support: NO
    GUI:
    QT: NO
    Win32 UI: YES
    OpenGL support: NO
    VTK support: NO
    Media I/O:
    ZLib: build (ver 1.2.8)
    JPEG: build (ver 90)
    WEBP: build (ver 0.3.1)
    PNG: build (ver 1.6.19)
    TIFF: build (ver 42 - 4.0.2)
    JPEG 2000: build (ver 1.900.1)
    OpenEXR: build (ver 1.7.1)
    GDAL: NO
    Video I/O:
    Video for Windows: YES
    DC1394 1.x: NO
    DC1394 2.x: NO
    FFMPEG: YES (prebuilt binaries)
    codec: YES (ver 56.41.100)
    format: YES (ver 56.36.101)
    util: YES (ver 54.27.100)
    swscale: YES (ver 3.1.101)
    resample: NO
    gentoo-style: YES
    GStreamer: NO
    OpenNI: NO
    OpenNI PrimeSensor Modules: NO
    OpenNI2: NO
    PvAPI: NO
    GigEVisionSDK: NO
    DirectShow: YES
    Media Foundation: NO
    XIMEA: NO
    Intel PerC: NO
    Parallel framework: Concurrency
    Other third-party libraries:
    Use IPP: 9.0.1 [9.0.1]
    at: C:/compile/opencv/3rdparty/ippicv/unpack/ippicv_win
    Use IPP Async: NO
    Use Eigen: NO
    Use Cuda: YES (ver 8.0)
    Use OpenCL: YES
    Use custom HAL: NO
    NVIDIA CUDA
    Use CUFFT: YES
    Use CUBLAS: NO
    USE NVCUVID: NO
    NVIDIA GPU arch: 30 35 50 52 60 61
    NVIDIA PTX archs: 30
    Use fast math: NO
    OpenCL:
    Version: dynamic
    Include path: C:/compile/opencv/3rdparty/include/opencl/1.2
    Use AMDFFT: NO
    Use AMDBLAS: NO
    Python 2:
    Interpreter: C:/Python27/python.exe (ver 2.7.13)
    Libraries: C:/Python27/libs/python27.lib (ver 2.7.13)
    numpy: C:/Python27/lib/site-packages/numpy/core/include (ver 1.11.3)
    packages path: C:/Python27/Lib/site-packages
    Python 3:
    Interpreter: NO
    Python (for build): C:/Python27/python.exe
    Java:
    ant: NO
    JNI: C:/Program Files/Java/jdk1.8.0_161/include C:/Program Files/Java/jdk1.8.0_161/include/win32 C:/Program Files/Java/jdk1.8.0_161/include
    Java wrappers: NO
    Java tests: NO
    Matlab: Matlab not found or implicitly disabled
    Documentation:
    Doxygen: NO
    PlantUML: NO
    Tests and samples:
    Tests: YES
    Performance tests: NO
    C/C++ Examples: NO
    Install path: C:/compile/opencv/build/install
    cvconfig.h is in: C:/compile/opencv/build
    -----------------------------------------------------------------
    Configuring done
    Generating done

Notice the gencode flags:

    CUDA NVCC target flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_30,code=compute_30

build

Open OpenCV.sln in VS 2015 and build the Release configuration.

This may take hours to finish.

errors

possible solutions

With BUILD_PERF_TESTS and BUILD_TESTS disabled, I managed to build OpenCV 3.1 with CUDA 8.0 on Windows 10 with VS2015, targeting x64. Without the test/performance modules, the build also takes less time :)

I actually got it to work on both my laptop and my desktop (GTX 960M and GTX 970, respectively) with OpenCV 3.2 and the latest CUDA 8.0 for Win10 in Visual Studio 15 Community! What I did was enable WITH_CUBLAS as well as WITH_CUDA. I also turned off BUILD_PERF_TESTS and BUILD_TESTS. The configuration was built using the Visual Studio 14 2015 C++ compiler.

My solution:

  1. disable `BUILD_PERF_TESTS`

Configure and build again. This time it took only about 1 minute.

Build results after the error was fixed:

demo

cuda-module

The OpenCV GPU module is written using CUDA and therefore benefits from the CUDA ecosystem.

The GPU module includes the class cv::cuda::GpuMat, the primary container for data kept in GPU memory. Its interface is very similar to that of cv::Mat, its CPU counterpart. All GPU functions take GpuMat as input and output arguments, which allows several GPU algorithms to be chained without downloading data in between. The GPU module API is also kept similar to the CPU interface where possible, so developers familiar with OpenCV on the CPU can start using the GPU straight away.

The GPU module is designed as a host-level API. This means that if you have pre-compiled OpenCV GPU binaries, you are not required to have the CUDA Toolkit installed or write any extra code to make use of the GPU.

CMakeLists.txt

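    cmake_minimum_required(VERSION 3.10)
    project(demo CXX)
    find_package(OpenCV REQUIRED COMPONENTS core imgcodecs highgui imgproc features2d calib3d
        cudaarithm cudabgsegm cudafilters cudaimgproc cudawarping cudafeatures2d # for cuda-enabled
    )
    message(STATUS "[Main] OpenCV_INCLUDE_DIRS = ${OpenCV_INCLUDE_DIRS}")
    message(STATUS "[Main] OpenCV_LIBS = ${OpenCV_LIBS}")
    include_directories(${OpenCV_INCLUDE_DIRS})
    add_executable(demo demo.cpp)
    target_link_libraries(demo ${OpenCV_LIBS})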

demo.cpp

In the sample below, an image is loaded from a local file, uploaded to the GPU, thresholded, downloaded, and displayed.

    #include <iostream>
    #include <opencv2/core/cuda.hpp>    // cv::cuda::GpuMat
    #include <opencv2/cudaarithm.hpp>   // cv::cuda::threshold
    #include <opencv2/highgui.hpp>      // cv::imshow, cv::waitKey
    #include <opencv2/imgcodecs.hpp>    // cv::imread, cv::IMREAD_GRAYSCALE
    #include <opencv2/imgproc.hpp>      // cv::THRESH_BINARY

    int test_opencv_gpu()
    {
        try
        {
            // Load as 8-bit grayscale on the host (CPU).
            cv::Mat src_host = cv::imread("file.png", cv::IMREAD_GRAYSCALE);
            if (src_host.empty())
            {
                std::cout << "Error: could not read file.png" << std::endl;
                return 1;
            }
            cv::cuda::GpuMat dst, src;
            src.upload(src_host);                                            // host -> device
            cv::cuda::threshold(src, dst, 128.0, 255.0, cv::THRESH_BINARY);  // runs on the GPU
            cv::Mat result_host;
            dst.download(result_host);                                       // device -> host
            cv::imshow("Result", result_host);
            cv::waitKey();
        }
        catch (const cv::Exception& ex)
        {
            std::cout << "Error: " << ex.what() << std::endl;
            return 1;
        }
        return 0;
    }

    int main()
    {
        return test_opencv_gpu();
    }

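CPU vs GPU time cost

  • (1) For ORB feature matching between images of moderate resolution, the CPU version runs faster than the GPU version, because uploading the images to the GPU takes time.
  • (2) For high-resolution images, or on machines whose GPU is much stronger than the CPU (such as the NVIDIA Jetson series), the GPU version of ORB is more efficient than the CPU version.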
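Points (1) and (2) can be read as a break-even argument: the GPU path pays a fixed overhead plus a per-pixel transfer cost that the CPU path does not. The sketch below assumes a purely linear cost model; `CostModel`, `break_even_pixels`, and any numbers plugged into them are hypothetical and only illustrate the reasoning, they are not measurements of ORB.

```cpp
#include <cassert>
#include <limits>

// Hypothetical linear cost model (all times in nanoseconds):
//   cpu(n) = cpu_per_px * n
//   gpu(n) = fixed_ns + (gpu_per_px + transfer_per_px) * n
// where n is the number of pixels and "transfer" covers upload + download.
struct CostModel {
    double cpu_per_px;
    double gpu_per_px;
    double transfer_per_px;
    double fixed_ns;  // e.g. kernel launches and synchronization
};

// Pixel count at which both paths cost the same under this model; larger
// images favor the GPU. Returns +infinity if the GPU never catches up.
double break_even_pixels(const CostModel& m) {
    assert(m.fixed_ns >= 0.0);
    const double saving_per_px = m.cpu_per_px - (m.gpu_per_px + m.transfer_per_px);
    if (saving_per_px <= 0.0) return std::numeric_limits<double>::infinity();
    return m.fixed_ns / saving_per_px;
}
```

Below the break-even size, the fixed and transfer costs dominate and the CPU wins, as in (1). A GPU that is stronger relative to the CPU lowers `gpu_per_px` and moves the break-even point down, so large images cross it, as in (2).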

problems

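(1) With the CUDA build of OpenCV, creating the Caffe network the first time is very slow; subsequent network creations are very fast.

(2) The OpenCV GPU code was slower than the CPU code, taking about 20 s longer at first launch. (It turned out this was caused by a mismatch between the architectures Caffe was compiled for and the GPU's compute capability, presumably because the driver then has to JIT-compile PTX at startup.)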

reasons

Your problem is that CUDA needs to initialize, and that generally takes several seconds.

Why is the first function call slow?

That is because of initialization overhead: on the first GPU function call, the CUDA Runtime API is initialized implicitly.

The first GPU function call always takes more time, because CUDA creates a context for the device.

The following calls will be faster.

Not the reasons:

(1) The CPU clock speed being 10x higher than the GPU clock speed.

(2) Memory transfer time between host (CPU) and device (GPU) (uploading/downloading data).
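Since the slowdown comes from one-time initialization rather than from clock speed or transfers, benchmarks should keep the first call out of the measurement. A minimal sketch using only the standard library: `median_ms_after_warmup` is a hypothetical helper that runs a callable a few times untimed, then reports the median of the timed runs. With OpenCV, the callable would wrap the GPU code being measured (for example the upload/threshold/download sequence from demo.cpp); that wiring is left out here so the sketch compiles without CUDA.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: calls `fn` `warmup` times without timing (this absorbs
// one-time costs such as CUDA context creation), then times `runs` calls and
// returns the median duration in milliseconds (upper median for even `runs`).
template <typename Fn>
double median_ms_after_warmup(Fn&& fn, int warmup, int runs) {
    assert(warmup >= 0 && runs > 0);
    for (int i = 0; i < warmup; ++i) fn();
    std::vector<double> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    // Partially sort so the middle element is in its sorted position.
    std::nth_element(samples.begin(), samples.begin() + runs / 2, samples.end());
    return samples[runs / 2];
}
```

Reporting the first call separately from this median makes the initialization cost visible instead of letting it inflate the average.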

deploy

runtime errors

OpenCV/Caffe compiled on a GTX 1060 fails at runtime on a GTX 970M:

im2col.cu Check failed: error == cudaSuccess (8 vs. 0) invalid device function

    GTX 1060: sm_61
    GTX 970M: sm_52

im2col is a Caffe source file; the error indicates that the executable cannot run on the GTX 970M's compute capability.

reasons

See the Stack Overflow question what-is-the-purpose-of-using-multiple-arch-flags-in-nvidias-nvcc-compiler:

Roughly speaking, the code compilation flow goes like this:

CUDA C/C++ device code source --> PTX --> SASS

The virtual architecture (e.g. compute_20, whatever is specified by -arch compute...) determines what type of PTX code will be generated. The additional switches (e.g. -code sm_21) determine what type of SASS code will be generated. SASS is actually executable object code for a GPU (machine language). An executable can contain multiple versions of SASS and/or PTX, and there is a runtime loader mechanism that will pick appropriate versions based on the GPU actually being used.
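The SASS/PTX rule explains the im2col failure above. The sketch below encodes a simplified version of the loader's per-kernel decision, following the binary and PTX compatibility rules in the CUDA C Programming Guide: SASS for sm_XY runs on devices X.Z with the same major version and Z >= Y, and PTX for compute_XY can be JIT-compiled for any device with compute capability >= X.Y. `Cc`, `LoadResult`, and `pick_code` are hypothetical names, and the real loader handles more cases than this.

```cpp
#include <cassert>
#include <vector>

// Compute capability, e.g. {6, 1} for GTX 1060 or {5, 2} for GTX 970M.
struct Cc {
    int major;
    int minor;
};

enum class LoadResult { NativeSass, JitFromPtx, InvalidDeviceFunction };

// Simplified model of which embedded code the runtime can use for one kernel.
LoadResult pick_code(const Cc& device, const std::vector<Cc>& sass, const std::vector<Cc>& ptx) {
    assert(device.major > 0);
    // SASS: same major version, built for an equal or lower minor version.
    for (const Cc& s : sass) {
        if (s.major == device.major && s.minor <= device.minor) return LoadResult::NativeSass;
    }
    // PTX: any virtual architecture not newer than the device (JIT at load time).
    for (const Cc& p : ptx) {
        const bool not_newer =
            p.major < device.major || (p.major == device.major && p.minor <= device.minor);
        if (not_newer) return LoadResult::JitFromPtx;
    }
    return LoadResult::InvalidDeviceFunction;
}
```

With only sm_61 SASS, a GTX 970M (5.2) has nothing it can load, which matches the `invalid device function` error; embedding compute_30 PTX (as CUDA_ARCH_PTX 3.0 does for OpenCV) lets the driver JIT-compile instead, at the cost of a slower first launch.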

win7/win10 deploy

  • Compiled OpenCV/Caffe on Windows 10 for the GTX 1060.
  • Deployed successfully on Windows 7 with a GTX 1080 Ti.

On Windows 7, installing the driver 398.82-desktop-win8-win7-64bit-international-whql.exe may cause errors:

    > nvidia-smi.exe
    Failed to initialize NVML: Unknown error

Solution: use the older driver 385.69.

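Linux/Windows performance

(1) The API takes 3 ms on average on Linux; the same code takes 14 ms on average on Windows.

(2) In Visual Studio, enabling compiler optimization makes a nearly 5x difference: 125 ms vs 25 ms.

(3) CMake's Release configuration already enables compiler optimization by default (-O3 with GCC/Clang; /O2 with MSVC, as in CMAKE_CXX_FLAGS_RELEASE above).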
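Given the 5x gap in (2), it is worth confirming which configuration a benchmark binary was actually built with. A small sketch based on predefined compiler macros: `_DEBUG` (defined by MSVC when a debug runtime such as /MDd or /MTd is selected), `__OPTIMIZE__` (defined by GCC/Clang when optimization is enabled), and `NDEBUG`. MSVC has no predefined macro for /O2, so on MSVC this only reports the runtime flavor; `build_flavor` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: describes how this translation unit was compiled, so a
// benchmark can flag numbers taken from an unoptimized build.
std::string build_flavor() {
    std::string s;
#if defined(_MSC_VER)
  #if defined(_DEBUG)
    s = "MSVC, debug runtime (/MDd or /MTd)";
  #else
    s = "MSVC, release runtime (optimization level not detectable)";
  #endif
#elif defined(__OPTIMIZE__)
    s = "GCC/Clang, optimization enabled";
#else
    s = "GCC/Clang or other, no optimization detected";
#endif
#if defined(NDEBUG)
    s += ", NDEBUG defined";
#else
    s += ", NDEBUG not defined";
#endif
    assert(!s.empty());
    return s;
}
```

Printing `build_flavor()` next to each timing result makes it obvious when a number like 125 ms came from a Debug build.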

History

  • 20180713: created.
