High level GPU programming in C++
https://github.com/prem30488/C2CUDATranslator
http://www.training.prace-ri.eu/uploads/tx_pracetmo/GPSMEToolkitIntro.pdf
gp-sme.co.uk
https://www.openacc.org/get-started
http://www.openmp.org/ 好像只是多核编程, 不像上面几个,是c代码转gpu c 代码。
There are many high-level libraries dedicated to GPGPU programming. Since they rely on CUDA and/or OpenCL, they have to be chosen wisely (a CUDA-based program will not run on AMD's GPUs, unless it goes through a pre-processing step with projects such as gpuocelot).
CUDA
You can find some examples of CUDA libraries on the NVIDIA website.
- Thrust: the official description speaks for itself
Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs. Interoperability with established technologies (such as CUDA, TBB, and OpenMP) facilitates integration with existing software.
As @Ashwin pointed out, the STL-like syntax of Thrust makes it a widely chosen library when developing CUDA programs. A quick look at the examples shows the kind of code you will be writing if you decide to use this library. NVIDIA's website presents the key features of this library. A video presentation (from GTC 2012) is also available.
- CUB: the official description tells us:
CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming mode. It is a flexible library of cooperative threadblock primitives and other utilities for CUDA kernel programming.
It provides device-wide, block-wide and warp-wide parallel primitives such as parallel sort, prefix scan, reduction, histogram etc.
It is open-source and available on GitHub. It is not high-level from an implementation point of view (you develop in CUDA kernels), but provides high-level algorithms and routines.
- mshadow: lightweight CPU/GPU matrix/tensor template library in C++/CUDA.
This library is mostly used for machine learning, and relies on expression templates.
- Eigen: support for CUDA with a new Tensor class have been added in version 3.3. It is used by Google in TensorFlow, and is still experimental.
Starting from Eigen 3.3, it is now possible to use Eigen's objects and algorithms within CUDA kernels. However, only a subset of features are supported to make sure that no dynamic allocation is triggered within a CUDA kernel.
OpenCL
Note that OpenCL does more than GPGPU computing, since it supports heterogeneous platforms (multi-core CPUs, GPUs etc.).
- OpenACC: this project provides OpenMP-like support for GPGPU. A large part of the programming is done implicitly by the compiler and the run-time API. You can find a sample code on their website.
The OpenACC Application Program Interface describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator, providing portability across operating systems, host CPUs and accelerators.
- Bolt: open-source library with STL-like interface.
Bolt is a C++ template library optimized for heterogeneous computing. Bolt is designed to provide high-performance library implementations for common algorithms such as scan, reduce, transform, and sort. The Bolt interface was modeled on the C++ Standard Template Library (STL). Developers familiar with the STL will recognize many of the Bolt APIs and customization techniques.
Boost.Compute: as @Kyle Lutz said, Boost.Compute provides a STL-like interface for OpenCL. Note that this is not an official Boost library (yet).
SkelCL "is a library providing high-level abstractions for alleviated programming of modern parallel heterogeneous systems". This library relies on skeleton programming, and you can find more information in their research papers.
CUDA + OpenCL
- ArrayFire is an open-source (used to be proprietary) GPGPU programming library. They first targeted CUDA, but now support OpenCL as well. You can check the examples available online. NVIDIA's website provides a good summary of its key features.
Complementary information
Although this is not really in the scope of this question, there is also the same kind of support for other programming languages:
- Python: PyCUDA for CUDA, Clyther and PyOpenCL for OpenCL. There is a dedicated StackOverflow question for this.
- Java: JCuda for CUDA, and for OpenCL, you can check this other question.
If you need to do linear algebra (for instance) or other specific operations, dedicated math libraries are also available for CUDA and OpenCL (e.g. ViennaCL, CUBLAS, MAGMA etc.).
Also note that using these libraries does not prevent you from doing some low-level operations if you need to do some very specific computation.
Finally, we can mention the future of the C++ standard library. There has been extensive work to add parallelism support. This is still a technical specification, and GPUs are not explicitely mentioned AFAIK (although NVIDIA's Jared Hoberock, developer of Thrust, is directly involved), but the will to make this a reality is definitely there.
High level GPU programming in C++的更多相关文章
- 把书《CUDA By Example an Introduction to General Purpose GPU Programming》读薄
鉴于自己的毕设需要使用GPU CUDA这项技术,想找一本入门的教材,选择了Jason Sanders等所著的书<CUDA By Example an Introduction to Genera ...
- R – GPU Programming for All with ‘gpuR’
INTRODUCTION GPUs (Graphic Processing Units) have become much more popular in recent years for compu ...
- Udacity并行计算课程笔记-The GPU Programming Model
一.传统的提高计算速度的方法 faster clocks (设置更快的时钟) more work over per clock cycle(每个时钟周期做更多的工作) more processors( ...
- 【Udacity并行计算课程笔记】- lesson 1 The GPU Programming Model
一.传统的提高计算速度的方法 faster clocks (设置更快的时钟) more work over per clock cycle(每个时钟周期做更多的工作) more processors( ...
- [CUDA] 00 - GPU Driver Installation & Concurrency Programming
前言 对,这是一个高大上的技术,终于要做老崔当年做过的事情了,生活很传奇. 一.主流 GPU 编程接口 1. CUDA 是英伟达公司推出的,专门针对 N 卡进行 GPU 编程的接口.文档资料很齐全,几 ...
- D3D9 GPU Hacks (转载)
D3D9 GPU Hacks I’ve been trying to catch up what hacks GPU vendors have exposed in Direct3D9, and tu ...
- 深入剖析GPU Early Z优化
最近在公司群里同事发了一个UE4关于Mask材质的优化,比如在场景中有大面积的草和树的时候,可以在很大程度上提高效率.这其中的原理就是利用了GPU的特性Early Z,但是它的做法跟我最开始的理解有些 ...
- PatentTips - Heterogeneous Parallel Primitives Programming Model
BACKGROUND 1. Field of the Invention The present invention relates generally to a programming model ...
- GPU深度发掘(一)::GPGPU数学基础教程
作者:Dominik Göddeke 译者:华文广 Contents 介绍 准备条件 硬件设备要求 软件设备要求 两者选择 初始化OpenGL GLUT OpenGL ...
随机推荐
- 自学Zabbix7.1 IT services
自学Zabbix7.1 IT services 1. 概念IT Services 服务器或者某项服务.业务的可用率,不懂技术的上级领导会过问最近服务器可用率如何.所有api的状况怎么样?通常一些技术人 ...
- 【51NOD 1847】奇怪的数学题(莫比乌斯反演,杜教筛,min_25筛,第二类斯特林数)
[51NOD 1847]奇怪的数学题(莫比乌斯反演,杜教筛,min_25筛,第二类斯特林数) 题面 51NOD \[\sum_{i=1}^n\sum_{j=1}^nsgcd(i,j)^k\] 其中\( ...
- 【AGC016E】Poor Turkeys
Description 有\(n\)(\(1 \le n \le 400\))只鸡,接下来按顺序进行\(m\)(\(1 \le m \le 10^5\))次操作.每次操作涉及两只鸡,如果都存在则随意拿 ...
- 【bzoj3064】 CPU监控
http://www.lydsy.com/JudgeOnline/problem.php?id=3064 (题目链接) 题意 给出一个长度为$n$的数列$A$,同时定义一个辅助数组$B$,$B$开始与 ...
- RocketMQ介绍与云服务器安装
RocketMQ 介绍与概念 在github上的说法来看: Apache RocketMQ是一个分布式消息传递和流媒体平台,具有低延迟,高性能和可靠性,万亿级容量和灵活的可扩展性.它提供了多种功能: ...
- cocos2d-x入门学习笔记,主要介绍cocos2d-x的基本结构,并且介绍引擎自带的示例
cocos2d-x 3.0 制作横版格斗游戏 http://philon.cn/post/cocos2d-x-3.0-zhi-zuo-heng-ban-ge-dou-you-xi http://blo ...
- 使用ADO.NET操作Oracle数据库
本文将示例使用C#的ADO.NET技术调用Oralce的存储过程和函数及操作Oracle数据库. 在oracle的hr数据库中建立存储过程 在oralce的hr数据库中建立函数 新建控制台项目,在主函 ...
- Shell变量的取用、删除、取代与替换
<<鸟哥的私房菜>> 注意: 通配符适用的地方:shell命令行或者shell脚本中 正则表达式适用的地方:字符串处理时,一般有一般正则和Perl正则. 在文本过滤工具里,都是 ...
- 关于TCP连接状态的解释
TCP各个状态主要存在于三次握手和四次挥手的过程 1.TCP建立连接时的三次握手: 服务端应用监听端口处于LISTEN状态,等待建立连接. 第一次握手:客户端发送SYN=一个随机数,然后进入SYN_S ...
- CodeForces - 327D Block Tower
D. Block Tower time limit per test 2 seconds memory limit per test 256 megabytes input standard inpu ...