High level GPU programming in C++
https://github.com/prem30488/C2CUDATranslator
http://www.training.prace-ri.eu/uploads/tx_pracetmo/GPSMEToolkitIntro.pdf
gp-sme.co.uk
https://www.openacc.org/get-started
http://www.openmp.org/ 好像只是多核编程, 不像上面几个,是c代码转gpu c 代码。
There are many high-level libraries dedicated to GPGPU programming. Since they rely on CUDA and/or OpenCL, they have to be chosen wisely (a CUDA-based program will not run on AMD's GPUs, unless it goes through a pre-processing step with projects such as gpuocelot).
CUDA
You can find some examples of CUDA libraries on the NVIDIA website.
- Thrust: the official description speaks for itself
Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust's high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs. Interoperability with established technologies (such as CUDA, TBB, and OpenMP) facilitates integration with existing software.
As @Ashwin pointed out, the STL-like syntax of Thrust makes it a widely chosen library when developing CUDA programs. A quick look at the examples shows the kind of code you will be writing if you decide to use this library. NVIDIA's website presents the key features of this library. A video presentation (from GTC 2012) is also available.
- CUB: the official description tells us:
CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming mode. It is a flexible library of cooperative threadblock primitives and other utilities for CUDA kernel programming.
It provides device-wide, block-wide and warp-wide parallel primitives such as parallel sort, prefix scan, reduction, histogram etc.
It is open-source and available on GitHub. It is not high-level from an implementation point of view (you develop in CUDA kernels), but provides high-level algorithms and routines.
- mshadow: lightweight CPU/GPU matrix/tensor template library in C++/CUDA.
This library is mostly used for machine learning, and relies on expression templates.
- Eigen: support for CUDA with a new Tensor class have been added in version 3.3. It is used by Google in TensorFlow, and is still experimental.
Starting from Eigen 3.3, it is now possible to use Eigen's objects and algorithms within CUDA kernels. However, only a subset of features are supported to make sure that no dynamic allocation is triggered within a CUDA kernel.
OpenCL
Note that OpenCL does more than GPGPU computing, since it supports heterogeneous platforms (multi-core CPUs, GPUs etc.).
- OpenACC: this project provides OpenMP-like support for GPGPU. A large part of the programming is done implicitly by the compiler and the run-time API. You can find a sample code on their website.
The OpenACC Application Program Interface describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran to be offloaded from a host CPU to an attached accelerator, providing portability across operating systems, host CPUs and accelerators.
- Bolt: open-source library with STL-like interface.
Bolt is a C++ template library optimized for heterogeneous computing. Bolt is designed to provide high-performance library implementations for common algorithms such as scan, reduce, transform, and sort. The Bolt interface was modeled on the C++ Standard Template Library (STL). Developers familiar with the STL will recognize many of the Bolt APIs and customization techniques.
Boost.Compute: as @Kyle Lutz said, Boost.Compute provides a STL-like interface for OpenCL. Note that this is not an official Boost library (yet).
SkelCL "is a library providing high-level abstractions for alleviated programming of modern parallel heterogeneous systems". This library relies on skeleton programming, and you can find more information in their research papers.
CUDA + OpenCL
- ArrayFire is an open-source (used to be proprietary) GPGPU programming library. They first targeted CUDA, but now support OpenCL as well. You can check the examples available online. NVIDIA's website provides a good summary of its key features.
Complementary information
Although this is not really in the scope of this question, there is also the same kind of support for other programming languages:
- Python: PyCUDA for CUDA, Clyther and PyOpenCL for OpenCL. There is a dedicated StackOverflow question for this.
- Java: JCuda for CUDA, and for OpenCL, you can check this other question.
If you need to do linear algebra (for instance) or other specific operations, dedicated math libraries are also available for CUDA and OpenCL (e.g. ViennaCL, CUBLAS, MAGMA etc.).
Also note that using these libraries does not prevent you from doing some low-level operations if you need to do some very specific computation.
Finally, we can mention the future of the C++ standard library. There has been extensive work to add parallelism support. This is still a technical specification, and GPUs are not explicitely mentioned AFAIK (although NVIDIA's Jared Hoberock, developer of Thrust, is directly involved), but the will to make this a reality is definitely there.
High level GPU programming in C++的更多相关文章
- 把书《CUDA By Example an Introduction to General Purpose GPU Programming》读薄
鉴于自己的毕设需要使用GPU CUDA这项技术,想找一本入门的教材,选择了Jason Sanders等所著的书<CUDA By Example an Introduction to Genera ...
- R – GPU Programming for All with ‘gpuR’
INTRODUCTION GPUs (Graphic Processing Units) have become much more popular in recent years for compu ...
- Udacity并行计算课程笔记-The GPU Programming Model
一.传统的提高计算速度的方法 faster clocks (设置更快的时钟) more work over per clock cycle(每个时钟周期做更多的工作) more processors( ...
- 【Udacity并行计算课程笔记】- lesson 1 The GPU Programming Model
一.传统的提高计算速度的方法 faster clocks (设置更快的时钟) more work over per clock cycle(每个时钟周期做更多的工作) more processors( ...
- [CUDA] 00 - GPU Driver Installation & Concurrency Programming
前言 对,这是一个高大上的技术,终于要做老崔当年做过的事情了,生活很传奇. 一.主流 GPU 编程接口 1. CUDA 是英伟达公司推出的,专门针对 N 卡进行 GPU 编程的接口.文档资料很齐全,几 ...
- D3D9 GPU Hacks (转载)
D3D9 GPU Hacks I’ve been trying to catch up what hacks GPU vendors have exposed in Direct3D9, and tu ...
- 深入剖析GPU Early Z优化
最近在公司群里同事发了一个UE4关于Mask材质的优化,比如在场景中有大面积的草和树的时候,可以在很大程度上提高效率.这其中的原理就是利用了GPU的特性Early Z,但是它的做法跟我最开始的理解有些 ...
- PatentTips - Heterogeneous Parallel Primitives Programming Model
BACKGROUND 1. Field of the Invention The present invention relates generally to a programming model ...
- GPU深度发掘(一)::GPGPU数学基础教程
作者:Dominik Göddeke 译者:华文广 Contents 介绍 准备条件 硬件设备要求 软件设备要求 两者选择 初始化OpenGL GLUT OpenGL ...
随机推荐
- 自学Zabbix8.1 Regular expressions 正则表达式
点击返回:自学Zabbix之路 点击返回:自学Zabbix4.0之路 点击返回:自学zabbix集锦 自学Zabbix8.1 Regular expressions 正则表达式 1. 配置 点击Adm ...
- 【转】嵌入式系统 Boot Loader 技术内幕,带你完全了解Boot Loader
在专用的嵌入式板子运行 GNU/Linux 系统已经变得越来越流行.一个嵌入式 Linux 系统从软件的角度看通常可以分为四个层次: 1. 引导加载程序.包括固化在固件(firmware)中的 boo ...
- C# 推箱子游戏&对战游戏
推箱子游戏提纲,只有向右向上的操作,向左向下同理,后期需完善. namespace 推箱子 { class Program { static void Main(string[] args) { // ...
- js的append拼接html丢失css样式解决
htmlApp += "<li id='leftli"+lunci+"'>"; htmlApp += "<span id='left ...
- 一起使用mock数据动态创建表格
在ant-design中,我们创建一个基础table会怎么实现呢? 如下代码可视,我们会自己创建一些数据,在表格中渲染出来,如下 <Card title="基础表格"> ...
- JSONObject的toBean 和 fromObject
public static void main(String[] args) { Map map=new HashMap(); map.put("我","妹") ...
- Prometheus jvm_exporter监控zookeeper
Zookeeper Prometheus 监控zookeeper使用jvm_exporter来采集数据,jvm_exporter是一个可以配置抓取和暴露JMX目标的mBeans的收集器. 下载java ...
- python os模块 常用命令
python编程时,经常和文件.目录打交道,这是就离不了os模块.os模块包含普遍的操作系统功能,与具体的平台无关.以下列举常用的命令 1. os.name()——判断现在正在实用的平台,Window ...
- 通过Cloudera Manager部署CDH5.15.1的webUI界面详解
通过Cloudera Manager部署CDH5.15.1的webUI界面详解 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 本篇博客CDH的部署完全通过Cloudera Mana ...
- ISO七层模型详解
ISO七层模型详解 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 在我刚刚接触运维这个行业的时候,去面试时总是会做一些面试题,笔试题就是看一个运维工程师的专业技能的掌握情况,这个很 ...