Getting Started with OpenMP*

Abstract

As you probably know by now, to get the maximum performance benefit from a processor with Hyper-Threading Technology, an application needs to be executed in parallel. Parallel execution requires threads, and threading an application is not trivial. What you may not know is that tools like OpenMP* can make the process a lot easier.

This is the first in a series of three white-papers that teach you, an experienced C/C++ programmer, how to use OpenMP to get the most out of Hyper-Threading Technology. This first paper shows you how to parallelize loops, called work sharing. The second paper teaches you how to exploit non-loop parallelism and some additional OpenMP features. The final paper discusses the OpenMP runtime library functions, the Intel® C++ Compiler, and how to debug your application if things go wrong.

A Quick Introduction to OpenMP

The designers of OpenMP wanted to provide an easy method to thread applications without requiring that the programmer know how to create, synchronize, and destroy threads or even requiring him or her to determine how many threads to create. To achieve these ends, the OpenMP designers developed a platform-independent set of compiler pragmas, directives, function calls, and environment variables that explicitly instruct the compiler how and where to insert threads into the application. Most loops can be threaded by inserting only one pragma right before the loop. Further, by leaving the nitty-gritty details to the compiler and OpenMP, you can spend more time determining which loops should be threaded and how to best restructure the algorithms for maximum performance. The maximum performance of OpenMP is realized when it is used to thread "hotspots," the most time-consuming loops in your application.

The power and simplicity of OpenMP is best demonstrated by looking at an example. The following loop converts a 32-bit RGB (red, green, blue) pixel to an 8-bit gray-scale pixel. The one pragma, which has been inserted immediately before the loop, is all that is needed for parallel execution.

#pragma omp parallel for

for (i=0; i < numPixels; i++)

{

   pGrayScaleBitmap[i] = (unsigned BYTE)

            (pRGBBitmap[i].red * 0.299 +

             pRGBBitmap[i].green * 0.587 +

             pRGBBitmap[i].blue * 0.114);

}

Let's take a closer look at the loop. First, the example uses 'work-sha ring,' the general term used in OpenMP to describe distribution of work across threads. When work-sharing is used with the for construct, as shown in the example, the iterations of the loop are distributed among multiple threads so that each loop iteration is executed exactly once and in parallel by one or more threads. OpenMP determines how many threads to create and how to best create, synchronize, and destroy them. All the programmer needs to do is to tell OpenMP which loop should be threaded.

OpenMP places the following five restrictions on which loops can be threaded:

The loop variable must be of type signed integer. Unsigned integers, such as DWORD's, will not work.

The comparison operation must be in the form loop_variable <, <=, >, or >= loop_invariant_integer

The third expression or increment portion of the for loop must be either integer addition or integer subtraction and by a loop invariant value.

If the comparison operation is < or <=, the loop variable must increment on every iteration, and conversely, if the comparison operation is > or >=, the loop variable must decrement on every iteration.

The loop must be a basic block, meaning no jumps from the inside of the loop to the outside are permitted with the exception of the exit statement, which terminates the whole application. If the statements goto or break are used, they must jump within the loop, not outside it. The same goes for exception handling; exceptions must be caught within the loop.

Although these restrictions may sound somewhat limiting, non-conforming loops can easily be rewritten to follow these restrictions.

Getting Started with OpenMP的更多相关文章

  1. 应用OpenMP的一个简单的设计模式

    小喵的唠叨话:最近很久没写博客了,一是因为之前写的LSoftmax后馈一直没有成功,所以在等作者的源码.二是最近没什么想写的东西.前两天,在预处理图片的时候,发现处理200w张图片,跑了一晚上也才处理 ...

  2. openmp 的使用

    http://blog.csdn.net/gengshenghong/article/details/7003110 说明:这部分内容比较基础,主要是分析几个容易混淆的OpenMP函数,加以理解. ( ...

  3. OpenMP编程总结表

    本文对OpenMP 2.0的全部语法——Macro(宏定义).Environment Variables(环境变量).Data Types(数据类型).Compiler Directives(编译指导 ...

  4. OpenMP共享内存并行编程详解

    实验平台:win7, VS2010 1. 介绍 平行计算机可以简单分为共享内存和分布式内存,共享内存就是多个核心共享一个内存,目前的PC就是这类(不管是只有一个多核CPU还是可以插多个CPU,它们都有 ...

  5. OpenMP并行构造的schedule子句详解 (转载)

    原文:http://blog.csdn.net/gengshenghong/article/details/7000979 schedule的语法为: schedule(kind, [chunk_si ...

  6. 利用OpenMP实现埃拉托斯特尼(Eratosthenes)素数筛法并行化 分类: 算法与数据结构 2015-05-09 12:24 157人阅读 评论(0) 收藏

    1.算法简介 1.1筛法起源 筛法是一种简单检定素数的算法.据说是古希腊的埃拉托斯特尼(Eratosthenes,约公元前274-194年)发明的,又称埃拉托斯特尼筛法(sieve of Eratos ...

  7. 大数据并行计算利器之MPI/OpenMP

    大数据集群计算利器之MPI/OpenMP ---以连通域标记算法并行化为例 1 背景 图像连通域标记算法是从一幅栅格图像(通常为二值图像)中,将互相邻接(4邻接或8邻接)的具有非背景值的像素集合提取出 ...

  8. android studio ndk使用openMP

    好久没碰ndk了,之前都是在eclipse下写makefile配置c++程序的,现在发现主流都是用android studio,eclipse俨然已经被遗弃了,正好最近项目需要用openMP做算法加速 ...

  9. opencv 3.0 DPM Cascade 检测 (附带TBB和openMP加速)

    opencv 3.0 DPM cascade contrib模块 转载请注明出处,楼燚(yì)航的blog,http://www.cnblogs.com/louyihang-loves-baiyan/ ...

  10. openMP的一点使用经验【非原创】

    按照百科上说的,针对于openmp的编程,最简单的就是在开头加个#include<omp.h>,然后在后面的for上加一行#pragma omp parallel for即可,下面的是较为 ...

随机推荐

  1. 基于Apache POI 从xlsx读出数据

    [0]写在前面 0.1) these codes are from 基于Apache POI 的从xlsx读出数据 0.2) this idea is from http://cwind.iteye. ...

  2. MyBatis缓存介绍

    一.MyBatis缓存介绍 正如大多数持久层框架一样,MyBatis 相同提供了一级缓存和二级缓存的支持 一级缓存: 基于PerpetualCache 的 HashMap本地缓存.其存储作用域为 Se ...

  3. 怎样实现动态加入布局文件(避免 The specified child already has a parent的问题)

    首先扯点别的:我应经连续上了两个星期的班了,今天星期一.是第三个周.这个班上的也是没谁了.近期老是腰疼. 预计是累了.近期也没跑步.今天下班继续跑起. 这篇文章讲一讲怎样在一个布局文件里动态加在一个布 ...

  4. lnmp建站常识

    1.nginx配置网站目录并修改访问的端口:nginx.conf文件 listen 666;//端口默认为80,修改后增强安全性 server_name www.lnmp.org; index ind ...

  5. 【oracle案例】ORA-01102: cannot mount database in EXCLUSIVE mode

    ORA-01102: cannot mount database in EXCLUSIVE mode 今天在fedora上安装完10g后,测试数据库是否安装成功.STARTUP数据库时,发生如下错误: ...

  6. WCF基础之配置服务

    在WCF应用编程中配置服务是其主要部分. 配置可以定义和自定义如何向客户端公开服务,包括服务地址,发送和接受消息的传输和编码,以及服务的安全类型. 服务的配置有两种:编码和使用config文件,大多数 ...

  7. 九度OJ 1034:寻找大富翁 (排序)

    时间限制:1 秒 内存限制:32 兆 特殊判题:否 提交:5925 解决:2375 题目描述:     浙江桐乡乌镇共有n个人,请找出该镇上的前m个大富翁. 输入:     输入包含多组测试用例.   ...

  8. mysql系列之8.mysql高可用 (keepalived)

    环境: centos6.5_x64 准备: 两台mysql机器 主1 master:  192.168.32.130 主2 backup:  192.168.32.131 VIP: 192.168.3 ...

  9. 20170326 ABAP调用外部webservice 问题

    1.SE80 创建企业服务: 代理生成:出现错误 库处理程序中出现例外 错误的值:未知类型参考ns1:ArrayOfMLMatnrResource 尝试: 一.测试本地文件:---无效 1. 将网址链 ...

  10. <raspberry pi > 用树莓派来听落网电台

    树莓派放在抽屉里吃灰有半年多了,去年玩了1个月后就没怎么开整了,上个月没工作,刚好有点闲暇,就把树莓派翻出来折腾,刚好碰到落网改版了,想起以前在树莓派论坛看到有网友拿树莓派来听豆瓣电台,代码那时我都下 ...