General mistakes in parallel computing
This is an old article written in 2013, originally hosted on gegahost.net: http://raison.gegahost.net/?p=97
March 11, 2013
(Original Work by Peixu Zhu)
In a parallel computing environment, some general mistakes are frequent and difficult to troubleshoot, because the CPU may interleave instructions from different threads in an unpredictable order. Most of them are atomic violations, order violations, and deadlocks. Studies show that well-known software such as MySQL, Apache, Mozilla, and OpenOffice also contains such mistakes.
1. Atomic violation
In sequential programming we seldom care about atomic operations. In parallel programming, however, atomicity must be considered first. For example:
[Thread 1]
if (_ptr) // A
*_ptr = 0; // B
[Thread 2]
_ptr = NULL; // C
In the code above, each thread executes a single statement, so it may
seem that the two statements cannot interleave: either the statement in
thread 1 runs first, or the one in thread 2 does. In fact, the statement
in thread 1 is not atomic. It consists of at least two steps, A and B.
If the steps execute in the order A-B-C, all is well. However, they may
also be scheduled as A-C-B, in which case B dereferences a pointer that
is already NULL and causes an unexpected memory access error.
We assumed that the statement region in thread 1 is atomic, but that is
not true. This is the root of the atomic violation. In many cases, the
problem is introduced by code modification. In the example above, the
statement in thread 1 may originally have been a simple assignment:
_ptr = &_val;
And later, the code is modified, and the implicit atomicity is broken.
On systems with multiple cores, the problem is more complicated, since
each core may cache a block of memory separately. For example, core 1
runs thread 1, and core 2 runs thread 2:
[Thread 1]
_ptr = &_val;
[Thread 2]
_ptr = NULL;
Are they atomic? In fact, they are not. `_ptr` may be optimized into a
register value local to one core, or it may be cached separately by
different cores. Thus, we cannot determine the value of `_ptr`.
To avoid atomic violations, we must make the code region atomic, either
by locking or by using atomic operations. Using explicit atomic
operations on a shared variable is a good habit: the statement itself
reminds us that atomicity is required whenever we later modify the code.
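Both remedies can be sketched in C++. This is only an illustration under stated assumptions: the struct and function names (`GuardedPtr`, `AtomicPtr`, `clear_target`, `publish`, `reset`, `run_guarded`) are hypothetical, and `val` and `ptr` stand in for `_val` and `_ptr` from the examples above. The first variant makes the check-then-write region atomic with a lock, the second makes the plain pointer assignment explicitly atomic.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Variant 1: a mutex makes steps A and B one critical section,
// so step C can no longer run between them.
struct GuardedPtr {
    std::mutex m;
    int val = 1;
    int* ptr = &val;

    void clear_target() {                 // thread 1
        std::lock_guard<std::mutex> guard(m);
        if (ptr)                          // A
            *ptr = 0;                     // B
    }

    void reset() {                        // thread 2
        std::lock_guard<std::mutex> guard(m);
        ptr = nullptr;                    // C
    }
};

// Variant 2: the simple assignments are written as explicit atomic
// stores, which documents that the variable is shared between threads.
struct AtomicPtr {
    int val = 1;
    std::atomic<int*> ptr{nullptr};

    void publish() { ptr.store(&val); }   // thread 1
    void reset()   { ptr.store(nullptr); } // thread 2
};

// Runs the two threads of variant 1 and reports whether ptr ended up null.
bool run_guarded() {
    GuardedPtr g;
    std::thread t1([&g] { g.clear_target(); });
    std::thread t2([&g] { g.reset(); });
    t1.join();
    t2.join();
    // val is 0 if thread 1 won the race and 1 otherwise.
    assert(g.val == 0 || g.val == 1);
    return g.ptr == nullptr;
}
```

Note that the atomic pointer only protects the assignment itself. A check followed by a dereference, as in steps A and B, still needs the lock from the first variant.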
2. Order violation
Consider the example below:
[Thread 1]
_ptr = allocate_memory(); // A
[Thread 2]
_ptr[1] = "right"; // B
If the code is not synchronized, both execution orders, A-B and B-A,
are possible. In the B-A order, thread 2 writes through `_ptr` before it
points to allocated memory. In such cases, we must synchronize the code
blocks to ensure the order of execution.
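One possible way to enforce the A-before-B order is a condition variable, sketched below in C++. The names `OrderedBuffer`, `allocate`, `write_slot`, and `run_ordered` are hypothetical, and a `std::vector<std::string>` stands in for the memory behind `_ptr`. Other primitives, such as a future or a semaphore, would serve the same purpose.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct OrderedBuffer {
    std::mutex m;
    std::condition_variable ready_cv;
    bool ready = false;                   // set once step A has completed
    std::vector<std::string> slots;       // stands in for the memory behind _ptr

    void allocate() {                     // thread 1: step A
        {
            std::lock_guard<std::mutex> guard(m);
            slots.resize(2);
            ready = true;
        }
        ready_cv.notify_all();            // wake any thread waiting for A
    }

    void write_slot() {                   // thread 2: step B
        std::unique_lock<std::mutex> lock(m);
        // Block until step A has run; the predicate guards against
        // spurious wakeups and against A finishing before we wait.
        ready_cv.wait(lock, [this] { return ready; });
        slots[1] = "right";
    }
};

// Starts the writer first on purpose; it still observes A before B.
std::string run_ordered() {
    OrderedBuffer buf;
    std::thread writer([&buf] { buf.write_slot(); });
    std::thread allocator([&buf] { buf.allocate(); });
    writer.join();
    allocator.join();
    assert(buf.slots.size() == 2);
    return buf.slots[1];
}
```

Whichever thread the scheduler runs first, the write in step B can only happen after the flag set by step A becomes visible under the lock.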
3. Deadlock
Locking is fundamental in concurrent programming. If more than one
thread works with more than one shared resource, such as a memory
block, it is possible that each thread holds one resource while waiting
for a resource held by another thread.
[Thread 1]
lock_a.lock();
a = 0; // A
lock_b.lock();
b = 0; // B
lock_b.unlock();
lock_a.unlock();
[Thread 2]
lock_b.lock();
b = 1; // C
lock_a.lock();
a = 1; // D
lock_a.unlock();
lock_b.unlock();
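If the code runs as A-B-C-D, there is no problem. However, if A and C
both run before either thread acquires its second lock, thread 1 holds
lock_a and waits for lock_b, while thread 2 holds lock_b and waits for
lock_a. Neither B nor D is ever reached, and the program is deadlocked.
Deadlock requires four conditions: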
a. mutual exclusion
b. hold and wait
c. no preemption
d. circular waiting
Breaking at least one of the above four conditions prevents the deadlock.
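As a minimal sketch of breaking one of these conditions, the C++ example below acquires both locks in a single step with `std::lock`, which the standard library specifies to lock all given mutexes without deadlocking. Neither thread is then left holding one lock while blocked on the other. The names `Shared`, `thread1_body`, `thread2_body`, and `run_without_deadlock` are hypothetical. Acquiring the locks in the same fixed order in every thread would be another option, since it removes the circular wait.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

struct Shared {
    std::mutex lock_a;
    std::mutex lock_b;
    int a = -1;
    int b = -1;
};

void thread1_body(Shared& s) {
    std::lock(s.lock_a, s.lock_b);        // take both locks together
    // adopt_lock: the guards only take over unlocking on scope exit.
    std::lock_guard<std::mutex> ga(s.lock_a, std::adopt_lock);
    std::lock_guard<std::mutex> gb(s.lock_b, std::adopt_lock);
    s.a = 0;                              // A
    s.b = 0;                              // B
}

void thread2_body(Shared& s) {
    std::lock(s.lock_b, s.lock_a);        // argument order does not matter
    std::lock_guard<std::mutex> gb(s.lock_b, std::adopt_lock);
    std::lock_guard<std::mutex> ga(s.lock_a, std::adopt_lock);
    s.b = 1;                              // C
    s.a = 1;                              // D
}

// Runs both threads to completion and checks that a and b were
// updated as a pair by whichever thread ran last.
bool run_without_deadlock() {
    Shared s;
    std::thread t1([&s] { thread1_body(s); });
    std::thread t2([&s] { thread2_body(s); });
    t1.join();
    t2.join();
    assert(s.a == 0 || s.a == 1);
    return s.a == s.b;
}
```

Because each thread writes both variables while holding both locks, the final state is either a = b = 0 or a = b = 1, depending on which thread ran last.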