0.1 Topic
Notes on Lin C., Snyder L. Principles of Parallel Programming. Beijing: China Machine Press, 2008.

(1) Parallel Computer Architecture - done 2015/5/24
(2) Parallel Abstraction - done 2015/5/28
(3) Scalable Algorithm Techniques - done 2015/5/30
(4) PP Languages: Java(Thread), MPI(local view), ZPL(global view)

0.2 Audience
Novice PP programmers who want to gain fundamental PP concepts

0.3 Related Topics
Computer Architecture, Sequential Algorithms,
PP Programming Languages

--------------------------------------------------------------------

  • ### 1 introduction

real world cases:
house construction, manufacturing pipeline, call center

ILP (Instruction Level Parallelism)
e.g. (a+b) * (c+d): the two additions are independent and can execute at the same time; only the final multiply must wait for both.

Parallel Computing vs. Distributed Computing
the goal of PC is to provide performance, either in terms of
processor power or memory, that a single processor cannot provide;
the goal of DC is to provide convenience, including availability,
reliability, and physical distribution.

Concurrency vs. Parallelism
CONCURRENCY is widely used in the OS and DB communities to describe
executions that are LOGICALLY simultaneous;
PARALLELISM is typically used by the architecture and supercomputing
communities to describe executions that PHYSICALLY execute simultaneously.
In either case, the codes that execute simultaneously exhibit unknown
timing characteristics.

iterative sum/pair-wise summation
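a minimal sequential Java sketch of pair-wise (tree) summation (illustrative code, not the book's; Java is used here since it is one of the languages listed in 0.1): each round combines adjacent pairs, the additions within a round are independent of each other, so the summation has depth about log2(n) instead of the (n-1)-step chain of the iterative sum.

public class PairwiseSum {
    // Pair-wise (tree) summation: each round combines adjacent pairs of
    // partial sums, halving the problem until one value is left.
    static long pairwiseSum(long[] a) {
        if (a.length == 0) return 0;
        long[] buf = a.clone();
        int len = buf.length;
        while (len > 1) {
            for (int i = 0; i < len / 2; i++) {
                buf[i] = buf[2 * i] + buf[2 * i + 1];       // independent additions
            }
            if (len % 2 == 1) buf[len / 2] = buf[len - 1];  // carry the odd element
            len = (len + 1) / 2;
        }
        return buf[0];
    }

    public static void main(String[] args) {
        System.out.println(pairwiseSum(new long[]{1, 2, 3, 4, 5, 6, 7}));  // prints 28
    }
}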

parallel prefix sum
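a small sketch of the prefix-sum (scan) idea using the Hillis-Steele scheme (an illustration; the book develops scan in its own notation): in round d every element adds the value d positions to its left, all reads come from the previous round's array, so the updates within a round are independent and the whole scan takes about log2(n) rounds.

import java.util.Arrays;

public class PrefixSum {
    // Inclusive prefix sum: after round d, element i holds the sum of the
    // (up to) 2*d values ending at i.
    static long[] prefixSum(long[] a) {
        long[] cur = a.clone();
        for (int d = 1; d < cur.length; d *= 2) {
            long[] next = cur.clone();
            for (int i = d; i < cur.length; i++) {
                next[i] = cur[i] + cur[i - d];   // independent within the round
            }
            cur = next;
        }
        return cur;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(prefixSum(new long[]{1, 2, 3, 4})));
        // prints [1, 3, 6, 10]
    }
}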

Parallelism using multiple instruction streams: threads
multithreaded solutions for counting the number of 3s in an array
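a minimal Java-threads sketch of the count-3s computation (names and structure are illustrative, not the book's code): each thread scans one contiguous block of the array into a private counter, and the main thread combines the partial counts after join(), so no lock is needed on the hot path.

import java.util.Arrays;

public class Count3s {
    static int count3s(int[] array, int numThreads) throws InterruptedException {
        int n = array.length;
        int[] partial = new int[numThreads];          // one slot per thread
        Thread[] workers = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            final int id = t;
            final int lo = id * n / numThreads;       // this thread's block
            final int hi = (id + 1) * n / numThreads;
            workers[t] = new Thread(() -> {
                int local = 0;                        // private counter: no contention
                for (int i = lo; i < hi; i++) {
                    if (array[i] == 3) local++;
                }
                partial[id] = local;
            });
            workers[t].start();
        }
        int total = 0;
        for (int t = 0; t < numThreads; t++) {
            workers[t].join();                        // wait, then combine
            total += partial[t];
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 0, 250_000, 3);
        System.out.println(count3s(data, 4));         // prints 250000
    }
}

one subtlety: the partial[] slots written by different threads may share a cache line, which is the false-sharing problem listed under "contention for resources" in the chapter 3 notes below; padding the per-thread counters avoids it.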

characteristics of a good parallel program:
(1) correct
(2) good performance
(3) scalable to a large number of processors
(4) portable across a wide variety of parallel platforms

  • ### 2 parallel computers

6 parallel computers
(1) Chip multiprocessors *
Intel Core Duo
AMD Dual Core Opteron
(2) Symmetric Multiprocessor Architecture
Sun Fire E25K
(3) Heterogeneous Chip Design
Cell
(4) Clusters
(5) Supercomputers
BlueGene/L

sequential computer abstraction
Random Access Machine (RAM) model, i.e. the von Neumann model:
abstracts a sequential computer as a device with an instruction
execution unit and an unbounded memory.

2 abstract models of parallel computers:
(1) PRAM: parallel random access machine model
the PRAM consists of an unspecified number of instruction execution units,
connected to a single unbounded shared memory that contains both
programs and data.
(2) CTA: candidate type architecture
the CTA consists of P standard sequential computers (processors, or processing elements),
connected by an interconnection network (communication network);
it separates 2 types of memory references: inexpensive local references
and expensive non-local references.

Locality Rule:
Fast programs tend to maximize the number of local memory references, and
minimize the number of non-local memory references.

3 major communication (memory reference) mechanisms:
(1) shared memory
a natural extension of the flat memory of sequential computers.
(2) one-sided communication
a relaxation of the shared memory concept: it supports a single shared address space,
all threads can reference all memory locations, but it does not attempt to keep the
memory coherent.
(3) message passing
memory references are used to access local memory;
message passing is used to access non-local memory.
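an illustrative sketch of the message-passing style, with two Java threads standing in for two processes and a BlockingQueue standing in for the communication network (plain Java, not MPI): each "process" touches only its own local data, and the partial result travels as an explicit message.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MessagePassingSketch {
    public static void main(String[] args) throws InterruptedException {
        int[] left  = {1, 2, 3, 4};
        int[] right = {5, 6, 7, 8};
        BlockingQueue<Integer> channel = new ArrayBlockingQueue<>(1);

        Thread sender = new Thread(() -> {
            int local = 0;
            for (int v : right) local += v;      // local memory references only
            try {
                channel.put(local);              // "send" the partial sum
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();

        int local = 0;
        for (int v : left) local += v;           // local work overlaps the send
        int remote = channel.take();             // "receive" the other partial sum
        sender.join();
        System.out.println(local + remote);      // prints 36
    }
}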

  • ### 3 reasoning about parallel performance

thread: thread-based / shared-memory parallel programming
process: message-passing / non-shared-memory parallel programming

latency: the amount of TIME it takes to complete a given unit of work
throughput: the amount of WORK that can be completed per unit time

## source of performance loss
(1) overhead
communication
synchronization
computation
memory
(2) non-parallelizable computation
Amdahl's Law: portions of a computation that are sequential will,
as parallelism is applied, come to dominate the execution time
(see the formula after this list).
(3) idle processors
idle time is often a consequence of synchronization and communication
load imbalance: uneven distribution of work to processors
memory-bound computation: bandwidth, latency
(4) contention for resources
spin lock, false sharing
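Amdahl's Law written as a formula (notation is mine, the book may use different symbols: T_1 is the one-processor time, S the fraction of the work that must run sequentially, P the number of processors):

T_P = S*T_1 + (1 - S)*T_1/P
Speedup = T_1/T_P = 1 / (S + (1 - S)/P) <= 1/S

e.g. with S = 0.1, the speedup can never exceed 10, no matter how many processors are used.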

## parallel structure
(1) dependences
an ordering relationship between two computations
(2) granularity
the frequency of interactions among threads or processes
(3) locality
temporal locality: memory references are clustered in TIME
spatial locality: memory references are clustered by ADDRESS
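a small Java illustration of spatial locality (illustrative code, not from the book): Java lays out each row of a 2-D array contiguously, so the row-order loop walks consecutive addresses while the column-order loop strides across rows, and the first usually runs noticeably faster even though both perform the same additions.

public class LocalityDemo {
    public static void main(String[] args) {
        int n = 4096;
        double[][] m = new double[n][n];

        long t0 = System.nanoTime();
        double rowSum = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                rowSum += m[i][j];              // consecutive addresses within a row

        long t1 = System.nanoTime();
        double colSum = 0;
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                colSum += m[i][j];              // strided accesses across rows

        long t2 = System.nanoTime();
        System.out.printf("row-order: %d ms, column-order: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        System.out.println("check: " + (rowSum + colSum));
    }
}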

## performance trade-offs
sequential computation: the 90/10 rule
Communication vs. Computation
Memory vs. Parallelism
Overhead vs. Parallelism

## measuring performance
(1) execution time/latency
(2) speedup/efficiency
(3) superlinear speedup
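the standard definitions behind these measures (T_S = best sequential time, T_P = time on P processors):

Speedup = T_S / T_P
Efficiency = Speedup / P

superlinear speedup means Speedup > P; the usual explanations are that the P processors' combined caches hold more of the working set than one processor's cache can, or that a parallel search happens to find the answer sooner.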

## scalable performance *
is difficult to achieve

  • ### 4 first step toward parallel programming

## data and task parallelism
(1) data parallel computation
parallelism is applied by performing the SAME operation on different items of data at the same time
(2) task parallel computation
parallelism is applied by performing DISTINCT computations/tasks at the same time

an example: the job of preparing a banquet/dinner
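the same contrast in code, a small Java sketch (illustrative names, not the book's example): the data-parallel part applies one operation to every element of an array, while the task-parallel part runs two distinct computations at the same time.

import java.util.stream.IntStream;

public class DataVsTask {
    public static void main(String[] args) throws InterruptedException {
        int[] prices = {5, 12, 7, 30};

        // Data parallelism: the SAME operation (doubling) applied to every element,
        // potentially on different processors at the same time.
        int[] doubled = IntStream.of(prices).parallel().map(p -> 2 * p).toArray();

        // Task parallelism: DISTINCT computations run at the same time.
        Thread maxTask = new Thread(() ->
                System.out.println("max = " + IntStream.of(prices).max().getAsInt()));
        Thread sumTask = new Thread(() ->
                System.out.println("sum = " + IntStream.of(prices).sum()));
        maxTask.start(); sumTask.start();
        maxTask.join();  sumTask.join();

        System.out.println(java.util.Arrays.toString(doubled));
    }
}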

## Peril-L Notation
see handwritten notes

## formulating parallelism
(1) fixed parallelism
k processors, a k-way parallel algorithm
drawback: moving to 2k processors cannot gain any improvement
(2) unlimited parallelism
spawn a thread for each single data element:
// background: count the number of 3s in array[n]
int _count_ = 0;
forall (i in (0..n-1))    // n is the array size
{
    _count_ = +/ (array[i] == 3 ? 1 : 0);
}
drawback: with only P processors (P << n), each processor must set up and manage
n/P threads, so thread creation and scheduling overhead dominates.

(3) scalable parallelism
formulate a set of substantial subproblems, assign a natural unit of the solution to each subproblem, and solve each subproblem as independently as possible.
implications:
substantial: enough local work to cover the parallel overheads
natural unit: the computation is not always smoothly partitionable
independent: reduces parallel communication overheads

 

  • ### 5 scalable algorithmic techniques

focus on data parallel computations
## ideal parallel computation
composed of large blocks of independent computation with no interactions among blocks.
principle:
Parallel programs are more scalable when they emphasize blocks of computation
(typically, the larger the block the better) that minimize inter-thread dependences.

## Schwartz's algorithm
goal: +-reduce
conditions: P is the number of processors, n is the number of values
2 approaches:
(1) use n/2-way logical concurrency - unlimited parallelism
(2) each process handles n/P items locally, then combine the partial results using a P-leaf tree - better

notation: _total_ = +/ _data_;
where _total_ is a global number, _data_ is a global array;
the compiler emits code that uses Schwartz's local/global approach.
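a Java sketch of the Schwartz-style reduce (illustrative, not the book's Peril-L code): each of P threads first adds up its n/P values locally, then the P partial sums are combined along a binary tree in about log2(P) barrier-separated steps.

import java.util.concurrent.CyclicBarrier;

public class SchwartzReduce {
    static long reduce(long[] data, int P) throws InterruptedException {
        long[] partial = new long[P];
        CyclicBarrier barrier = new CyclicBarrier(P);
        Thread[] workers = new Thread[P];
        int n = data.length;
        for (int t = 0; t < P; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long local = 0;
                for (int i = id * n / P; i < (id + 1) * n / P; i++) {
                    local += data[i];                         // the bulk of the work is local
                }
                partial[id] = local;
                try {
                    // tree combine: in each step, a thread adds in the partial sum
                    // that sits 'stride' slots to its right
                    for (int stride = 1; stride < P; stride *= 2) {
                        barrier.await();
                        if (id % (2 * stride) == 0 && id + stride < P) {
                            partial[id] += partial[id + stride];
                        }
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return partial[0];                                    // thread 0 holds the total
    }

    public static void main(String[] args) throws InterruptedException {
        long[] data = new long[1000];
        java.util.Arrays.fill(data, 1);
        System.out.println(reduce(data, 4));                  // prints 1000
    }
}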

## reduce and scan abstractions
generalized reduce and scan functions
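a hedged sketch of what a generalized reduce abstraction can look like (the interface and method names here are my own, not the book's): the user supplies an identity value, a way to fold one item into a local tally, and an associative way to combine two tallies; a framework is then free to do the folds in parallel and combine the tallies in a tree, as in Schwartz's algorithm above.

public class GeneralizedReduce {
    interface Reducer<T, R> {
        R init();                    // identity / empty tally
        R accum(R tally, T item);    // fold one item into a local tally
        R combine(R a, R b);         // merge two tallies (must be associative)
    }

    // Sequential reference semantics: what any parallel schedule must match.
    static <T, R> R reduce(Iterable<T> items, Reducer<T, R> r) {
        R tally = r.init();
        for (T item : items) tally = r.accum(tally, item);
        return tally;
    }

    public static void main(String[] args) {
        Reducer<Integer, Integer> plus = new Reducer<Integer, Integer>() {
            public Integer init() { return 0; }
            public Integer accum(Integer t, Integer x) { return t + x; }
            public Integer combine(Integer a, Integer b) { return a + b; }
        };
        System.out.println(reduce(java.util.List.of(1, 2, 3, 4), plus));  // prints 10
    }
}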

## assign work to processes statically

## assign work to processes dynamically

## trees
