|
||
Author: Blaise Barney, Lawrence Livermore National Laboratory | UCRL-MI-133316 |
Abstract |
This tutorial is the first of eight tutorials in the 4+ day "Using LLNL's Supercomputers" workshop. It is intended to provide only a very quick overview of the extensive and broad topic of Parallel Computing, as a lead-in for the tutorials that follow it. As such, it covers just the very basics of parallel computing, and is intended for someone who is just becoming acquainted with the subject and who is planning to attend one or more of the other tutorials in this workshop. It is not intended to cover Parallel Programming in depth, as this would require significantly more time. The tutorial begins with a discussion on parallel computing - what it is and how it's used, followed by a discussion on concepts and terminology associated with parallel computing. The topics of parallel memory architectures and programming models are then explored. These topics are followed by a series of practical discussions on a number of the complex issues related to designing and running parallel programs. The tutorial concludes with several examples of how to parallelize simple serial programs.
Overview |
For example:
Parallel Computing:
For example:
Parallel Computers:
![]() IBM BG/Q Compute Chip with 18 cores (PU) and 16 L2 Cache units (L2) |
Overview |
|
|
Overview |
![]()
|
Click on images below for larger version | |
![]() |
![]() |
Concepts and Terminology |
![]() |
|
![]() John von Neumann circa 1940s (Source: LANL archives) |
Concepts and Terminology |
S I S DSingle Instruction Stream |
S I M DSingle Instruction Stream |
M I S DMultiple Instruction Stream |
M I M DMultiple Instruction Stream |
![]()
|
![]() |
![]() UNIVAC1 |
![]() IBM 360 |
![]() CRAY1 |
![]() CDC 7600 |
![]() PDP1 |
![]() Dell Laptop |
![]()
|
![]() |
![]() ILLIAC IV |
![]() MasPar |
![]() |
![]() Cray X-MP |
![]() Cray Y-MP |
![]() Thinking Machines CM-2 |
![]() Cell Processor (GPU) |
![]()
|
![]() |
![]() |
![]()
|
![]() |
![]() IBM POWER5 |
![]() HP/Compaq Alphaserver |
![]() Intel IA32 |
![]() AMD Opteron |
![]() Cray XT3 |
![]() IBM BG/L |
Concepts and Terminology |
Synchronization usually involves waiting by at least one task, and can therefore cause a parallel application's wall clock execution time to increase.
wall-clock time of serial execution ----------------------------------- wall-clock time of parallel execution |
One of the simplest and most widely used indicators for a parallel program's performance.
Concepts and Terminology |
|
![]() ![]() |
2D Grid Calculations 85 seconds 85% Serial fraction 15 seconds 15%We can increase the problem size by doubling the grid dimensions and halving the time step. This results in four times the number of grid points and twice the number of time steps. The timings then look like:
2D Grid Calculations 680 seconds 97.84% Serial fraction 15 seconds 2.16%
Complexity:
Portability:
Resource Requirements:
Scalability:
Parallel Computer Memory Architectures |
|
![]() Shared Memory (UMA) ![]() Shared Memory (NUMA) |
Disadvantages:
Parallel Computer Memory Architectures |
Advantages:
Disadvantages:
Parallel Computer Memory Architectures |
![]() |
![]() |
Advantages and Disadvantages:
Parallel Programming Models |
|
![]() |
|
![]() |
Parallel Programming Models |
Implementations:
Parallel Programming Models |
Implementations:
In both cases, the programmer is responsible for determining the parallelism (although compilers can sometimes help).
More Information:
Parallel Programming Models |
Implementations:
More Information:
Parallel Programming Models |
Implementations:
Parallel Programming Models |
Parallel Programming Models |
Multiple Program Multiple Data (MPMD):
Designing Parallel Programs |
Designing Parallel Programs |
Calculate the potential energy for each of several thousand independent conformations of a molecule. When done, find the minimum energy conformation. |
This problem is able to be solved in parallel. Each of the molecular conformations is independently determinable. The calculation of the minimum energy conformation is also a parallelizable problem.
Calculation of the Fibonacci series (0,1,1,2,3,5,8,13,21,...) by use of
the formula:
F(n) = F(n-1) + F(n-2) |
This is a non-parallelizable problem because the calculation of the Fibonacci sequence as shown would entail dependent calculations rather than independent ones. The calculation of the F(n) value uses those of both F(n-1) and F(n-2). These three terms cannot be calculated independently and therefore, not in parallel.
|
![]() |
Designing Parallel Programs |
Domain Decomposition:
Functional Decomposition:
Designing Parallel Programs |
The need for communications between tasks depends upon your problem:
Factors to Consider:
There are a number of important factors to consider when designing your program's inter-task communications:
Designing Parallel Programs |
Designing Parallel Programs |
Examples:
DO 500 J = MYSTART,MYEND A(J) = A(J-1) * 2.0 500 CONTINUE |
The value of A(J-1) must be computed before the value of A(J), therefore A(J) exhibits a data dependency on A(J-1). Parallelism is inhibited.
If Task 2 has A(J) and task 1 has A(J-1), computing the correct value of A(J) necessitates:
task 1 task 2 ------ ------ X = 2 X = 4 . . . . Y = X**2 Y = X**3 |
As with the previous example, parallelism is inhibited. The value of Y is dependent on:
How to Handle Data Dependencies:
Designing Parallel Programs |
Designing Parallel Programs |
![]()
|
![]() |
Designing Parallel Programs |
The Good News:
Designing Parallel Programs |
Designing Parallel Programs |
Parallel Examples |
|
![]() |
|
![]() |
One Possible Solution:
find out if I am MASTER or WORKER if I am MASTER initialize the array send each WORKER info on part of array it owns send each WORKER its portion of initial array receive from each WORKER results else if I am WORKER receive from MASTER info on part of array I own receive from MASTER my portion of initial array # calculate my portion of array do j = my first column,my last column do i = 1,n a(i,j) = fcn(i,j) end do end do send MASTER results endif |
Pool of Tasks Scheme:
Master Process:
Worker Process: repeatedly does the following
find out if I am MASTER or WORKER if I am MASTER do until no more jobs if request send to WORKER next job else receive results from WORKER end do else if I am WORKER do until no more jobs request job from MASTER receive from MASTER next job calculate array element: a(i,j) = fcn(i,j) send results to MASTER end do endif |
Discussion:
Parallel Examples |
|
![]() |
|
![]() |
Parallel Examples |
|
find out if I am MASTER or WORKER if I am MASTER initialize array send each WORKER starting info and subarray receive results from each WORKER else if I am WORKER receive from MASTER starting info and subarray do t = 1, nsteps update time send neighbors my border info receive from neighbors their border info update my portion of solution array end do send MASTER results endif |
Parallel Examples |
where c is a constantA(i,t+1) = (2.0 * A(i,t)) - A(i,t-1) + (c * (A(i-1,t) - (2.0 * A(i,t)) + A(i+1,t)))
find out number of tasks and task identities #Identify left and right neighbors left_neighbor = mytaskid - 1 right_neighbor = mytaskid +1 if mytaskid = first then left_neigbor = last if mytaskid = last then right_neighbor = first find out if I am MASTER or WORKER if I am MASTER initialize array send each WORKER starting info and subarray else if I am WORKER` receive starting info and subarray from MASTER endif #Update values for each point along string #In this example the master participates in calculations do t = 1, nsteps send left endpoint to left neighbor receive left endpoint from right neighbor send right endpoint to right neighbor receive right endpoint from left neighbor #Update points along line do i = 1, npoints newval(i) = (2.0 * values(i)) - oldval(i) + (sqtau * (values(i-1) - (2.0 * values(i)) + values(i+1))) end do end do #Collect results and write to file if I am MASTER receive results from each WORKER write results to file else if I am WORKER send results to MASTER endif |
This completes the tutorial.
![]() |
Please complete the online evaluation form. |
Where would you like to go now?
References and More Information |