Prerequisites:
This course is intended for programmers who are familiar with programming serial or vector
architectures in Fortran or C. It is also intended for programmers who have production code
that needs to be moved to a distributed memory parallel (DMP) implementation with shared
memory parallel (SMP) nodes. No prior knowledge of programming (non-vector) parallel
computers is assumed, although some experience with programming vector supercomputers is an
advantage.
Objectives:
The primary objective of this training course is to introduce both OpenMP and MPI to
programmers who have no prior experience in parallel computing. A secondary objective is to
serve those with a background in serial or vector processing systems who need to port code
to a hybrid OpenMP and Message Passing Interface (MPI) model. The course teaches
participants how to write parallel code that combines the OpenMP and MPI programming
paradigms, and how to convert serial constructs into equivalent MPI+OpenMP form. Target
implementations include DMP and SMP platforms ranging from Beowulf clusters to large
high-performance computers. Special attention is devoted to issues that arise in porting
legacy code to hybrid SMP and DMP implementations. (A minimal sketch of the hybrid model
appears below.)
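To fix ideas, here is a minimal sketch of the hybrid model the course targets (illustrative
code, not taken from the course workbooks): MPI supplies one process per DMP node, and
OpenMP spawns a team of threads within each SMP node. Requesting MPI_THREAD_FUNNELED, so
that only the master thread makes MPI calls, is one common convention; an MPI
implementation may provide less support than requested.

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char *argv[])
    {
        int rank, nprocs, provided;

        /* Request FUNNELED support: only the master thread calls MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Each MPI process (typically one per SMP node) spawns OpenMP threads. */
        #pragma omp parallel
        printf("MPI rank %d of %d: OpenMP thread %d of %d\n",
               rank, nprocs, omp_get_thread_num(), omp_get_num_threads());

        MPI_Finalize();
        return 0;
    }

On a typical installation such a program is compiled with an MPI compiler wrapper (e.g.,
mpicc) plus the compiler's OpenMP flag and launched with mpirun; exact commands vary by
platform.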
Duration:
4 days, organized as follows (HC2 and HC6 refer to the training workbooks for the
respective courses; these are required as references):
Day | Period | Chapter | Topic
1 | AM | HC8_1, HC8_2, HC2_4, HC6_2, HC6_3 | Porting legacy code to parallel computers; MPI+OpenMP parallelization strategies; The OpenMP paradigm of parallel programming; MPI: Environment; MPI: Point-to-point communication
1 | PM | HC2_5, HC2_6 | OpenMP language specification; Examples comparing serial and OpenMP parallel code
2 | AM | HC6_4, HC2_6 | MPI: Collective communication; Examples comparing serial and MPI parallel code
2 | PM | HC6_5, HC6_7 | MPI: Derived data types; MPI: Topology and groups
3 | AM | HC8_3 | Case study of the dot product and matrix-vector product in serial, MPI, OpenMP, and MPI+OpenMP versions
3 | PM | HC8_4 | Case study of the Stommel Ocean Model in (1) OpenMP, (2) one- and two-dimensional domain decompositions for MPI, and (3) conversion to MPI+OpenMP
4 | AM | HC8_5, HC8_6 | Quick start with Intel Trace Analyzer™ and Intel Trace Collector™ for MPI; Quick start with Intel Thread Checker™
4 | PM | | Case study of the Princeton Ocean Model in (1) a two-dimensional domain decomposition for MPI, and (2) conversion to MPI+OpenMP
Format:
The course is contained in two course workbooks and a volume of case studies intended for
use in one of three ways:
- (a) classroom presentation,
- (b) self-paced study,
- (c) as a reference.
For options (a) and (b) this course is accompanied by a syllabus. The workbooks for courses
HC2 and HC6 are required as reference material, and a summary of their contents is given in
HC2 and HC6. Additional material will be provided for the case studies covered in this
course. The workbooks include all source code, sample input, output, and make files needed
to compile and execute all programs and examples discussed in the text.
Review of Sections:
This training workbook is arranged into six chapters
described as follows.
- Porting Legacy Code to Parallel Computers. This chapter reviews developer perceptions of
parallel programming, considerations for legacy codes, and how to look for parallelism in
them. Also covered are guidelines for porting to DMP, SMP, and DMP+SMP computers, typical
parallel performance problems, and some lessons learned. (A sketch of a loop-level
parallelism test appears after this list.)
- MPI+OpenMP Parallelization Strategies. This chapter starts from the basics by describing
DMP and SMP architectures, then outlines the issues that arise in merging MPI and OpenMP.
It concludes with a discussion of performance issues and parallel granularity.
- Small Examples and Exercises. This chapter presents examples of the dot product and the
matrix-vector product in serial, MPI, OpenMP, and MPI+OpenMP code. The discussion sections
explain the codes and propose simple exercises. (A sketch of the hybrid dot product appears
after this list.)
- Case Study of SOM77. This chapter is a case study of the Stommel Ocean Model in Fortran
77. Source code for serial, MPI, OpenMP, and MPI+OpenMP versions is presented; for MPI,
both the 1-D and 2-D domain decompositions are given. The discussion sections explain the
source code and propose exercises. (A sketch of the halo exchange used in a 1-D
decomposition appears after this list.)
- Quick Start with Intel Trace Analyzer™ and Intel Trace Collector™ for MPI. This chapter
covers the use of these MPI tools with the examples and case studies of Chapters 3 and 4.
The basics of producing MPI trace files are presented, and navigation through the various
graphical views of traced information reveals differences in performance between the MPI
versions of these codes.
- Quick Start with Intel Thread Checker™ for OpenMP. This chapter covers the use of this
OpenMP tool with the examples and case studies of Chapters 3 and 4. The basics of producing
the required files are presented, and navigation through the various graphical views
reveals differences in performance between the OpenMP versions of these codes.
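The following sketches are illustrative only; they are not drawn from the workbooks, and
all array names and sizes are assumptions. First, for the chapter on porting legacy code,
a C fragment showing the basic test applied when looking for parallelism in a legacy loop:
a loop whose iterations are independent can carry an OpenMP directive, while a loop with a
loop-carried dependence cannot be parallelized as written.

    #include <stdio.h>
    #define N 1000

    int main(void)
    {
        static double a[N], b[N], c[N];
        int i;

        for (i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

        /* Independent iterations: each a[i] depends only on b[i] and c[i],
           so the loop is a candidate for OpenMP parallelization. */
        #pragma omp parallel for
        for (i = 0; i < N; i++)
            a[i] = b[i] + c[i];

        /* Loop-carried dependence: a[i] uses a[i-1] from the previous
           iteration, so this recurrence cannot be parallelized as written. */
        for (i = 1; i < N; i++)
            a[i] = a[i-1] + b[i];

        printf("a[N-1] = %f\n", a[N-1]);
        return 0;
    }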
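Next, for the Small Examples and Exercises chapter, a minimal sketch of the hybrid dot
product (assuming a uniform block of the vectors on every process): OpenMP threads reduce
the local block, and MPI_Allreduce combines the per-process partial sums. The serial,
pure-MPI, and pure-OpenMP variants discussed in the chapter correspond to removing one
layer or the other.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        const int nlocal = 100000;   /* block size per MPI process (assumed) */
        int i, rank, provided;
        double *x, *y, local = 0.0, global;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        x = malloc(nlocal * sizeof(double));
        y = malloc(nlocal * sizeof(double));
        for (i = 0; i < nlocal; i++) { x[i] = 1.0; y[i] = 2.0; }

        /* OpenMP threads reduce this process's block of the vectors. */
        #pragma omp parallel for reduction(+:local)
        for (i = 0; i < nlocal; i++)
            local += x[i] * y[i];

        /* MPI combines the per-process partial sums across all nodes. */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("dot product = %f\n", global);

        free(x); free(y);
        MPI_Finalize();
        return 0;
    }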
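Finally, for the SOM77 case study, a minimal sketch of the halo exchange at the heart of a
1-D domain decomposition (hypothetical array name u and local sizes): each process swaps
boundary rows with its neighbors via MPI_Sendrecv, and MPI_PROC_NULL at the domain edges
turns the boundary exchanges into no-ops.

    #include <mpi.h>

    #define NX 8    /* local rows owned by this process (assumed) */
    #define NY 16   /* columns (assumed) */

    int main(int argc, char *argv[])
    {
        /* Rows 0 and NX+1 are ghost (halo) rows filled by the neighbors. */
        double u[NX + 2][NY];
        int rank, nprocs, up, down;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Neighbors in the 1-D decomposition; MPI_PROC_NULL at the edges
           makes the corresponding send/receive a no-op. */
        up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        /* ... initialize u ... */

        /* Send first owned row up, receive into the bottom ghost row;
           then send last owned row down, receive into the top ghost row. */
        MPI_Sendrecv(u[1],      NY, MPI_DOUBLE, up,   0,
                     u[NX + 1], NY, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(u[NX],     NY, MPI_DOUBLE, down, 1,
                     u[0],      NY, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        MPI_Finalize();
        return 0;
    }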
HiPERiSM Consulting, LLC, (919) 484-9803 (Voice), (919) 806-2813 (Facsimile)