Prerequisites:
This course is intended for programmers who are familiar with programming serial or vector architectures in Fortran or C, and for programmers who have production code that needs to be moved to a distributed memory parallel (DMP) implementation with shared memory parallel (SMP) nodes. No prior knowledge of programming (non-vector) parallel computers is assumed; however, some experience with programming vector supercomputers is an advantage.
Objectives:
The primary objective of this course is to introduce both OpenMP and MPI to programmers who have no prior experience in parallel computing. A secondary audience is programmers with a background in serial or vector processing systems who need to port code to a hybrid OpenMP and Message Passing Interface (MPI) model. The course teaches participants how to write parallel code that combines the OpenMP and MPI programming paradigms and how to convert existing constructs into equivalent MPI+OpenMP form. Target platforms include DMP and SMP systems ranging from Beowulf clusters to large high-performance computers. Special attention is devoted to issues that arise when porting legacy code to hybrid SMP and DMP implementations.
Duration:
4 days organized as follows:
(HC2 and HC6 refer to the training workbooks for the respective courses; these are required as references.)
Day | Period | Chapter | Topic
----|--------|---------|------
 1  | AM     | HC8_1   | Porting legacy code to parallel computers
    |        | HC8_2   | MPI+OpenMP parallelization strategies
    |        | HC2_4   | The OpenMP paradigm of parallel programming
    |        | HC6_2   | MPI: Environment
    |        | HC6_3   | MPI: Point-to-point communication
 1  | PM     | HC2_5   | OpenMP language specification
    |        | HC2_6   | Examples comparing serial and OpenMP parallel code
 2  | AM     | HC6_4   | MPI: Collective communication
    |        | HC2_6   | Examples comparing serial and MPI parallel code
 2  | PM     | HC6_5   | MPI: Derived data types
    |        | HC6_7   | MPI: Topology and groups
 3  | AM     | HC8_3   | Case study of the dot product and matrix-vector product in serial, MPI, OpenMP, and MPI+OpenMP versions
 3  | PM     | HC8_4   | Case study of the Stommel Ocean Model in (1) OpenMP, (2) one- and two-dimensional domain decompositions for MPI, and (3) conversion to MPI+OpenMP
 4  | AM     | HC8_5   | Quick start with Intel Trace Analyzer™ and Intel Trace Collector™ for MPI
    |        | HC8_6   | Quick start with Intel Thread Checker™
 4  | PM     |         | Case study of the Princeton Ocean Model in (1) a two-dimensional domain decomposition for MPI, and (2) conversion to MPI+OpenMP
Format:
The course is contained in two course workbooks and a volume of case studies intended for use in one of three ways:
(1) classroom presentation,
(2) self-paced study,
(3) as a reference.
For options (1) and (2) this course is accompanied by a syllabus. The workbooks for courses HC2 and HC6 are required as reference material, and a summary of their contents is given in each. Additional material will be provided for the case studies covered in this course.
The workbooks include all source code, sample input and output, and makefiles needed to compile and execute the programs and examples discussed in the text.
Review of Sections:
This training workbook is arranged into six chapters described as follows.
- Porting Legacy Code to Parallel Computers. This chapter reviews developer perceptions of parallel programming, considerations for legacy codes, and how to look for parallelism in them. Also covered are guidelines for porting to DMP, SMP, and DMP+SMP computers, typical parallel performance problems, and some lessons learned.
- MPI+OpenMP Parallelization Strategies. This chapter starts from the basics by describing DMP and SMP architectures. Issues related to merging MPI and OpenMP are then outlined. In conclusion, performance issues and parallel granularity are discussed.
- Small Examples and Exercises. This chapter presents examples of the dot product and the matrix-vector product in serial, MPI, OpenMP, and MPI+OpenMP code. The discussion sections explain the codes and propose simple exercises.
- Case Study of SOM77. This chapter is a case study of the Stommel Ocean Model in Fortran 77. Source code for serial, MPI, OpenMP, and MPI+OpenMP versions is presented; for the MPI versions, both 1-D and 2-D domain decompositions are given. The discussion sections explain the source code and propose exercises.
- Quick Start with Intel Trace Analyzer™ and Intel Trace Collector™ for MPI. This chapter covers the use of these MPI tools on the examples and case studies of Chapters 3 and 4. The basics of producing MPI trace files are presented, and navigation through the various graphical views of the traced information reveals differences in performance among the MPI versions of these codes.
- Quick Start with Intel Thread Checker™ for OpenMP. This chapter covers the use of this OpenMP tool on the examples and case studies of Chapters 3 and 4. The basics of producing the required files are presented, and navigation through the various graphical views reveals differences in performance among the OpenMP versions of these codes.