Prerequisites:
This course is intended for programmers who are familiar with programming serial or vector architectures in Fortran or C, and for programmers who have production code that needs to be moved to a distributed memory parallel (DMP) implementation with shared memory parallel (SMP) nodes. No prior knowledge of programming (non-vector) parallel computers is assumed; however, some experience with programming vector supercomputers is an advantage.
Objectives:
The primary objective of this course is to introduce both OpenMP and MPI to programmers who have no prior experience in parallel computing. A secondary audience is programmers with a background in serial or vector processing systems who need to port code to a hybrid OpenMP and Message Passing Interface (MPI) model. The course teaches participants how to write parallel code that combines the OpenMP and MPI programming paradigms and how to convert existing constructs into equivalent MPI+OpenMP form. Target platforms include DMP and SMP systems ranging from Beowulf clusters to large high-performance computers. Special attention is devoted to issues that arise when porting legacy code to hybrid SMP and DMP implementations.
Duration:
4 days organized as follows:
(HC2 and HC6 refer to the training workbooks for the respective courses; these are required as references.)
Day | Period | Chapter | Topic
----|--------|---------|------
 1  | AM     | HC8_1   | Porting legacy code to parallel computers
    |        | HC8_2   | MPI+OpenMP parallelization strategies
    |        | HC2_4   | The OpenMP paradigm of parallel programming
    |        | HC6_2   | MPI: Environment
    |        | HC6_3   | MPI: Point-to-point communication
 1  | PM     | HC2_5   | OpenMP language specification
    |        | HC2_6   | Examples comparing serial and OpenMP parallel code
 2  | AM     | HC6_4   | MPI: Collective communication
    |        | HC2_6   | Examples comparing serial and MPI parallel code
 2  | PM     | HC6_5   | MPI: Derived data types
    |        | HC6_7   | MPI: Topology and groups
 3  | AM     | HC8_3   | Case study of the dot product and matrix-vector product in serial, MPI, OpenMP, and MPI+OpenMP versions
 3  | PM     | HC8_4   | Case study of the Stommel Ocean Model in (1) OpenMP, (2) one- and two-dimensional domain decompositions for MPI, and (3) conversion to MPI+OpenMP
 4  | AM     | HC8_5   | Quick start with Intel Trace Analyzer™ and Intel Trace Collector™ for MPI
    |        | HC8_6   | Quick start with Intel Thread Checker™
 4  | PM     |         | Case study of the Princeton Ocean Model in (1) a two-dimensional domain decomposition for MPI, and (2) conversion to MPI+OpenMP
Format:
The course is contained in two course workbooks and a volume of case studies intended for use in one of three ways:
(1) classroom presentation,
(2) self-paced study,
(3) as a reference.
For options (1) and (2) this course is accompanied by a syllabus. The workbooks for courses HC2 and HC6 are required as reference material, and a summary of their contents is given in each. Additional material will be provided for the case studies covered in this course.
The workbooks include all source code, sample input and output, and makefiles needed to compile and execute the programs and examples discussed in the text.
Review of Sections:
This training workbook is arranged into six chapters described as follows.
- Porting Legacy Code to Parallel Computers. This chapter reviews developer perceptions of parallel programming, considerations for legacy codes, and how to look for parallelism in them. Also covered are guidelines for porting to DMP, SMP, and DMP+SMP computers, typical parallel performance problems, and some lessons learned.
- MPI+OpenMP Parallelization Strategies. This chapter starts from the basics by describing DMP and SMP architectures. Issues related to merging MPI and OpenMP are then outlined. In conclusion, performance issues and parallel granularity are discussed.
- Small Examples and Exercises. This chapter presents examples of the dot product and the matrix-vector product in serial, MPI, OpenMP, and MPI+OpenMP code. The discussion sections explain the codes and propose simple exercises.
- Case Study of SOM77. This chapter is a case study of the Stommel Ocean Model in Fortran 77. Source code for serial, MPI, OpenMP, and MPI+OpenMP versions is presented; for the MPI versions, both 1-D and 2-D domain decompositions are given. The discussion sections explain the source code and propose exercises.
- Quick Start with Intel Trace Analyzer™ and Intel Trace Collector™ for MPI. This chapter covers the use of these MPI tools on the examples and case studies of Chapters 3 and 4. The basics of producing MPI trace files are presented, and navigation through the various graphical views of the traced information reveals differences in performance among the MPI versions of these codes.
- Quick Start with Intel Thread Checker™ for OpenMP. This chapter covers the use of this OpenMP tool on the examples and case studies of Chapters 3 and 4. The basics of producing the required files are presented, and navigation through the various graphical views reveals differences in performance among the OpenMP versions of these codes.