HiPERiSM's Course HC8 

HiPERiSM - High Performance Algorism Consulting

Course HC8: Training for Hybrid MPI and OpenMP Parallel Computing




This course is intended for programmers who are familiar with programming for serial or vector architectures in Fortran or C. It is also intended for programmers who have production code that needs to be moved to a distributed memory parallel (DMP) implementation with shared memory parallel (SMP) nodes. No prior knowledge of programming (non-vector) parallel computers is assumed, although some experience with programming for vector supercomputers is an advantage.


The primary objective of this training course is to introduce both OpenMP and the Message Passing Interface (MPI) to programmers who have no prior experience in parallel computing. As a secondary objective, the course also targets those with a background in vector or serial processing systems who need to port code to a hybrid MPI and OpenMP model. Participants learn how to write parallel code using a combination of the OpenMP and MPI programming paradigms, and how to convert constructs into equivalent MPI+OpenMP form. The implementations covered span DMP and SMP platforms, from Beowulf clusters to large high performance computers. Special attention is devoted to issues related to porting legacy code to hybrid SMP and DMP implementations.


4 days organized as follows:

(HC2 and HC6 refer to the training workbooks for the respective courses; these are required as references.)

Day Period Chapter Topic

Porting legacy code to parallel computers
MPI+OpenMP parallelization strategies
The OpenMP paradigm of parallel programming
MPI: Environment
MPI: Point-to-point communication

OpenMP language specification
Examples comparing serial and OpenMP parallel code

MPI: Collective communication
Examples comparing serial and MPI parallel code

MPI: Derived data types
MPI: Topology and groups

Case study of the dot product and matrix-vector product in serial, MPI, OpenMP, and MPI+OpenMP versions

Case study of the Stommel Ocean Model in (1) OpenMP, (2) one- and two-dimensional domain decompositions for MPI, and (3) conversion to MPI+OpenMP.

Quick start with Intel Trace Analyzer™ and Intel Trace Collector™ for MPI
Quick start with Intel Thread Checker

PM: Case study of the Princeton Ocean Model in (1) a two-dimensional domain decomposition for MPI, and (2) conversion to MPI+OpenMP.


The course is contained in two course workbooks and a volume of case studies intended for use in one of three ways:

  1. Classroom presentation,
  2. Self-paced study,
  3. As a reference.

For options 1 and 2 this course is accompanied by a syllabus. The workbooks for courses HC2 and HC6 are required as reference material, and a summary of their contents is given in HC2 and HC6. Additional material will be provided for the case studies covered in this course.

The workbooks include all source code, sample input, sample output, and makefiles needed to compile and execute all programs and examples discussed in the text.

Review of Sections:

This training workbook is arranged into six chapters described as follows.

  1. Porting Legacy Code to Parallel Computers. This chapter reviews developer perceptions of parallel programming, considerations for legacy codes and how to look for parallelism in them. Also covered are guidelines for porting to DMP, SMP, and DMP+SMP computers, typical parallel performance problems and some lessons learned.
  2. MPI+OpenMP Parallelization Strategies. This chapter starts from the basics by describing DMP and SMP architectures. Then issues related to merging MPI and OpenMP are outlined. In conclusion, some performance issues and parallel granularity are discussed.
  3. Small Examples and Exercises. This chapter presents examples for the dot product and the matrix-vector product in serial, MPI, OpenMP, and MPI+OpenMP code. The discussion sections explain the codes and propose simple exercises.
  4. Case Study of SOM77. This chapter is a case study of the Stommel Ocean Model in Fortran 77. Source code for serial, MPI, OpenMP, and MPI+OpenMP versions is presented. For the MPI code versions, both the 1-D and 2-D MPI decompositions are presented. The discussion sections explain the source code and propose exercises.
  5. Quick Start with Intel Trace Analyzer™ and Intel Trace Collector™ for MPI. This chapter covers the usage and form of these MPI tools for the examples and case studies of Chapters 3 and 4. The basics of producing MPI trace files are presented. Navigating the various graphical views of the traced information reveals differences in performance between the MPI versions of these codes.
  6. Quick Start with Intel Thread Checker™ for OpenMP. This chapter covers the usage and form of this OpenMP tool for the examples and case studies of Chapters 3 and 4. The basics of producing the required files are presented. Navigating the various graphical views of the traced information reveals differences in performance between the OpenMP versions of these codes.
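
To give a flavor of the dot product example in Chapter 3, the following is a minimal sketch in C of the two-level pattern that an MPI+OpenMP dot product typically follows. It is not taken from the course workbooks: the names block_range, block_bounds, local_dot, and hybrid_dot are hypothetical, and the MPI ranks are simulated by an ordinary loop so that the code compiles without an MPI library. In an actual hybrid code, each MPI process would hold only its own block, call the equivalent of local_dot once, and the partial sums would be combined with a collective such as MPI_Allreduce using MPI_SUM.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [lo, hi) owned by one rank (hypothetical type). */
typedef struct {
    size_t lo;
    size_t hi;
} block_range;

/* 1-D block decomposition: split n elements over nranks ranks as evenly
 * as possible. The first (n % nranks) ranks receive one extra element. */
block_range block_bounds(size_t n, int nranks, int rank)
{
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    size_t base  = n / (size_t)nranks;
    size_t extra = n % (size_t)nranks;
    size_t r     = (size_t)rank;
    block_range b;
    b.lo = r * base + (r < extra ? r : extra);
    b.hi = b.lo + base + (r < extra ? 1 : 0);
    return b;
}

/* SMP level: partial dot product over one block. The OpenMP reduction
 * shares the loop among threads; without OpenMP enabled at compile time
 * the pragma is ignored and the loop runs serially. */
double local_dot(const double *x, const double *y, block_range b)
{
    double sum = 0.0;
    long lo = (long)b.lo;
    long hi = (long)b.hi;
    #pragma omp parallel for reduction(+:sum)
    for (long i = lo; i < hi; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* DMP level, simulated: the loop over ranks stands in for nranks MPI
 * processes, and the running sum stands in for a global reduction.
 * Assumption: x and y are whole arrays here, whereas a distributed code
 * would store only the local block on each process. */
double hybrid_dot(const double *x, const double *y, size_t n, int nranks)
{
    double global = 0.0;
    for (int rank = 0; rank < nranks; ++rank) {
        global += local_dot(x, y, block_bounds(n, nranks, rank));
    }
    return global;
}
```

Calling hybrid_dot(x, y, n, 4) from a main program splits the vectors into four blocks and sums the four partial results. Keeping the decomposition in block_bounds separate from the threaded loop in local_dot mirrors the usual division of labor in hybrid codes: MPI decides which data a process owns, and OpenMP shares the work on that data among threads.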


HiPERiSM Consulting, LLC, (919) 484-9803 (Voice)

(919) 806-2813 (Facsimile)