HiPERiSM - High Performance Algorism Consulting
HCTR-2001-4: Compiler Performance 5
1.0 The Stommel Ocean Model: 2-D MPI decomposition
1.1 Serial and MPI code
This report compares MPI parallel performance obtained with SUN Fortran compilers on the SUN E10000 platform for a floating-point intensive application. The application is the Stommel Ocean Model (SOM77). The Fortran 77 source code was developed by Jay Jayakumar (serial version) and Luke Lonnergan (MPI version) at the NAVO site, Stennis Space Center, MS, and is available at http://www.navo.hpc.mil/pet/Video/Courses/MPI_Finite. The algorithm is identical to that of the Fortran 90 version discussed in report HCTR-2001-2, but the Fortran 77 version allows more flexibility in the domain decomposition for MPI. The calling trees of the serial version (STML77S0) and the MPI parallel version (STMLPI2D) of SOM77, produced by Flint from Cleanscape Software, are shown in Table 1.1.
This is a 2-dimensional domain decomposition in both the x and y directions, with one square sub-domain passed to each MPI process. In the MPI version the process with rank 0 must broadcast all parameters to the other processes before computation begins. Otherwise the code is identical to the serial version, except that each MPI process operates on its own square of the domain. At the beginning of each iteration the processes synchronize boundary values by exchanging ghost rows (parallel to either the x or y direction) with their nearest neighbor processes wherever the sides of two squares are adjacent (subroutine EXCHANGE). The exterior sides of the outermost squares take part in no exchange since they correspond to domain boundaries.
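The neighbor relationships described above can be illustrated with a short Python sketch. It is not taken from the SOM77 source: the function names (`grid_coords`, `neighbors`, `local_block`) are hypothetical, and the row-by-row rank layout is an assumption, since the report does not state how STMLPI2D numbers its processes. A neighbor of `None` marks a side that lies on the domain boundary and so takes part in no exchange.

```python
# Sketch only: maps ranks onto a 2-D process grid and lists the neighbors
# each rank would exchange ghost rows with.
# Assumption: ranks are laid out row by row; SOM77 may use another layout.


def grid_coords(rank, ncols):
    """Return (row, col) of a rank in a process grid with ncols columns."""
    return divmod(rank, ncols)


def neighbors(rank, nrows, ncols):
    """Return neighbor ranks; None means the side is a domain boundary."""
    row, col = grid_coords(rank, ncols)
    return {
        "north": rank - ncols if row > 0 else None,
        "south": rank + ncols if row < nrows - 1 else None,
        "west": rank - 1 if col > 0 else None,
        "east": rank + 1 if col < ncols - 1 else None,
    }


def local_block(n, nrows, ncols):
    """Interior points per process in each direction (n must divide evenly)."""
    if n % nrows or n % ncols:
        raise ValueError("grid size must be divisible by the process grid")
    return n // nrows, n // ncols


if __name__ == "__main__":
    # 2 x 2 process grid: every rank has exactly two interior neighbors.
    for rank in range(4):
        print(rank, neighbors(rank, 2, 2))
    # Sub-domain size for N=1000 on a 4 x 4 process grid.
    print(local_block(1000, 4, 4))
```

In an actual MPI code the same information is usually obtained from a Cartesian communicator rather than computed by hand; the arithmetic here only makes the adjacency rule explicit.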
The compute kernel of SOM77 is a doubly nested loop that performs a Jacobi iteration sweep over a two-dimensional finite difference grid; the number of iterations is set to 100. More details of the model are discussed in HiPERiSM courses HC6 and HC8 (see the services page). For this study the number of interior grid points in each direction is set to N=1000 and N=2000, giving Cartesian grids of 1000 x 1000 and 2000 x 2000, respectively.
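As a rough illustration of such a kernel, the following Python sketch performs one Jacobi sweep with a five-point stencil. It is a minimal sketch under stated assumptions, not the SOM77 code: the function name `jacobi_sweep`, the weights `a1` to `a5`, and the forcing array `force` are hypothetical, and the actual coefficients of the Stommel model are not given in this report.

```python
# Sketch only: one Jacobi sweep over the interior of a 2-D grid.
# Assumption: a five-point stencil with weights a1..a4 on the four
# neighbors and a5 on a forcing term; these are placeholders.


def jacobi_sweep(psi, force, a1, a2, a3, a4, a5):
    """Return a new grid computed from the old grid psi; psi is unchanged."""
    n_rows = len(psi)
    n_cols = len(psi[0])
    # Copy so that boundary values carry over and every update reads
    # only old values, which is what makes this Jacobi iteration.
    new = [row[:] for row in psi]
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            new[i][j] = (
                a1 * psi[i + 1][j]
                + a2 * psi[i - 1][j]
                + a3 * psi[i][j + 1]
                + a4 * psi[i][j - 1]
                - a5 * force[i][j]
            )
    return new


if __name__ == "__main__":
    # Tiny grid: boundary held at 1.0, one interior point, zero forcing.
    psi = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
    force = [[0.0] * 3 for _ in range(3)]
    for _ in range(100):  # the report fixes the iteration count at 100
        psi = jacobi_sweep(psi, force, 0.25, 0.25, 0.25, 0.25, 1.0)
    print(psi[1][1])
```

Writing into a separate array is the design choice that makes each sweep independent of loop order, which is what lets every MPI process update its own square once ghost rows have been exchanged.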
2.1 MPI parallel performance
This section shows MPI parallel performance for the Stommel Ocean Model (SOM77) in a 2-D MPI domain decomposition with the SUN Fortran 77 compiler. Figures 2.1 and 2.2 show the time in seconds (as reported by the MPI_WTIME procedure) and Figures 2.3 and 2.4 show the corresponding speed up. Tables 2.1 and 2.2 summarize efficiency values for the respective problem sizes of N=1000 and N=2000 and show a clear cache effect in the form of superlinear speed up.
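For reference, the following Python sketch computes speed up and efficiency using the conventional definitions: speed up is the one-process time divided by the p-process time, and efficiency is speed up divided by p. The report does not spell out its formulas, so these definitions are an assumption, and the function names are hypothetical. Efficiency above 1 corresponds to superlinear speed up.

```python
# Sketch only: conventional speed-up and efficiency definitions.
# Assumption: the one-process run is the baseline time.


def speed_up(t_serial, t_parallel):
    """Ratio of one-process time to p-process time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nprocs):
    """Speed up per process; 1.0 means ideal linear scaling."""
    return speed_up(t_serial, t_parallel) / nprocs


def is_superlinear(t_serial, t_parallel, nprocs):
    """True when p processes run more than p times faster than one."""
    return efficiency(t_serial, t_parallel, nprocs) > 1.0


if __name__ == "__main__":
    # Placeholder timings for illustration only; they are NOT
    # measurements from this report.
    t1, t16 = 100.0, 5.0
    print(speed_up(t1, t16), efficiency(t1, t16, 16))
```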
Fig. 2.1. Time to solution in seconds for SOM 2-D when N=1000 on the SUN E10000 for 1, 2 x 2 and 4 x 4 MPI processes.
Fig. 2.2. Time to solution in seconds for SOM 2-D when N=2000 on the SUN E10000 for 1, 2 x 2 and 4 x 4 MPI processes.
Fig. 2.3. Speed up for SOM 2-D when N=1000 on the SUN E10000 for 4 x 4, 2 x 2, and 1 MPI processes.
Fig. 2.4. Speed up for SOM 2-D when N=2000 on the SUN E10000 for 4 x 4, 2 x 2, and 1 MPI processes.
HiPERiSM Consulting, LLC, (919) 484-9803 (Voice)
(919) 806-2813 (Facsimile)