HiPERiSM's Technical Reports
HiPERiSM - High Performance Algorism Consulting
HCTR-2010-2: CMAQ Performance 2
CMAQ SERIAL AND MPI RESULTS
HiPERiSM Consulting, LLC.
4.0 SERIAL AND MPI RESULTS
The section numbering continues from the previous report.
4.1 Intel™ compiler on three platforms
Runtime results for CMAQ 4.6.1 with the three solver versions are shown in Table 4.1 for three generations of Intel platforms, using the highest optimization level (compiler group ifc4) of the Intel™ compiler.
Table 4.1. Wall clock times in hours for solvers in the serial version of CMAQ 4.6.1 for the Intel™ compiler with the fastest optimization (compiler group ifc4).
All three solver versions of CMAQ have benefited from the evolution of commodity computer architectures, with an average speed-up versus the Itanium2™ of 2.4 (QC-1) and 3.2 (QC-2). However, the speed-up of CMAQ on QC-2 versus QC-1 is only in the range 1.3 to 1.5, which is half of the speed-up potentially available between two generations of quad-core processor technology.
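The speed-ups quoted above are ratios of wall clock times. Because QC-1 and QC-2 are both measured against the same Itanium2™ baseline, the QC-2 versus QC-1 speed-up can be approximated as the quotient of the two baseline speed-ups. The minimal sketch below assumes only these definitions; the function names are hypothetical helpers, not part of CMAQ or any HiPERiSM tool.

```python
def speedup(t_reference: float, t_new: float) -> float:
    """Speed-up of a new platform over a reference one: T_reference / T_new."""
    if t_new <= 0:
        raise ValueError("wall clock time must be positive")
    return t_reference / t_new


def relative_speedup(s_a_vs_base: float, s_b_vs_base: float) -> float:
    """Speed-up of platform B over platform A when both are measured against
    the same baseline: (T_base / T_B) / (T_base / T_A) = T_A / T_B."""
    return s_b_vs_base / s_a_vs_base


# Average speed-ups versus Itanium2 quoted in Section 4.1.
qc1_vs_itanium2 = 2.4
qc2_vs_itanium2 = 3.2
print(f"QC-2 vs QC-1: {relative_speedup(qc1_vs_itanium2, qc2_vs_itanium2):.2f}")
```

This prints `QC-2 vs QC-1: 1.33`, at the low end of the 1.3 to 1.5 range. Since the inputs are averages over solvers, the quotient is only an approximation; exact per-solver values need the individual times in Table 4.1.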
4.2 Intel™ versus Portland™ compilers
Typical runtime results for the standard U.S. EPA distribution of CMAQ 4.6.1 are shown in Tables 4.2 and 4.3 for the Intel™ and Portland™ compilers, respectively. In both tables an asterisk (*) marks dedicated runs; all other runs were executed concurrently. Table 4.4 shows the ratios of times in corresponding cells of the two preceding tables.
Table 4.2. Wall clock times in hours for three solvers in the serial version of CMAQ 4.6.1 on the QC-1 platform for the Intel™ compiler switch groups ifc1 to ifc4.
The speed-up over the Itanium2™ platform with the Intel™ compiler is in the range 2.1 to 2.9 on the QC-1 platform, depending on the solver and compiler group used.
Table 4.3. Wall clock times in hours for three solvers in the serial version of CMAQ 4.6.1 on the QC-1 platform for the Portland compiler switch groups pgf1 to pgf4.
On the QC-1 platform, the difference between the ifc4 and pgf4 times is due in part to the pgf4 runs being concurrent (overlapping), which may expand wall clock time by roughly 10%.
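To compare a concurrent run with a dedicated one on a like-for-like basis, the concurrent time can be deflated by the assumed expansion. The sketch below treats the "roughly 10%" figure from the text as a tunable assumption (`expansion`) and bounds the ifc/pgf ratio between "no concurrency penalty" and "full assumed penalty"; it is a rough adjustment, not a measured correction, and the function names are hypothetical.

```python
def dedicated_estimate(t_concurrent: float, expansion: float = 0.10) -> float:
    """Estimate a dedicated-run wall clock time from a concurrent one, assuming
    T_concurrent = T_dedicated * (1 + expansion)."""
    if expansion <= -1.0:
        raise ValueError("expansion must be greater than -1")
    return t_concurrent / (1.0 + expansion)


def ratio_range(t_ifc: float, t_pgf_concurrent: float, max_expansion: float = 0.10):
    """Bound T_ifc / T_pgf when only the pgf run was concurrent.

    low:  no concurrency penalty removed from the pgf time.
    high: the full assumed penalty removed (smaller pgf time, larger ratio).
    """
    low = t_ifc / t_pgf_concurrent
    high = t_ifc / dedicated_estimate(t_pgf_concurrent, max_expansion)
    return low, high
```

For instance, with a hypothetical dedicated ifc4 time of 1.0 h and a concurrent pgf4 time of 1.1 h, the ratio would lie between about 0.91 and 1.0.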
Table 4.4. Ratios of wall clock times for three solvers in the serial version of CMAQ 4.6.1 on the QC-1 platform. The ratios are for Intel™ (ifc) versus Portland (pgf) compilers for each compiler switch group.
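Table 4.4 is the cell-by-cell quotient of Tables 4.2 and 4.3. A minimal sketch of that operation follows, assuming each table is stored as a dictionary keyed by (solver, switch group number), so that key `("EBI", 4)` pairs ifc4 with pgf4. This layout is hypothetical, and the tables' actual values are not reproduced here.

```python
from typing import Dict, Tuple

# Hypothetical layout: (solver name, switch group number 1-4) -> wall clock hours.
Key = Tuple[str, int]


def ratio_table(ifc: Dict[Key, float], pgf: Dict[Key, float]) -> Dict[Key, float]:
    """Return T_ifc / T_pgf for every (solver, group) cell present in both tables.

    Cells that appear in only one table are skipped rather than guessed.
    """
    return {key: ifc[key] / pgf[key] for key in ifc.keys() & pgf.keys()}
```

A ratio below 1 in the result means the Intel™ build ran faster for that solver and switch group.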
Note from Table 4.2 that switching from Intel™ compiler group ifc4 to ifc3 increases runtime by 10% to 27%, whereas for the Portland compiler the corresponding increase from pgf4 to pgf3 is only 2% to 8% (Table 4.3). As a result, the difference between the ifc3 and pgf3 times shrinks to about 5%. The ifc3 and pgf3 compiler groups are recommended because they improve the precision of concentration values for some species.
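Why a larger ifc3 penalty narrows the gap can be checked with a little arithmetic. If r4 = T(ifc4)/T(pgf4) is a Table 4.4 ratio, and a = T(ifc3)/T(ifc4) - 1 and b = T(pgf3)/T(pgf4) - 1 are the fractional increases, then T(ifc3) = T(ifc4)(1 + a) and T(pgf3) = T(pgf4)(1 + b), so r3 = r4 (1 + a)/(1 + b). The sketch below encodes only that identity; the function name and the inputs in the note after it are hypothetical.

```python
def ratio_after_group_change(r4: float, ifc_increase: float, pgf_increase: float) -> float:
    """Given r4 = T(ifc4)/T(pgf4) and fractional runtime increases
    a = T(ifc3)/T(ifc4) - 1 and b = T(pgf3)/T(pgf4) - 1,
    return r3 = T(ifc3)/T(pgf3) = r4 * (1 + a) / (1 + b)."""
    return r4 * (1.0 + ifc_increase) / (1.0 + pgf_increase)
```

For an illustrative solver with a 27% ifc3 penalty and an 8% pgf3 penalty (a pairing of the quoted extremes, not a measured case), the ratio is multiplied by 1.27/1.08 ≈ 1.18. An ifc4 time about 15% below pgf4 (r4 ≈ 0.85) would then become roughly equal to pgf3 at group 3 (r3 ≈ 1.00).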
4.3 MPI results
The preceding tables showed results for the standard U.S. EPA distribution with no parallel execution enabled. This section presents MPI results for the EBI and Rosenbrock (ROS3) chemistry solver versions of CMAQ 4.6.1. Table 4.5 summarizes the CMAQ 4.6.1 runtimes (in hours) with the Portland compiler in an MPI implementation, together with the scaling as the number of MPI processes increases. Notably, speed-up departs significantly from linearity beyond 4 MPI processes.
Table 4.5. Wall clock times (in hours), parallel scaling, and parallel efficiency for two solvers in the MPI implementation of EPA’s standard release of CMAQ 4.6.1 on the HiPERiSM QC Cluster platform for the Portland compiler group pgf3.
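The scaling and efficiency columns of Table 4.5 presumably follow the standard definitions: speed-up S(NP) = T(1)/T(NP) and parallel efficiency E(NP) = S(NP)/NP, where NP is the number of MPI processes. A minimal sketch, assuming the runtimes are supplied as a mapping from process count to wall clock hours (this layout and the function name are hypothetical):

```python
from typing import Dict, Tuple


def scaling_table(runtimes: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map each MPI process count NP to (speed-up, parallel efficiency).

    Speed-up S(NP) = T(1) / T(NP); efficiency E(NP) = S(NP) / NP.
    A single-process time T(1) is required as the baseline.
    """
    if 1 not in runtimes:
        raise ValueError("a single-process runtime T(1) is required")
    t1 = runtimes[1]
    result = {}
    for np_count, t in sorted(runtimes.items()):
        s = t1 / t
        result[np_count] = (s, s / np_count)
    return result
```

Linear scaling corresponds to E(NP) = 1. The departure from linearity beyond 4 processes appears as E falling well below 1.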
Corresponding to Table 4.5, Fig. 4.1 summarizes the CMAQ 4.6.1 MPI parallel efficiency with increasing process count. Both the EBI and ROS3 solvers show a steep decline in MPI parallel efficiency when the number of MPI processes exceeds 4 (NP > 4). Parallel efficiency tends toward roughly 50% at 16 MPI processes, meaning that, on average, CPUs are idle for half of the wall clock time.
Fig. 4.1. MPI Parallel efficiency for CMAQ 4.6.1 EBI and ROS3 solvers.
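The link between 50% efficiency and CPUs idling half the time follows from E = T(1)/(NP·T(NP)). Of the NP·T(NP) processor-hours allocated, only about T(1) are spent on work a single process would also do, so the average idle fraction is 1 - E. The sketch below converts an efficiency into that idle fraction and an "effective" processor count. The function name is hypothetical, and superlinear cases (E > 1) are deliberately out of scope.

```python
from typing import Tuple


def idle_summary(efficiency: float, nprocs: int) -> Tuple[float, float]:
    """Return (average idle fraction, effective processor count).

    With E = T(1) / (NP * T(NP)), the fraction of allocated processor time
    not spent on the single-process workload is 1 - E, and on average
    E * NP processors' worth of useful work is being done.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency expected in (0, 1]; superlinear cases not handled")
    return 1.0 - efficiency, efficiency * nprocs


idle, effective = idle_summary(0.5, 16)
print(f"idle fraction {idle:.0%}, effective processors {effective:.0f}")
```

At the roughly 50% efficiency reported for 16 processes, this prints `idle fraction 50%, effective processors 8`: the 16-process run does about 8 processors' worth of useful work.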
Part of this work was performed by HiPERiSM Consulting, LLC, as subcontractor to Computer Sciences Corporation, under U.S. EPA SES3 Contract GS-35F-4381G BPA 0775, Task Order 1522.
HiPERiSM Consulting, LLC, (919) 484-9803 (Voice)
(919) 806-2813 (Facsimile)