HiPERiSM's Technical Reports

HiPERiSM - High Performance Algorism Consulting

HCTR-2006-1: Compiler Performance 15

 

PERFORMANCE ANALYSIS OF AERMOD ON COMMODITY PLATFORMS #2

George Delic

HiPERiSM Consulting, LLC.

 

1.0 INTRODUCTION

This is an update to the previous progress report evaluating industry-standard Fortran 90/95 compilers for IA-32 Linux™ commodity platforms when applied to the Air Quality Model (AQM) AERMOD. New results are presented for AERMOD that give insight into the algorithm’s performance on commodity architectures with the current (CY2006) releases of three compilers. This report shows only execution times and the compiler switches used to produce them.

2.0 CHOICE OF HARDWARE AND OPERATING SYSTEM

The hardware used for the results reported here is the Intel Pentium 4 Xeon (P4) and Pentium Xeon 64EMT (P4e) processors, with processor clock rates of 3GHz and 3.4GHz, respectively. Each is in a dual configuration with a front side bus (FSB) of 533MHz and 800MHz, respectively, shared by each pair of processors. The operating system (OS) is HiPERiSM Consulting, LLC’s modification of the Linux™ 2.6.9 kernel to include a patch that enables access to hardware performance counters. The times reported were taken from the process time reported by hardware counters. In the following discussion x86_32 denotes 32 bit hardware with a 32 bit operating system, and x86_64 denotes 64 bit hardware with a 64 bit operating system. In addition to these Intel processors and OS, some results were obtained for Microsoft Windows 2000 and for AMD Athlon and Opteron processors with Linux kernels.

3.0 CHOICE OF COMPILERS

The compilers used were the Portland pgf90/95 (release 6.0 and 6.1), Intel ifort (release 9.0 and 9.1), and Absoft f90/f95 (release 9.0 and 10.0). The choice of optimization switches for each compiler is shown in the tabulation of the results. Numerous combinations of compiler switches were tested for each compiler and the ones shown here are those that delivered the smallest execution times in each case.
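The selection rule described above (keep the switch set with the smallest execution time) can be sketched as follows. This is only an illustration: the dictionary layout and the function name are assumptions, and the two entries are the Portland v6.0 timings for x86_32 Intel/Linux that appear in Table 1.

```python
# Timings (seconds) for Portland v6.0 on x86_32 Intel/Linux, copied from Table 1.
# The data layout and function name are illustrative, not taken from the report.
timings = {
    "fast": ("-fast -tp p7 -Minfo -Mlfs -Bstatic", 1595.2),
    "best": ("-fastsse -Mscalarsse -Mcache_align -Minline=size:100 "
             "-tp p7 -Minfo -Mlfs -Bstatic", 1366.0),
}


def fastest_switch_set(runs):
    """Return (mnemonic, switches, seconds) for the run with the smallest time."""
    mnemonic = min(runs, key=lambda name: runs[name][1])
    switches, seconds = runs[mnemonic]
    return mnemonic, switches, seconds


mnemonic, switches, seconds = fastest_switch_set(timings)
print(f"{mnemonic}: {seconds:.1f} s with {switches}")
```

With more switch combinations, the same function applies unchanged; only the dictionary grows.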

4.0 CHOICE OF BENCHMARKS

The AERMOD code describes pollutant dispersion and deposition and is now an approved regulatory model for new source reviews and other permitting applications. It is predominantly a Fortran 77 code developed over ten years ago, but it has since adopted (in small part) Fortran 90 features. As is typical of that generation of environmental models, AERMOD was developed on a PC platform and has a small memory requirement, poor vector character, and I/O bound performance characteristics. AERMOD and other AQMs are available at the U.S. EPA’s Support Center for Regulatory Air Models (EPA-SCRAM). The version used here is AERMOD 04300, which was provided to HiPERiSM by the U.S. EPA in December 2005 for analysis of compiler performance. Two separate data sets are used in two sets of benchmarks with AERMOD 04300.

5.0 BENCHMARK DATA SETS USED

5.1 EPA benchmark set EPA-E2

The U.S. EPA provided a test data set and this is designated as the EPA-E2 benchmark. Results for this benchmark were collected in December (2005) and January (2006) for the compiler releases generally available at that time. An overview of results is shown in Figure 1, for the runtimes shown in Table 1.

Fig. 1. Run time from Table 1 for AERMOD benchmark EPA-E2 with three compilers on five platforms. The 32 bit platforms are shown on the left and 64 bit platforms on the right. The “baseline benchmark” represents the choice of compiler and switches in use at the U.S. EPA before this analysis. The switch mnemonics are defined in Table 1 where the corresponding compiler switches are listed. The best time reported in this group is for the Absoft v9.0 compiler.

Table 1. EPA-E2 run times for AERMOD with three compilers on five platforms

| Platform | Operating System | Compiler | Version | Time (sec) | Ratio to fastest | Switch mnemonic | Compiler switches |
| --- | --- | --- | --- | --- | --- | --- | --- |
| x86_32 Intel | Windows | Absoft | 9.0 | 1411.2 | 1.77 | opt | -cpu:p7 -O3 -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |
| x86_32 Intel | Linux | Portland | 6.0 | 1595.2 | 2.00 | fast | -fast -tp p7 -Minfo -Mlfs -Bstatic |
| x86_32 Intel | Linux | Absoft | 9.0 | 1016.0 | 1.27 | opt | -cpu:p7 -O2 -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |
| x86_32 Intel | Linux | Portland | 6.0 | 1366.0 | 1.71 | best | -fastsse -Mscalarsse -Mcache_align -Minline=size:100 -tp p7 -Minfo -Mlfs -Bstatic |
| x86_32 Intel | Linux | Intel | 9.0 | 1675.2 | 2.10 | ipo | -tpp7 -xW -O3 -Ob2 -ipo -static -FI |
| x86_32 AMD | Linux | Absoft | 9.0 | 1226.7 | 1.53 | sse | -O3 -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |
| x86_32 AMD | Linux | Portland | 5.1 | 2048.2 | 2.56 | best | -Mscalarsse -Mcache_align -Minline=size:100 -tp athlonxp -Minfo -Mlfs -Bstatic |
| x86_64 Intel | Linux | Absoft | 9.0 | 799.2 | 1.00 | sse | -O3 -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |
| x86_64 Intel | Linux | Intel | 9.0 | 908.1 | 1.14 | ipo | -tpp7 -xW -O3 -Ob2 -ipo -static -FI |
| x86_64 Intel | Linux | Portland | 6.0 | 1108.5 | 1.39 | best | -fastsse -Mscalarsse -Mcache_align -Minline=size:100 -tp p7-64 -Minfo -Bstatic |
| x86_64 AMD | Linux | Absoft | 9.0 | 968.2 | 1.21 | sse | -O3 -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |
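The "Ratio to fastest" column of Table 1 is each run time divided by the smallest run time in the table (799.2 sec). The sketch below recomputes that column from the times as tabulated; the tuple layout and the function name are assumptions made for illustration.

```python
# Rows copied from Table 1: (platform, OS, compiler, version, mnemonic, seconds).
# The tuple layout and function name are illustrative, not taken from the report.
TABLE_1 = [
    ("x86_32 Intel", "Windows", "Absoft", "9.0", "opt", 1411.2),
    ("x86_32 Intel", "Linux", "Portland", "6.0", "fast", 1595.2),
    ("x86_32 Intel", "Linux", "Absoft", "9.0", "opt", 1016.0),
    ("x86_32 Intel", "Linux", "Portland", "6.0", "best", 1366.0),
    ("x86_32 Intel", "Linux", "Intel", "9.0", "ipo", 1675.2),
    ("x86_32 AMD", "Linux", "Absoft", "9.0", "sse", 1226.7),
    ("x86_32 AMD", "Linux", "Portland", "5.1", "best", 2048.2),
    ("x86_64 Intel", "Linux", "Absoft", "9.0", "sse", 799.2),
    ("x86_64 Intel", "Linux", "Intel", "9.0", "ipo", 908.1),
    ("x86_64 Intel", "Linux", "Portland", "6.0", "best", 1108.5),
    ("x86_64 AMD", "Linux", "Absoft", "9.0", "sse", 968.2),
]


def ratios_to_fastest(rows):
    """Divide every run time by the smallest run time in the table."""
    fastest = min(seconds for *_, seconds in rows)
    return [seconds / fastest for *_, seconds in rows]


for row, ratio in zip(TABLE_1, ratios_to_fastest(TABLE_1)):
    platform, os_name, compiler, version, mnemonic, seconds = row
    print(f"{platform:13s} {os_name:8s} {compiler:9s} {version:5s} "
          f"{mnemonic:5s} {seconds:7.1f} {ratio:5.2f}")
```

Formatted to two decimals, the computed ratios agree with the tabulated column.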

With release v6.1 of the Portland compiler and v9.1 of Intel's compiler the EPA-E2 benchmark was repeated. Table 2 summarizes these results and, for comparison, repeats the best results from Table 1 (for Absoft Fortran). Note the significant improvement from Intel Fortran v9.0 to v9.1 on both x86_32 and x86_64 platforms, and the improvement in the Portland compiler result for x86_64.

Table 2. EPA-E2 run times for AERMOD with Absoft, Intel, and Portland compilers on x86_32 and x86_64 platforms

| Platform | Operating System | Compiler | Version | Time (sec) | Switch mnemonic | Compiler switches |
| --- | --- | --- | --- | --- | --- | --- |
| x86_32 Intel | Linux | Absoft | 9.0 | 1016.0 | opt | -cpu:p7 -O2 -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |
| x86_32 Intel | Linux | Intel | 9.0 | 1701.0 | ipo | -tpp7 -xW -O3 -Ob2 -ipo -static -FI |
| x86_32 Intel | Linux | Intel | 9.1 | 696.4 | ipo | -tpp7 -xW -O3 -Ob2 -ipo -static -FI |
| x86_64 Intel | Linux | Absoft | 9.0 | 799.2 | sse | -O3 -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |
| x86_64 Intel | Linux | Absoft | 10.0 | 872.0 | best | -Ofast -speed_math=9 -IPA:plimit=15000 -TARG:sse3=on -LNO:fu=9:full_unroll_size=7000 -WOPT:if_conv=off -march=em64t -xINTEGER -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |
| x86_64 Intel | Linux | Intel | 9.0 | 909.9 | ipo | -tpp7 -xW -O3 -Ob2 -ipo -static -FI |
| x86_64 Intel | Linux | Intel | 9.1 | 704.5 | ipo | -tpp7 -xW -O3 -Ob2 -ipo -static -FI |
| x86_64 Intel | Linux | Portland | 6.0 | 1108.5 | best | -fastsse -Mscalarsse -Mcache_align -Minline=size:100 -tp p7-64 -Minfo -Bstatic |
| x86_64 Intel | Linux | Portland | 6.1 | 820.0 | best | -fastsse -Mscalarsse -Mcache_align -Minline=size:100 -tp p7-64 -Minfo -Bstatic |

 

Fig. 2. Run time from Table 2 for AERMOD benchmark EPA-E2 with three compilers on the 64EMT platform. This figure compares the performance of the latest release and the previous release of each compiler. The switch mnemonics are defined in Table 2 where the corresponding compiler switches are listed. The best time reported in this group is for the Intel v9.1 compiler.
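The release-to-release changes on the x86_64 platform can be recomputed from the Table 2 times. The sketch below expresses each change as a percentage reduction in run time relative to the older release; that convention, the data layout, and the function name are assumptions made for illustration. A negative value means the newer release was slower.

```python
# x86_64 Intel/Linux times (seconds) from Table 2:
# compiler -> (older version, older time, newer version, newer time).
# The layout and function name are illustrative, not taken from the report.
RELEASES_X86_64 = {
    "Absoft": ("9.0", 799.2, "10.0", 872.0),
    "Intel": ("9.0", 909.9, "9.1", 704.5),
    "Portland": ("6.0", 1108.5, "6.1", 820.0),
}


def percent_reduction(old_seconds, new_seconds):
    """Run-time reduction of the newer release, as a percentage of the older time."""
    return 100.0 * (old_seconds - new_seconds) / old_seconds


for compiler, (old_v, old_t, new_v, new_t) in RELEASES_X86_64.items():
    change = percent_reduction(old_t, new_t)
    print(f"{compiler:9s} v{old_v} -> v{new_v}: {change:+.1f}%")
```

Under this convention the Intel and Portland updates reduce run time by roughly 23% and 26%, while Absoft v10.0 is about 9% slower than v9.0 on this benchmark.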

5.2 ENV benchmark set ENV-T1

An environmental consulting firm provided a test data set with a request to evaluate compilers, and this benchmark is designated ENV-T1. Tables 3 to 5 show results for the current (Q3CY2006) releases of the Absoft, Intel, and Portland compilers on x86_32 and x86_64 platforms. Note that for this benchmark the Intel compiler reports the shortest run time on both platforms; interestingly, on the 64EMT platform the improvement is only 6.9% over the x86_32 case.

 Table 3. ENV-T1 run times for AERMOD with Absoft compilers on x86_32 and x86_64 platforms 

| Platform | Operating System | Compiler | Version | Time (sec) | Switch mnemonic | Compiler switches |
| --- | --- | --- | --- | --- | --- | --- |
| x86_32 Intel | Linux | Absoft | 9.0 | 2142.2 | opt | -cpu:p7 -O2 -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |
| x86_32 Intel | Linux | Absoft | 10.0 | 3612.9 | best | -Ofast -speed_math=9 -IPA:plimit=15000 -TARG:sse2=on -LNO:fu=9:full_unroll_size=7000 -WOPT:if_conv=off -TARG:processor=pentium4 -xINTEGER -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |
| x86_64 Intel | Linux | Absoft | 9.0 | 1749.0 | sse | -O3 -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |
| x86_64 Intel | Linux | Absoft | 10.0 | 1924.1 | best | -Ofast -speed_math=9 -IPA:plimit=15000 -TARG:sse3=on -LNO:fu=9:full_unroll_size=7000 -WOPT:if_conv=off -march=em64t -xINTEGER -YEXT_NAMES=LCS -YEXT_SFX=_ -s -YCFRL=1 -ffixed |

 

 Table 4. ENV-T1 run times for AERMOD with Intel compilers on x86_32 and x86_64 platforms

| Platform | Operating System | Compiler | Version | Time (sec) | Switch mnemonic | Compiler switches |
| --- | --- | --- | --- | --- | --- | --- |
| x86_32 Intel | Linux | Intel | 9.1 | 1235.7 | ipo | -tpp7 -xW -O3 -Ob2 -ipo -static -FI |
| x86_64 Intel | Linux | Intel | 9.1 | 1155.8 | ipo | -tpp7 -xW -O3 -Ob2 -ipo -static -FI |

 

 Table 5. ENV-T1 run times for AERMOD with Portland compilers on x86_32 and x86_64 platforms. 

| Platform | Operating System | Compiler | Version | Time (sec) | Switch mnemonic | Compiler switches |
| --- | --- | --- | --- | --- | --- | --- |
| x86_32 Intel | Linux | Portland | 6.0 | 2656.4 | best | -fastsse -Mscalarsse -Mcache_align -Minline=size:100 -Minfo -Mlfs -Bstatic |
| x86_64 Intel | Linux | Portland | 6.1 | 1685.8 | best | -fastsse -Mscalarsse -Mcache_align -Minline=size:100 -tp p7-64 -Minfo -Bstatic |

 

Fig. 3. Run time from Tables 3-5 for AERMOD benchmark ENV-T1 with three compilers on two platforms. The 32 bit platforms are shown on the left and 64 bit platforms on the right. The switch mnemonics are defined in Tables 3 to 5 where the corresponding compiler switches are listed. The best time reported in this group is for the Intel v9.1 compiler.

6.0 SUMMARY OF AERMOD PERFORMANCE RESULTS

This performance analysis of AERMOD shows several important features of current compiler technology for commodity hardware:

  • Performance results with three compilers show a great deal of variability: 29% and 67% in the EPA-E2 and ENV-T1 benchmarks, respectively, for the Pentium 4 Xeon 64 EMT.

  • Very significant changes in performance occur between even minor releases of a specific compiler: typically 22%-26% (with the exception of Absoft v10.0 versus v9.0).

  • Compilers have undergone a rapid maturation process in less than two years and continue to exchange leadership in performance as new releases arrive.

  • Performance on 64-bit platforms is superior to that on 32-bit platforms: ranging from 7% (Intel) to 87% (Absoft) improvement in runtime.
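The 64-bit versus 32-bit figures in the last bullet can be recomputed from the ENV-T1 times in Tables 3 and 4. The sketch below assumes the improvement is expressed as the extra 32-bit run time relative to the 64-bit run time; this convention is inferred from the tabulated numbers and is not stated in the report, and the data layout and function name are illustrative.

```python
# ENV-T1 times (seconds) for the same compiler release on both platforms,
# copied from Table 3 (Absoft v10.0) and Table 4 (Intel v9.1).
# The layout and function name are illustrative, not taken from the report.
ENV_T1 = {
    ("Absoft", "10.0"): {"x86_32": 3612.9, "x86_64": 1924.1},
    ("Intel", "9.1"): {"x86_32": 1235.7, "x86_64": 1155.8},
}


def percent_gain_64_over_32(seconds_32, seconds_64):
    """Extra 32-bit run time as a percentage of the 64-bit run time (assumed convention)."""
    return 100.0 * (seconds_32 - seconds_64) / seconds_64


for (compiler, version), times in ENV_T1.items():
    gain = percent_gain_64_over_32(times["x86_32"], times["x86_64"])
    print(f"{compiler} v{version}: {gain:.1f}% improvement on x86_64")
```

Under this assumed convention the Intel figure comes out near 7% and the Absoft figure near 88%, in line with the range quoted above. The Portland rows of Table 5 are left out because they compare different releases (v6.0 and v6.1) across the two platforms.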

7.0   CONCLUSIONS

This performance analysis of AERMOD shows that compilers must be tested with each new release because performance differences are very significant (even between minor releases). Furthermore, it can no longer be assumed that one single compiler will provide superior performance indefinitely. In fact, even within the space of a year, dominance in performance can change between different compilers. However, one note of warning needs to be sounded: even though a new release of a compiler can give dramatically improved performance in one application, it may not do so in another (see the following report on CAMx, where the Intel ifort 9.1 compiler aborts with an internal compiler error, whereas the v9.0 release completes compilation and execution successfully).

One important result of this analysis is that by simply changing hardware platforms and compilers it is possible to see performance enhancement by as much as a factor of two or more. However, only a parallel version of AERMOD can promise to deliver more performance.

 


HiPERiSM Consulting, LLC, (919) 484-9803 (Voice)

(919) 806-2813 (Facsimile)