HiPERiSM Consulting, LLC: HCTR-2001-2


HiPERiSM - High Performance Algorism Consulting

HCTR-2001-2: Compiler Performance 3

1.0 The Stommel Ocean Model

1.1 Serial performance

This section compares the serial performance of different compilers on Intel Linux™ platforms for the same floating-point intensive application, the Stommel Ocean Model (SOM). The Fortran 90 source code was developed by Dr. Timothy Kaiser and is available at http://www.npaci.edu/Training/AHM00; the same site offers an OpenMP version, which is discussed in Section 2. The compute kernel of the SOM is a double-nested loop that performs a Jacobi iteration sweep over a two-dimensional finite difference grid, and the number of iterations is fixed at 100. Three problem sizes were chosen for this study, corresponding to three values of N, the number of interior grid points in each direction (N x N points in total), as shown in Table 1.1.
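To make the kernel concrete, here is a minimal Python sketch of a Jacobi sweep over an N x N interior grid, repeated for a fixed number of iterations. It is an illustration, not Dr. Kaiser's code: the coefficients a1 to a5 are hypothetical values for a five-point Poisson stencil with unit grid spacing, the boundary is held at zero, and the forcing is a single hypothetical point source. The update and the squared-difference accumulation follow the Fortran loop shown in Table 2.1.

```python
# Illustrative sketch only: hypothetical coefficients and forcing,
# not the values used in the SOM source.

def jacobi_sweep(psi, forcing, a):
    """One Jacobi sweep over the interior points of a square grid.

    psi and forcing are (N+2) x (N+2) lists of lists that include the
    boundary; a = (a1, a2, a3, a4, a5). Returns (new_psi, diff), where
    diff is the sum of squared changes (the quantity accumulated in
    Table 2.1 before the square root is taken).
    """
    a1, a2, a3, a4, a5 = a
    size = len(psi)
    new_psi = [row[:] for row in psi]  # boundary values carried over unchanged
    diff = 0.0
    for j in range(1, size - 1):      # outer loop over j, as in the Fortran
        for i in range(1, size - 1):  # inner loop over i
            new_psi[i][j] = (a1 * psi[i + 1][j] + a2 * psi[i - 1][j]
                             + a3 * psi[i][j + 1] + a4 * psi[i][j - 1]
                             - a5 * forcing[i][j])
            diff += (new_psi[i][j] - psi[i][j]) ** 2
    return new_psi, diff


def run(n, iterations=100):
    """Apply `iterations` sweeps to an n x n interior grid; return (psi, diff)."""
    # Hypothetical five-point Poisson stencil with unit spacing:
    # psi(i,j) = (sum of the four neighbours - forcing(i,j)) / 4
    a = (0.25, 0.25, 0.25, 0.25, 0.25)
    size = n + 2
    psi = [[0.0] * size for _ in range(size)]
    forcing = [[0.0] * size for _ in range(size)]
    forcing[size // 2][size // 2] = 1.0  # hypothetical point source
    diff = 0.0
    for _ in range(iterations):
        psi, diff = jacobi_sweep(psi, forcing, a)
    return psi, diff


psi, diff = run(50)
print(f"sum of squared changes after 100 sweeps: {diff:.3e}")
```

Each sweep reads only the previous iterate, which is why a second array (new_psi) is needed and why the work per sweep is proportional to N x N.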

Table 1.1: Stommel Ocean Model problem size and memory requirement
Grid size N	N x N	Memory required
1000	1 x 10⁶	66 MB
2000	4 x 10⁶	194 MB
4000	16 x 10⁶	728 MB

Table 1.2 shows CPU times for the SOM code compiled with two different Fortran 90 compilers under the Linux™ operating system on a dual processor Intel™ Pentium III 933MHz workstation with a 256KB on-processor L2 cache (the Flip Chip PGA version of the Pentium III). Optimization levels appropriate for Pentium processors were chosen, but SSE instruction optimizations were not used. This section presents timing results for the serial (single CPU) version of the code. The reported times were obtained with calls to the Fortran 90 system_clock intrinsic. They typically differ by less than two percent from the times reported by the Linux™ time command, and this difference is ignored.

Table 1.2: CPU time in seconds for the SOM on a 933MHz dual processor Intel™ Pentium III workstation with Red Hat Linux™ 6.2 for two different Fortran 90 compilers (see notes to table for details).
Grid size N	Portland Group, Inc., Fortran 90, v3.1 [1]	Absoft Fortran 90, v6.2 [2]
1000	17.7	15.1
2000	73.5	60.4
4000	283.7	250.4
[1] PGF90™ Fortran compiler (Linux™ distribution) from the Portland Group Inc. (http://www.pgroup.com).
[2] Absoft Pro Fortran™ (Linux™ distribution) from the Absoft Corporation (http://www.absoft.com).
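Because each Jacobi sweep visits every one of the N x N grid points, the run time should grow roughly fourfold with each doubling of N. The short Python sketch that follows checks this against the times in Table 1.2 and computes the percentage saving with Absoft relative to PGF90; it does nothing more than arithmetic on the published numbers.

```python
# Times in seconds copied from Table 1.2.
pgi = {1000: 17.7, 2000: 73.5, 4000: 283.7}
absoft = {1000: 15.1, 2000: 60.4, 4000: 250.4}


def growth(times):
    """Ratio of run time at each N to the run time at the previous (half) N.
    Ideal O(N^2) scaling gives 4."""
    sizes = sorted(times)
    return {n: times[n] / times[m] for m, n in zip(sizes, sizes[1:])}


def absoft_saving(pgi_times, absoft_times):
    """Percentage reduction in time with Absoft relative to PGF90."""
    return {n: 100.0 * (pgi_times[n] - absoft_times[n]) / pgi_times[n]
            for n in pgi_times}


for n, r in growth(pgi).items():
    print(f"PGF90  N={n}: {r:.2f} x time at N={n // 2}")
for n, r in growth(absoft).items():
    print(f"Absoft N={n}: {r:.2f} x time at N={n // 2}")
for n, s in absoft_saving(pgi, absoft).items():
    print(f"N={n}: Absoft time {s:.1f}% lower than PGF90")
```

With these data the growth ratios are 4.15 and 3.86 for PGF90 and 4.00 and 4.15 for Absoft, all close to the ideal factor of 4, and the Absoft times are about 12 to 18 percent lower.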

2.1 OpenMP parallel performance

This section shows OpenMP parallel performance for the Stommel Ocean Model (SOM) with the Portland Group Inc. Fortran 90 compiler for Linux™. In the OpenMP implementation, the double-nested loop that performs the Jacobi iteration sweep is parallelized with OpenMP directives on the outer loop, as shown in Table 2.1.

Table 2.1: OpenMP implementation (by Dr. Kaiser) of the Jacobi iteration in the SOM (with modifications for the L2 error norm)

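    diff = 0.0   ! the reduction variable must be zeroed before the sweep
!$OMP PARALLEL DO SCHEDULE (STATIC) private(i) firstprivate(a1,a2,a3,a4,a5) reduction(+:diff)
    do j=j1,j2
        do i=i1,i2
            new_psi(i,j)=a1*psi(i+1,j) + a2*psi(i-1,j) + &
                         a3*psi(i,j+1) + a4*psi(i,j-1) - &
                         a5*for(i,j)
! Choose L2 norm: accumulate the squared change at each grid point
            diff=diff + (new_psi(i,j)-psi(i,j)) ** 2
        enddo
    enddo
!$OMP END PARALLEL DO
!
! Copy the new iterate back in a second parallel loop. A PARALLEL DO
! applies to exactly one loop nest, and the implicit barrier at the end
! of the first loop ensures all of new_psi is complete before psi is
! overwritten (neighbouring columns are read by other threads).
!
!$OMP PARALLEL DO SCHEDULE (STATIC) private(i)
    do j=j1,j2
        do i=i1,i2
            psi(i,j)=new_psi(i,j)
        enddo
    enddo
!$OMP END PARALLEL DO
!
! for L2 norm only
!
    diff = sqrt (diff)/ float ( (j2-j1+1)*(i2-i1+1) )
    END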
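The structure of Table 2.1 (outer loop split across threads with a static schedule, private partial sums combined by reduction(+:diff), and a barrier before psi is overwritten) can be modeled in Python with a thread pool. This is an illustrative model, not OpenMP: static_chunks is a hypothetical helper that gives each thread one contiguous block of j values (OpenMP leaves the exact static partition to the implementation), the coefficients are hypothetical, and because of CPython's global interpreter lock the threads show the decomposition rather than any speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

A = (0.25, 0.25, 0.25, 0.25, 0.25)  # hypothetical stencil coefficients a1..a5


def static_chunks(j1, j2, p):
    """Hypothetical helper: split j1..j2 (inclusive) into p contiguous
    blocks of near-equal size, one per thread."""
    base, extra = divmod(j2 - j1 + 1, p)
    chunks, start = [], j1
    for t in range(p):
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def sweep_block(psi, new_psi, forcing, a, i1, i2, js):
    """Work done by one thread: update columns js and return its private
    partial sum of squared changes."""
    a1, a2, a3, a4, a5 = a
    partial = 0.0  # this thread's private copy of diff
    for j in js:
        for i in range(i1, i2 + 1):
            new_psi[i][j] = (a1 * psi[i + 1][j] + a2 * psi[i - 1][j]
                             + a3 * psi[i][j + 1] + a4 * psi[i][j - 1]
                             - a5 * forcing[i][j])
            partial += (new_psi[i][j] - psi[i][j]) ** 2
    return partial


def parallel_jacobi(psi, forcing, a, p):
    """One sweep with p threads; updates psi in place and returns diff."""
    size = len(psi)
    i1, i2, j1, j2 = 1, size - 2, 1, size - 2
    new_psi = [row[:] for row in psi]
    with ThreadPoolExecutor(max_workers=p) as pool:
        partials = list(pool.map(
            lambda js: sweep_block(psi, new_psi, forcing, a, i1, i2, js),
            static_chunks(j1, j2, p)))
    # All threads have finished here (the barrier at END PARALLEL DO),
    # so every new_psi value is ready before psi is overwritten.
    diff = sum(partials)  # combine step of reduction(+:diff)
    for j in range(j1, j2 + 1):  # a second parallel loop in the Fortran
        for i in range(i1, i2 + 1):
            psi[i][j] = new_psi[i][j]
    return diff
```

Because each new_psi value depends only on the old psi, the grid after the copy is the same for any thread count; only the order in which the partial sums of diff are added changes, so runs with different p agree to rounding error.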

Table 2.2 shows timing results for the single (p=1) and dual (p=2) thread versions of the OpenMP code. The times shown here are elapsed times as reported by the Linux time command. Table 2.3 shows the corresponding parallel speed-up, S = T(1)/T(2), and parallel efficiency, E = S/2, where T(p) is the elapsed time with p threads.

Table 2.2: Elapsed time in seconds with OpenMP for the SOM on a 933MHz dual processor Intel™ Pentium III workstation using Red Hat Linux™ 6.2 and the PGF90™ Fortran compiler from the Portland Group Inc.
Grid size N	p =1	p = 2
1000	19.1	12.9
2000	71.4	49.4
4000	292.6	205.4

Table 2.3: Parallel speed-up and efficiency with OpenMP for the SOM on a 933MHz dual processor Intel™ Pentium III workstation using Red Hat Linux™ 6.2 and the PGF90™ Fortran compiler from the Portland Group Inc.
Grid size N	Speed-up	Efficiency
1000	1.48	0.74
2000	1.45	0.73
4000	1.42	0.71
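The entries in Table 2.3 follow from Table 2.2 with speed-up S = T(1)/T(2) and efficiency E = S/2. This small Python sketch recomputes them from the elapsed times.

```python
# Elapsed times in seconds copied from Table 2.2: N -> (T(1), T(2))
times = {1000: (19.1, 12.9), 2000: (71.4, 49.4), 4000: (292.6, 205.4)}


def speedup_and_efficiency(t1, tp, p):
    """Parallel speed-up S = T(1)/T(p) and efficiency E = S/p."""
    s = t1 / tp
    return s, s / p


for n, (t1, t2) in times.items():
    s, e = speedup_and_efficiency(t1, t2, 2)
    print(f"N={n}: speed-up {s:.2f}, efficiency {e:.2f}")
```

Computed from the unrounded times, the efficiency for N = 2000 comes out at 0.72; the 0.73 in Table 2.3 corresponds to halving the rounded speed-up 1.45 (0.725, rounded up).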

HiPERiSM Consulting, LLC, (919) 484-9803 (Voice)

(919) 806-2813 (Facsimile)