HiPERiSM's Technical Reports

HiPERiSM - High Performance Algorism Consulting

HCTR-2001-2: Compiler Performance 3

1.0 The Stommel Ocean Model

1.1 Serial performance

This section compares serial performance across compilers on Intel Linux™ platforms for a single floating-point-intensive application, the Stommel Ocean Model (SOM). The Fortran 90 source code was developed by Dr. Timothy Kaiser and is available at http://www.npaci.edu/Training/AHM00; the same site offers an OpenMP version, which is discussed in Section 2. The compute kernel of the SOM is a doubly nested loop that performs a Jacobi iteration sweep over a two-dimensional finite-difference grid, and the number of iterations is set to 100. Three problem sizes were chosen for this study, corresponding to three values of N, the number of interior grid points in each dimension, as shown in Table 1.1.

Table 1.1: Stommel Ocean Model problem size and memory requirement

Grid size N    N x N        Memory required
1000           1 x 10^6     66 MB
2000           4 x 10^6     194 MB
4000           16 x 10^6    728 MB


Table 1.2 shows CPU times for the SOM code compiled with two different Fortran 90 compilers under the Linux™ operating system on a dual-processor Intel™ Pentium III 933MHz workstation with a 256KB on-processor L2 cache (this is the Flip Chip PGA version of the Pentium III). Optimization levels appropriate for Pentium processors were chosen, but SSE instruction optimizations were not used. This section presents timing results for the serial (single CPU) version of the code. The reported times were obtained with calls to the Fortran 90 system_clock routine. They typically differ by less than two percent from the times reported by the Linux™ time command, and this difference is ignored.
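The timing approach described above can be sketched as follows. This is a minimal illustration rather than the instrumentation used in the SOM source: the program name, the array size, and the loop standing in for the compute kernel are all hypothetical. Note that system_clock counts elapsed clock ticks, so on a busy machine the result can exceed the CPU time consumed by the process.

```fortran
program time_kernel
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: n = 500            ! hypothetical array size
    integer :: count_start, count_end, count_rate, count_max
    integer :: ticks, i, j
    real(dp) :: elapsed, total
    real(dp), allocatable :: work(:,:)

    allocate(work(n,n))

    ! count_rate is the number of clock ticks per second;
    ! count_max is the largest value the counter reaches before wrapping
    call system_clock(count_start, count_rate, count_max)

    ! hypothetical stand-in for the timed compute kernel
    total = 0.0_dp
    do j = 1, n
        do i = 1, n
            work(i,j) = real(i + j, dp)
            total = total + work(i,j)
        enddo
    enddo

    call system_clock(count_end)

    ! correct for a counter that wrapped around during the measurement
    ticks = count_end - count_start
    if (ticks < 0) ticks = (ticks + count_max) + 1

    elapsed = real(ticks, dp) / real(count_rate, dp)

    ! printing the checksum keeps the compiler from discarding the loop
    print *, 'checksum        :', total
    print *, 'elapsed seconds :', elapsed
end program time_kernel
```

Running the same executable under the Linux™ time command gives an independent figure against which the system_clock result can be compared.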

Table 1.2: CPU time in seconds for the SOM on a 933MHz dual processor Intel™ Pentium III workstation with Red Hat Linux™ 6.2 for two different Fortran 90 compilers (see notes to table for details).

Grid size N    Portland Group, Inc., Fortran 90, v3.1 [1]    Absoft Fortran 90, v6.2 [2]
1000           17.7                                          15.1
2000           73.5                                          60.4
4000           283.7                                         250.4

[1] PGF90™ Fortran compiler (Linux™ distribution) from the Portland Group, Inc. (http://www.pgroup.com).
[2] Absoft Pro Fortran™ (Linux™ distribution) from the Absoft Corporation (http://www.absoft.com).

 

2.1 OpenMP parallel performance

This section shows OpenMP parallel performance for the Stommel Ocean Model (SOM) with the Portland Group, Inc. Fortran 90 compiler for Linux™. In the OpenMP implementation, the doubly nested loop that performs the Jacobi iteration sweep is parallelized on the outer loop with OpenMP directives, as shown in Table 2.1.

Table 2.1: OpenMP implementation (by Dr. Kaiser) of the Jacobi iteration in the SOM (with modifications for the L2 error norm)
!$OMP PARALLEL DO SCHEDULE (STATIC) private(i) firstprivate(a1,a2,a3,a4,a5) reduction(+:diff)
    do j=j1,j2
        do i=i1,i2
            new_psi(i,j)=a1*psi(i+1,j) + a2*psi(i-1,j) + &
                         a3*psi(i,j+1) + a4*psi(i,j-1) - &
                         a5*for(i,j)
! Choose L2 norm
            diff=diff + (new_psi(i,j)-psi(i,j)) ** 2
        enddo
    enddo
!$OMP END PARALLEL DO
!
! copy the updated field back into psi
!
    do j=j1,j2
        do i=i1,i2
            psi(i,j)=new_psi(i,j)
        enddo
    enddo
!
! for L2 norm only
!
    diff = sqrt (diff)/ float ( (j2-j1+1)*(i2-i1+1) )
    END
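For readers who want to experiment with the directive placement, the following is a minimal, self-contained sketch of one Jacobi sweep. It is not the SOM source: the grid size, the coefficient values a1 to a5, the constant forcing array, and the zero initial field are hypothetical, and applying a second PARALLEL DO to the copy loop is an assumption made for this sketch. Each PARALLEL DO directive governs exactly one loop nest, and the matching END PARALLEL DO follows that nest immediately.

```fortran
program jacobi_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: n = 200            ! hypothetical interior grid size
    integer :: i, j, i1, i2, j1, j2
    real(dp) :: a1, a2, a3, a4, a5, diff
    real(dp), allocatable :: psi(:,:), new_psi(:,:), for(:,:)

    i1 = 1; i2 = n
    j1 = 1; j2 = n

    ! one extra layer on each side holds the boundary values
    allocate(psi(0:n+1,0:n+1), new_psi(0:n+1,0:n+1), for(0:n+1,0:n+1))
    psi     = 0.0_dp
    new_psi = 0.0_dp
    for     = 1.0_dp                         ! hypothetical forcing term

    ! hypothetical stencil coefficients
    a1 = 0.25_dp; a2 = 0.25_dp; a3 = 0.25_dp; a4 = 0.25_dp
    a5 = 0.01_dp

    ! the reduction variable must be initialized before the parallel loop
    diff = 0.0_dp

!$OMP PARALLEL DO SCHEDULE(STATIC) PRIVATE(i) REDUCTION(+:diff)
    do j = j1, j2
        do i = i1, i2
            new_psi(i,j) = a1*psi(i+1,j) + a2*psi(i-1,j) + &
                           a3*psi(i,j+1) + a4*psi(i,j-1) - &
                           a5*for(i,j)
            ! accumulate the squared change for the L2 norm
            diff = diff + (new_psi(i,j) - psi(i,j))**2
        enddo
    enddo
!$OMP END PARALLEL DO

    ! copy the updated field back; a separate directive for a separate nest
!$OMP PARALLEL DO SCHEDULE(STATIC) PRIVATE(i)
    do j = j1, j2
        do i = i1, i2
            psi(i,j) = new_psi(i,j)
        enddo
    enddo
!$OMP END PARALLEL DO

    diff = sqrt(diff) / real((j2-j1+1)*(i2-i1+1), dp)
    print *, 'error norm after one sweep:', diff
end program jacobi_sketch
```

Because the directives are comments to a compiler that is not invoked with its OpenMP option, the same file also builds and runs as a serial program, which is a convenient way to check that the parallel and serial results agree.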

Table 2.2 shows timing results for the single-thread (p=1) and dual-thread (p=2) versions of the OpenMP code. The times shown are elapsed times as reported by the Linux™ time command. Table 2.3 shows the corresponding parallel speed-up (the ratio of the p=1 time to the p=2 time) and parallel efficiency (the speed-up divided by p).

Table 2.2: Elapsed time in seconds with OpenMP for the SOM on a 933MHz dual processor Intel™ Pentium III workstation using Red Hat Linux™ 6.2 and the PGF90™ Fortran compiler from the Portland Group, Inc.

Grid size N    p = 1    p = 2
1000           19.1     12.9
2000           71.4     49.4
4000           292.6    205.4

 

Table 2.3: Parallel speed-up and efficiency with OpenMP for the SOM on a 933MHz dual processor Intel™ Pentium III workstation using Red Hat Linux™ 6.2 and the PGF90™ Fortran compiler from the Portland Group, Inc.

Grid size N    Speed-up    Efficiency
1000           1.48        0.74
2000           1.45        0.73
4000           1.42        0.71
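The relationship between the elapsed times and the derived quantities can be sketched in a few lines. The sketch assumes the usual definitions, speed-up as the p=1 time divided by the p=2 time and efficiency as the speed-up divided by the number of threads; the program and variable names are hypothetical, and the timings are those of Table 2.2.

```fortran
program speedup_table
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    integer, parameter :: nthreads = 2
    integer, parameter :: grid(3) = (/ 1000, 2000, 4000 /)
    ! elapsed times in seconds taken from Table 2.2
    real(dp), parameter :: t1(3) = (/ 19.1_dp, 71.4_dp, 292.6_dp /)
    real(dp), parameter :: t2(3) = (/ 12.9_dp, 49.4_dp, 205.4_dp /)
    real(dp) :: speedup, efficiency
    integer :: k

    write(*,'(a)') 'Grid size N  Speed-up  Efficiency'
    do k = 1, 3
        ! speed-up: single-thread time over dual-thread time
        speedup = t1(k) / t2(k)
        ! efficiency: speed-up per thread
        efficiency = speedup / real(nthreads, dp)
        write(*,'(i11,f10.2,f12.2)') grid(k), speedup, efficiency
    enddo
end program speedup_table
```

Efficiencies computed from the unrounded speed-up can differ in the last digit from values obtained by halving the already rounded speed-up.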

HiPERiSM Consulting, LLC, (919) 484-9803 (Voice)

(919) 806-2813 (Facsimile)