1.0 The Stommel Ocean Model
1.1 Serial performance
This section compares the serial performance obtained with different compilers on Intel
Linux platforms for the same floating-point intensive application. The application is the
Stommel Ocean Model (SOM); the Fortran 90 source code was developed by Dr. Timothy Kaiser
and is available at http://www.npaci.edu/Training/AHM00. The same site offers an OpenMP
version, which is discussed in Section 2. The compute kernel of the SOM is a doubly nested
loop that performs a Jacobi iteration sweep over a two-dimensional finite difference grid,
and the number of iterations is set to 100. For this study three problem sizes were chosen,
corresponding to three values of N, the number of interior grid points in each grid
direction, as shown in Table 1.1.
Table 1.1: Stommel Ocean Model problem size and memory requirement
Grid size N | N x N | Memory required
1000 | 1 x 10^6 | 66 MB
2000 | 4 x 10^6 | 194 MB
4000 | 16 x 10^6 | 728 MB
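As a rough illustration of the compute kernel described above, the following Python sketch performs one Jacobi sweep over the interior of a small grid. It is not Dr. Kaiser's Fortran 90 code: the function name `jacobi_sweep`, the list-of-rows layout, and the coefficient values in the usage example are assumptions made for illustration. In the SOM the five coefficients come from the model's grid spacing and physical parameters.

```python
def jacobi_sweep(psi, forcing, a1, a2, a3, a4, a5):
    """Return a new grid after one Jacobi sweep over the interior points.

    psi and forcing are square lists of rows that include one layer of
    boundary points around the n x n interior. Boundary values are copied
    unchanged; every interior point is rebuilt from the OLD values of its
    four neighbours, which is what makes this a Jacobi update.
    """
    n = len(psi) - 2                       # interior points per direction
    new_psi = [row[:] for row in psi]      # start from a copy (keeps boundary)
    for j in range(1, n + 1):              # outer loop
        for i in range(1, n + 1):          # inner loop
            new_psi[i][j] = (a1 * psi[i + 1][j] + a2 * psi[i - 1][j]
                             + a3 * psi[i][j + 1] + a4 * psi[i][j - 1]
                             - a5 * forcing[i][j])
    return new_psi


if __name__ == "__main__":
    n = 8                                  # tiny grid; the study uses N = 1000 to 4000
    psi = [[0.0] * (n + 2) for _ in range(n + 2)]
    forcing = [[-1.0] * (n + 2) for _ in range(n + 2)]
    # Hypothetical coefficients: plain four-point averaging plus a forcing term.
    for _ in range(100):                   # 100 iterations, as in the study
        psi = jacobi_sweep(psi, forcing, 0.25, 0.25, 0.25, 0.25, 0.25)
    print("centre value after 100 sweeps:", psi[n // 2][n // 2])
```

Each sweep touches every interior point once, so the work per iteration grows with N x N.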
Table 1.2 shows CPU times for the SOM code compiled with two different Fortran 90
compilers under the Linux operating system on a dual processor Intel Pentium III 933MHz
workstation with a 256KB on-processor L2 cache (this is the Flip Chip PGA version of the
Pentium III). Appropriate optimization levels were chosen for Pentium processors, but SSE
instruction optimizations were not used. This section presents timing results for the
serial (single CPU) version of the code. The times reported were obtained with calls to
the Fortran 90 system_clock routine. The differences between these times and those
reported by the Linux time command are typically less than two percent and are ignored
here.
Table 1.2: CPU time in seconds for the SOM on a 933MHz dual processor Intel Pentium III
workstation with Red Hat Linux 6.2 for two different Fortran 90 compilers (see notes to
table for details).
Grid size N | Portland Group, Inc., Fortran 90, v3.1 [1] | Absoft Fortran 90, v6.2 [2]
1000 | 17.7 | 15.1
2000 | 73.5 | 60.4
4000 | 283.7 | 250.4
[1] PGF90 Fortran compiler (Linux distribution) from the Portland Group, Inc. (http://www.pgroup.com).
[2] Absoft Pro Fortran (Linux distribution) from the Absoft Corporation (http://www.absoft.com).
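The figures in Table 1.2 can be compared more directly with a little arithmetic. The Python sketch below (the names `cpu_seconds`, `compiler_ratio`, and `ns_per_update` are illustrative, not from the original study) computes the ratio of the two compilers' times and the time per grid-point update. It assumes that the 100 iterations over the N x N interior points account for essentially all of the run time.

```python
# CPU seconds from Table 1.2: N -> (Portland Group, Absoft)
cpu_seconds = {
    1000: (17.7, 15.1),
    2000: (73.5, 60.4),
    4000: (283.7, 250.4),
}

ITERATIONS = 100  # number of Jacobi iterations used in the study


def compiler_ratio(n):
    """Portland Group time divided by Absoft time for grid size n."""
    pgi, absoft = cpu_seconds[n]
    return pgi / absoft


def ns_per_update(n, seconds):
    """Nanoseconds per grid-point update, assuming n*n*ITERATIONS updates."""
    return seconds / (n * n * ITERATIONS) * 1e9


if __name__ == "__main__":
    for n, (pgi, absoft) in sorted(cpu_seconds.items()):
        print(n,
              "ratio %.2f" % compiler_ratio(n),
              "PGI %.1f ns" % ns_per_update(n, pgi),
              "Absoft %.1f ns" % ns_per_update(n, absoft))
```

With the tabulated values, the Portland Group times are roughly 1.13 to 1.22 times the Absoft times. The cost per grid-point update stays near 177 to 184 ns (Portland Group) and 151 to 157 ns (Absoft) across the three sizes, which suggests that the run time grows in proportion to N x N.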
2.1 OpenMP parallel performance
This section shows OpenMP parallel performance for the Stommel Ocean Model (SOM) with the
Portland Group, Inc. Fortran 90 compiler for Linux. In the OpenMP implementation, the
doubly nested loop that performs the Jacobi iteration sweep is parallelized over its outer
loop with OpenMP directives, as shown in Table 2.1.
Table 2.1: OpenMP implementation (by Dr. Kaiser) of the Jacobi iteration in the SOM (with
modifications for the L2 error norm)
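!$OMP PARALLEL DO SCHEDULE (STATIC) private(i) firstprivate(a1,a2,a3,a4,a5) reduction(+:diff)
      do j=j1,j2
         do i=i1,i2
            new_psi(i,j)=a1*psi(i+1,j) + a2*psi(i-1,j) + &
                         a3*psi(i,j+1) + a4*psi(i,j-1) - &
                         a5*for(i,j)
! Choose L2 norm: accumulate the squared change at each grid point
            diff=diff + (new_psi(i,j)-psi(i,j)) ** 2
         enddo
      enddo
!$OMP END PARALLEL DO
!
! Copy the updated field back into psi. The copy is shown here as its own
! parallel loop (an assumed reconstruction: END PARALLEL DO must directly
! follow the loop it closes), so all reads of psi finish before it is
! overwritten.
!$OMP PARALLEL DO SCHEDULE (STATIC) private(i)
      do j=j1,j2
         do i=i1,i2
            psi(i,j)=new_psi(i,j)
         enddo
      enddo
!$OMP END PARALLEL DO
!
! for L2 norm only
!
      diff = sqrt (diff)/ float ( (j2-j1+1)*(i2-i1+1) )
      END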
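To make the roles of the SCHEDULE (STATIC) and reduction(+:diff) clauses more concrete, here is a small Python sketch that imitates them without any real threads. It assumes the common behaviour of a static schedule with no chunk size, namely one contiguous, nearly equal block of outer-loop iterations per thread; the exact split is left to the OpenMP runtime, and the names `static_chunks` and `reduce_diff` are hypothetical.

```python
def static_chunks(j1, j2, num_threads):
    """Split the inclusive range j1..j2 into contiguous, near-equal blocks.

    Returns one (first, last) pair per thread. A thread with no work gets
    an empty block where last == first - 1.
    """
    total = j2 - j1 + 1
    base, extra = divmod(total, num_threads)
    chunks = []
    start = j1
    for t in range(num_threads):
        size = base + (1 if t < extra else 0)   # first `extra` threads get one more
        chunks.append((start, start + size - 1))
        start += size
    return chunks


def reduce_diff(column_sums, j1, j2, num_threads):
    """Imitate reduction(+:diff): private partial sums, combined at the end.

    column_sums[j] stands for the sum over i of the squared change in
    column j, i.e. the work one outer-loop iteration adds to diff.
    """
    partials = []
    for first, last in static_chunks(j1, j2, num_threads):
        partial = 0.0                           # each private copy starts at zero
        for j in range(first, last + 1):
            partial += column_sums[j]
        partials.append(partial)
    return sum(partials)                        # combined once, after the loop


if __name__ == "__main__":
    print(static_chunks(1, 1000, 2))
    demo = {j: float(j) for j in range(1, 11)}
    print(reduce_diff(demo, 1, 10, 2))
```

Because each thread accumulates into its own copy of diff, no thread writes to a shared variable inside the loop. The partial sums are combined once the loop has finished.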
Table 2.2 shows timing results for the single-thread (p = 1) and dual-thread (p = 2)
versions of the OpenMP code. The time shown is elapsed time as reported by the Linux time
command. Table 2.3 shows the corresponding parallel speed-up and parallel efficiency.
Table 2.2: Elapsed time in seconds with OpenMP for the SOM on a 933MHz dual processor
Intel Pentium III workstation using Red Hat Linux 6.2 and the PGF90 Fortran compiler from
the Portland Group, Inc.
Grid size N | p = 1 | p = 2
1000 | 19.1 | 12.9
2000 | 71.4 | 49.4
4000 | 292.6 | 205.4
Table 2.3: Parallel speed-up and efficiency with OpenMP for the SOM on a 933MHz dual
processor Intel Pentium III workstation using Red Hat Linux 6.2 and the PGF90 Fortran
compiler from the Portland Group, Inc.
Grid size N | Speed-up | Efficiency
1000 | 1.48 | 0.74
2000 | 1.45 | 0.73
4000 | 1.42 | 0.71
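The entries in Table 2.3 follow from Table 2.2 under the usual definitions, which are assumed here: speed-up is the elapsed time for p = 1 divided by the elapsed time for p threads, and efficiency is the speed-up divided by p. A short Python sketch (function names are illustrative) reproduces the calculation:

```python
# Elapsed seconds from Table 2.2: N -> (p = 1, p = 2)
elapsed = {
    1000: (19.1, 12.9),
    2000: (71.4, 49.4),
    4000: (292.6, 205.4),
}


def speed_up(t1, tp):
    """Elapsed time on one thread divided by elapsed time on p threads."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Speed-up divided by the number of threads p."""
    return speed_up(t1, tp) / p


if __name__ == "__main__":
    for n, (t1, t2) in sorted(elapsed.items()):
        print(n, "speed-up %.3f" % speed_up(t1, t2),
              "efficiency %.3f" % efficiency(t1, t2, 2))
```

Rounded to two decimals this gives speed-ups of 1.48, 1.45, and 1.42, matching the table. The unrounded efficiencies are about 0.740, 0.723, and 0.712; the 0.73 listed for N = 2000 appears to come from halving the already rounded speed-up of 1.45.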
HiPERiSM Consulting, LLC, (919) 484-9803 (Voice), (919) 806-2813 (Facsimile)