1.0 Combinatorial problems (part 2)
1.1 The permanent of (0,1) matrices
This is a continuation of the previous report's tests with
the Kallman algorithm to compute the permanent of a (0,1) matrix. A discussion of other
alternatives, and why the Kallman algorithm is preferred, is discussed by Delic and Cash
in Computer Physics Communications, vol. 124, pp. 315-329, and the source code in question
is available from CPC.
Kallman's algorithm relies on packing the rows of a (0,1) matrix in the respective bits of
a word and then performs group operations on theses words, e.g. with masking operations,
or population count. While the Intel based architectures discussed here are limited
to 32 bit words, by a simple device, it is possible to spread row sizes longer than 32
columns across mutliple words. The version of the Kallman algorithm (KALLPC) studied here
employs this method. The 64 bit version (KALLC90) is a different source code (not
discussed in this report) and for 64 bit word platforms can deliver performance superior
to the IA-32 results shown here.
Table 1.1 shows results of CPU time for the KALLPC code compiled using two different
Fortran 90 compilers with the Linux operating system. KALLPC was executed in single
thread mode on a dual processor Intel Pentium III 933MHz workstation with a 256MB
on-processor L2 cache (this is the Flip Chip PGA version of the Pentium III). The KALLPC
application is CPU intensive, performs a small amount of I/O only at the beginning and end
of each run, and has modest memory requirements (~ 2 Mbytes). The Kallman algorithm runs
in scalar mode because of a complex branching structure that inhibits parallelization.
Both KALLPC (and KALLC90) perform no floating point work and therefore they do not test
numerical performance. Both codes are integer, logical, and bit-wise operation intensive
and represent a very special type of metric for compiler and implementation performance.
Therefore the usual caution should be exercised in making generalizations. Numerical
performance of these compilers will be the subject of separate reports.
Table 1.1:
CPU time in seconds for KALLPC on a 933MHz dual processor Intel Pentium III
workstation with Red Hat Linux 6.2 for two different Fortran 90 compilers (see notes
to table for details). |
Matrix size N |
Portland Group, Inc.,
Fortran 90, v3.1 [1] |
Absoft Fortran 90, v6.2
[2,3] |
30 |
0.60 |
0.25 |
44 |
136.9 |
46.4 |
48 |
22.9 |
7.4 |
52 |
84.0 |
26.5 |
56 |
721.1 |
229.2 |
60 |
46079.4 |
15029.3 |
[1]
PGF90 fortran compiler (Linux distribution) from the Portland Group Inc., (http://www.pgroup.com). |
[2]
Absoft Pro Fortran (Linux distribution) from the Absoft Corporation (http://www.absoft.com). |
[3]
Compare this result for N=60 with 66148.6 sec (f90) on the Cray C90 (240MHz) in the
64 bit version, KALLC90, using the HPF POPCNT intrinsic function, or 22032.1 sec (f77) on
the SGI Onyx2 with R10,000 processors (195MHz, 4MB L2 cache). |
HiPERiSM Consulting, LLC, (919) 484-9803
(Voice)
(919) 806-2813 (Facsimile) |