HiPERiSM's Technical Reports

HiPERiSM - High Performance Algorism Consulting

HCTR-1999-1: Compiler Performance 1



1.0 Combinatorial problems

1.1 The permanent of (0,1) matrices

Here attention is confined to the Kallman algorithm to compute the permanent of a (0,1) matrix. A discussion of other alternatives, and why the Kallman algorithm is preferred, is discussed by Delic and Cash (submitted to Computer Physics Communications). The source code in question will be available for release after this manuscript has been accepted for publication.

The permanent of a matrix, is a close relative to the determinant, but does not share the latter's nice properties. There is a great deal of interest in the permanent in combinatorial mathematics and in chemical graph theory where it is used to classify chemical structures.

The problems with computing the permanent are that (a) the value can exceed the largest integer number representable with the available machine word length, and (b) the time to solution grows faster than exponential with matrix size. Both these features are a challenge in designing an efficient and robust algorithm.

Kallman's algorithm relies on packing the rows of a (0,1) matrix in the respective bits of a word and then performs group operations on theses words, e.g. with masking operations, or population count. While the Intel™ based architectures discussed here are limited to 32 bit words, by a simple device, it is possible to spread row sizes longer than 32 columns across mutliple words. The version of the Kallman algorithm (KALLPC) studied here employs this method.

The largest proportion of the total CPU time is spent in the population count function to find the number of non-zero bits in a word. There is a High Performance Fortran (HPF) intrinsic integer function POPCNT to perform this operation, but while it is offered by some mainframe vendors as an extension to the ASI/ISO standard under FORTRAN77 or Fortran 90, POPCNT is not available as an intrinsic integer function in any of the personal computer compilers discussed in this report. Consequently, some method of counting the number of non-zero bits has to be devised. There are several ways of doing this and the MIL-STD-1753 bit intrinsic functions are available for this purpose. These were introduced as an extension to the FORTRAN 77 standard and most are included with the Fortran 90 standard. The POPCNT function used in the KALLPC code discussed here is shown in Table 1.1.

Table 1.1: POPCNT function to return number of non-zero bits in an integer word 
    IF (MX .EQ. 0) RETURN
    DO 1 I=1,30
    DATA ITWOS/1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,

It is worthwhile remarking that the IAND, IOR, and NOT intrinsic functions are now part of the Fortran 90 standard and require integer operands. The older FORTRAN 77 standard .AND., .OR., .NOT. for integer operands no longer applies under Fortran 90 because these operators are reserved for logical operands. Nevertheless, different compilers apply different extensions to the standard and two of the compilers discussed below compile the older FORTRAN 77 standard (with the default Fortran 90 options) without comment. The third compiler returns a "compilation failed" after it lists all instances of exceptions to the Fortran 90 standard. Of course, compilers that do accept extension to the ANSI/ISO standard will generally have a command line option to display exceptions to the standard. This option is not usually the default because the number of exception messages can become voluminous, especially when porting legacy code. In such cases acceptance of numerous extensions to the ANSI/ISO standard can be a great time-saver for the developer.

Table 1.2 shows results of CPU time for the KALLPC code compiled with four different Fortran compilers and executed in single thread mode on a dual processor Intel Pentium™ 200MHz workstation where the processors had no L2 cache. The KALLPC application is CPU intensive and performs a small amount of I/O only at the beginning and end of each run. Memory requirements are modest (~ 2 Mbytes) and for this reason on processors with at least 2Mbytes of cache the performance easily exceeds that of a Cray C90 mainframe. In the C90 case KALLC90 (the 64-bit version of KALLPC) runs in scalar mode because of a complex branching structure that inhibits vectorization. However, because of the small instruction set, instruction buffer fetch rates are amongst the smallest we have seen, and the Cray cft77 and f90 compilers both support the HPF POPCNT intrinsic function.

Both KALLPC (and KALLC90) perform no floating point work and therefore they do not test numerical performance. Both codes are integer, logical, and bit-wise operation intensive and represent a very special type of metric for compiler and implementation performance. Therefore the usual caution should be exercised in making generalizations. Numerical performance of these compilers will be the subject of separate reports.

Table 1.2: CPU time in seconds for KALLPC on a 200MHz dual processor Pentium™ workstation [1] under Windows NT™ 4.0 with four different Fortran compilers (see notes to table for details).
Matrix size N Absoft F77, v6.0 [2] Absoft F90, v6.0 [2,5] DIGITAL Visual Fortran, v5.0 [3] NAGWare FTN90, v 2.18 [4]
30 3.5 2.1 2.3 6.0
44 704.1 347.8 480.3 647.0
48 116.9 54.9 78.9 211.0
52 435.7 194.6 286.1 764.0
56 3742.7 1681.4 2467.6 6509.0
60 ------- 117532.8 -------- --------
[1] This processor has no L2 cache and the performance will be better on processors with such L2 caches, especially because of the small size of the instruction and data sets.
[2] Absoft F77 and F90 are the Fortran compilers included in the Absoft Pro Fortran MP™ 6.0 package offered by the Absoft Corporation (http://www.absoft.com). The "generic" optimizations were used.
[3] DIGITAL Visual Fortran™  is a product and trademark of COMPAQ Digital Products and Services, (http://www.digital.com/fortran).
[4] NAG FTN90 is the NAGWare™ Salford PC Windows NT™ version of the Numerical Algorithms Group (NAG) Fortran 90 compiler (http://www.nag.com). This compiler accepts no exceptions to the ANSI/ISO Fortran 90 standard.
[5] Compare this result for N=60 with 55834.4 sec (cft77, v6.0.5.2), and 66148.6 sec (f90, v2.3.0.11) on the Cray C90™ in the 64 bit version KALLC90 using the HPF POPCNT intrinsic function.

backnext page

HiPERiSM Consulting, LLC, (919) 484-9803 (Voice)

(919) 806-2813 (Facsimile)