1.0 Combinatorial problems
1.1 The permanent of (0,1) matrices
Here attention is confined to the Kallman algorithm to
compute the permanent of a (0,1) matrix. A discussion of other alternatives, and why the
Kallman algorithm is preferred, is discussed by Delic and Cash (submitted to Computer
Physics Communications). The source code in question will be available for release after
this manuscript has been accepted for publication.
The permanent of a matrix, is a close relative to the determinant, but does not share the
latter's nice properties. There is a great deal of interest in the permanent in
combinatorial mathematics and in chemical graph theory where it is used to classify
chemical structures.
The problems with computing the permanent are that (a) the value can exceed the largest
integer number representable with the available machine word length, and (b) the time to
solution grows faster than exponential with matrix size. Both these features are a
challenge in designing an efficient and robust algorithm.
Kallman's algorithm relies on packing the rows of a (0,1) matrix in the respective bits of
a word and then performs group operations on theses words, e.g. with masking operations,
or population count. While the Intel™ based architectures discussed here are limited
to 32 bit words, by a simple device, it is possible to spread row sizes longer than 32
columns across mutliple words. The version of the Kallman algorithm (KALLPC) studied here
employs this method.
The largest proportion of the total CPU time is spent in the population count function to
find the number of nonzero bits in a word. There is a High Performance Fortran (HPF)
intrinsic integer function POPCNT to perform this operation, but while it is offered by
some mainframe vendors as an extension to the ASI/ISO standard under FORTRAN77 or Fortran
90, POPCNT is not available as an intrinsic integer function in any of the personal
computer compilers discussed in this report. Consequently, some method of counting the
number of nonzero bits has to be devised. There are several ways of doing this and the
MILSTD1753 bit intrinsic functions are available for this purpose. These were introduced
as an extension to the FORTRAN 77 standard and most are included with the Fortran 90
standard. The POPCNT function used in the KALLPC code discussed here is shown in Table
1.1.
Table 1.1: POPCNT function to return
number of nonzero bits in an integer word 
INTEGER FUNCTION POPCNT(MX)
COMMON/TWOS/ITWOS(30)
POPCNT=0
IF (MX .EQ. 0) RETURN
DO 1 I=1,30
1 IF (IAND(MX,ITWOS(I)) .NE. 0) POPCNT=POPCNT+1
RETURN
END
BLOCK DATA POPINIT
COMMON/TWOS/ITWOS(30)
DATA ITWOS/1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,
116384,32768,65536,131072,262144,524288,1048576,2097152,4194304,
28388608,16777216,33554432,67108864,134217728,268435456,536870912/
END

It is worthwhile remarking that the IAND, IOR, and NOT
intrinsic functions are now part of the Fortran 90 standard and require integer operands.
The older FORTRAN 77 standard .AND., .OR., .NOT. for integer operands no longer applies
under Fortran 90 because these operators are reserved for logical operands. Nevertheless,
different compilers apply different extensions to the standard and two of the compilers
discussed below compile the older FORTRAN 77 standard (with the default Fortran 90
options) without comment. The third compiler returns a "compilation failed"
after it lists all instances of exceptions to the Fortran 90 standard. Of course,
compilers that do accept extension to the ANSI/ISO standard will generally have a command
line option to display exceptions to the standard. This option is not usually the default
because the number of exception messages can become voluminous, especially when porting
legacy code. In such cases acceptance of numerous extensions to the ANSI/ISO standard can
be a great timesaver for the developer.
Table 1.2 shows results of CPU time for the KALLPC code compiled with four different
Fortran compilers and executed in single thread mode on a dual processor Intel
Pentium™ 200MHz workstation where the processors had no L2 cache. The KALLPC
application is CPU intensive and performs a small amount of I/O only at the beginning and
end of each run. Memory requirements are modest (~ 2 Mbytes) and for this reason on
processors with at least 2Mbytes of cache the performance easily exceeds that of a Cray
C90 mainframe. In the C90 case KALLC90 (the 64bit version of KALLPC) runs in scalar mode
because of a complex branching structure that inhibits vectorization. However, because of
the small instruction set, instruction buffer fetch rates are amongst the smallest we have
seen, and the Cray cft77 and f90 compilers both support the HPF POPCNT intrinsic function.
Both KALLPC (and KALLC90) perform no floating point work and therefore they do not test
numerical performance. Both codes are integer, logical, and bitwise operation intensive
and represent a very special type of metric for compiler and implementation performance.
Therefore the usual caution should be exercised in making generalizations. Numerical
performance of these compilers will be the subject of separate reports.
Table 1.2:
CPU time in seconds for KALLPC on a 200MHz dual processor Pentium™ workstation [1]
under Windows NT™ 4.0 with four different Fortran compilers (see notes to table for
details). 
Matrix size N 
Absoft F77, v6.0 [2] 
Absoft F90, v6.0 [2,5] 
DIGITAL Visual Fortran™,
v5.0 [3] 
NAGWare™
FTN90, v 2.18 [4] 
30 
3.5 
2.1 
2.3 
6.0 
44 
704.1 
347.8 
480.3 
647.0 
48 
116.9 
54.9 
78.9 
211.0 
52 
435.7 
194.6 
286.1 
764.0 
56 
3742.7 
1681.4 
2467.6 
6509.0 
60 
 
117532.8 
 
 
[1] This
processor has no L2 cache and the performance will be better on processors with such L2
caches, especially because of the small size of the instruction and data sets. 
[2]
Absoft F77 and F90 are the Fortran compilers included in the Absoft Pro Fortran MP™
6.0 package offered by the Absoft Corporation (http://www.absoft.com). The
"generic" optimizations were used. 
[3]
DIGITAL Visual Fortran™ is a product and trademark of COMPAQ Digital Products
and Services, (http://www.digital.com/fortran). 
[4] NAG
FTN90 is the NAGWare™ Salford PC Windows NT™ version of the Numerical Algorithms
Group (NAG) Fortran 90 compiler (http://www.nag.com).
This compiler accepts no exceptions to the ANSI/ISO Fortran 90 standard. 
[5]
Compare this result for N=60 with 55834.4 sec (cft77, v6.0.5.2), and 66148.6 sec (f90,
v2.3.0.11) on the Cray C90™ in the 64 bit version KALLC90 using the HPF POPCNT
intrinsic function. 
HiPERiSM Consulting, LLC, (919) 4849803
(Voice)
(919) 8062813 (Facsimile) 