HiPERiSM - High Performance Algorism Consulting
What we found when we tested products with real applications: see the summary below and the collection of downloadable PDF files.
Our benchmarks compare compilers on ia32 and ia64 architectures.
HiPERiSM Consulting issues ad-hoc technical reports on selected products and problems in multiprocessor computing. These technical reports are available in electronic form at this site and are copyright by HiPERiSM Consulting, LLC. All trade names mentioned are the property of the owners and opinions expressed here are not necessarily shared by them. HiPERiSM Consulting, LLC, will not be liable for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of the products or source code discussed in these reports.
These reports compare the performance of compilers that target workstations. The comparison will usually be for the Microsoft Windows™ and Linux operating systems (with occasional proprietary vendor platforms). The aim is not to subscribe to a "winner-takes-all" approach but rather to learn how different compilers behave and what features they offer applications developers. Our focus is on a select group of benchmarks that we are familiar with on multiple platforms. An extensive performance analysis on ia32 and ia64 platforms for serial and parallel versions of CMAQ has been completed (under contract to the U.S. EPA) and is also presented here. GPGPU benchmarks for CMAQ loop nests are also included.
Each compiler studied in these reports has been completely tested and debugged by the respective vendors, and each is considered to be an "industrial strength" product. The results of these reports do not in any way imply defectiveness in the products discussed. On the contrary, each product has been chosen for study because it is generally acknowledged to have outstanding features or performance. Where technical support staff of the respective vendors have made suggestions, or otherwise responded to the results reported here, this is acknowledged where appropriate.
1) Kallman is an integer logical algorithm with a small instruction and data set that resides entirely in cache and produces negligible memory traffic. It is suitable for testing the limits of CPU speed.
2) SOM is the Stommel Ocean Model, in which the compute kernel is a Jacobi iteration that sweeps over a two-dimensional grid. The loop structure is excellent for testing compiler optimizations and problem size scalability.
3) POM is the Princeton Ocean Model. This is an example of a "real world" model that has over five hundred vectorizable loops. This version of POM was developed to produce good scalability for vector register architectures and is suitable for testing how well compilers can optimize for cache-based architectures.
4) STREAM is the benchmark for Sustainable Memory Bandwidth in High Performance Computers (http://www.cs.virginia.edu/stream) and is used here to test memory bandwidth differences between compilers on commodity hardware with dual processor platforms. The OpenMP version is used to measure memory bandwidth loss as the thread count is increased.
5) MM5 is the PSU/NCAR Mesoscale Modeling System (also known as MM5 Modeling System Version 3). This is an example of a "real world" model that has vectorizable loops. It was developed to produce good scalability for vector register architectures and is suitable for testing how well compilers can optimize for cache-based architectures.
6) AERMOD is an Air Quality Model (AQM) in current use that describes pollutant dispersion and deposition. The source is characterized by negligible vector code, voluminous memory traffic, a high rate of control transfer instructions (such as branching logic), high procedure calling overhead, and I/O.
7) CAMx is an Air Quality Model (AQM) in current use that describes atmospheric chemistry. The source is characterized by negligible vector code, voluminous memory traffic, a high rate of control transfer instructions (such as branching logic), high procedure calling overhead, and voluminous I/O.
8) CMAQ, the Community Multiscale Air Quality model, is an Air Quality Model (AQM) in current use that describes atmospheric chemistry (http://www.cmaq-model.org/). The source is characterized by some vector code (depending on the solver used), heavy memory traffic, high procedure calling overhead, and voluminous I/O. A hybrid parallel version has been developed by HiPERiSM Consulting.
9) Bandwidth is measured with the b_eff.c package (https://fs.hlrs.de/projects/par/mpi/b_eff/).
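To make the loop structure mentioned for SOM in item 2 concrete, the following is a minimal sketch of a Jacobi iteration sweeping over a two-dimensional grid. It is not the SOM source: it solves the simpler Laplace problem with a five-point stencil, and the function names (jacobi_sweep, solve), the tolerance, and the grid values are hypothetical choices made for illustration only. The doubly nested loop over interior points, reading from one array and writing to another, is the kind of pattern a compiler may vectorize or tile.

```python
def jacobi_sweep(grid):
    """Return a new grid after one Jacobi sweep; boundary values stay fixed."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so every update reads only old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Five-point stencil: average of the four nearest neighbours.
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def solve(grid, tol=1e-6, max_iters=10000):
    """Repeat sweeps until the largest pointwise change drops below tol."""
    rows, cols = len(grid), len(grid[0])
    for iteration in range(1, max_iters + 1):
        new = jacobi_sweep(grid)
        change = max(abs(new[i][j] - grid[i][j])
                     for i in range(rows) for j in range(cols))
        grid = new
        if change < tol:
            return grid, iteration
    return grid, max_iters


if __name__ == "__main__":
    # Hypothetical test problem: top edge held at 1.0, everything else 0.0.
    n = 16
    start = [[1.0] * n] + [[0.0] * n for _ in range(n - 1)]
    result, iterations = solve(start)
    print("sweeps:", iterations, "centre value:", result[n // 2][n // 2])
```

Increasing n in the usage section grows the working set from cache-resident to memory-bound, which is one simple way to see why a kernel of this shape is useful for studying problem size scalability.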
Web sites that offer downloadable files of benchmark suites are listed in the following table.
HiPERiSM Consulting, LLC, (919) 484-9803 (Voice)
(919) 806-2813 (Facsimile)