Parallel Software Products

Sample message passing parallelisations of Fortran code performed by ParaWise. Speedup graphs are provided for the following cases:

APPLU Class A (64x64x64) on SGI Origin 2000
APPLU Class B (102x102x102) on SGI Origin 2000
APPLU Class A (64x64x64) on Cray T3D
APPBT Class A (64x64x64) on SGI Origin 2000
APPBT Class A (64x64x64) on Cray T3D
APPSP Class A (64x64x64) on SGI Origin 2000

NAS LU, BT and SP Benchmarks

NAS LU Code

The NAS-LU code (sometimes referred to as APPLU or NPB-LU) is one of the NAS Parallel Benchmarks (NPB). NAS-LU is a lower-upper (LU) CFD application. Despite its name, it does not perform an LU factorization; instead it implements a symmetric successive over-relaxation (SSOR) numerical scheme to solve a regular-sparse, block lower and upper triangular system. The parallel code is generic: to run on a different platform, the code is ported to that platform, then compiled and linked with the appropriate version of CAPLib. The original serial code undergoes a relatively small number of changes to generate a parallel version. This can be seen in the 1-D partition parallel code, the 1-D partition reduced memory parallel code, the 1-D partition overlapping communication parallel code and the 2-D partition parallel code.
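As an illustration of the SSOR scheme mentioned above, the following is a minimal sketch in Python for a small dense system. It is not taken from the NAS-LU source, which applies SSOR to a block system arising from the CFD discretisation; the function names, the relaxation factor and the test matrix are assumptions chosen for illustration only. Each iteration performs a forward sweep (the lower triangular part) followed by a backward sweep (the upper triangular part).

```python
def _relax_row(A, b, x, i, omega):
    """Apply one over-relaxed Gauss-Seidel update to unknown i, in place."""
    sigma = sum(A[i][j] * x[j] for j in range(len(b)) if j != i)
    x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]


def ssor_sweep(A, b, x, omega):
    """One SSOR iteration: forward sweep, then backward sweep."""
    x = list(x)  # work on a copy so the caller's vector is untouched
    n = len(b)
    for i in range(n):               # forward (lower triangular) sweep
        _relax_row(A, b, x, i, omega)
    for i in range(n - 1, -1, -1):   # backward (upper triangular) sweep
        _relax_row(A, b, x, i, omega)
    return x


def ssor_solve(A, b, omega=1.2, tol=1e-10, max_iter=500):
    """Iterate SSOR sweeps until the largest residual entry is below tol.

    Returns the approximate solution and the number of sweeps used.
    """
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        x = ssor_sweep(A, b, x, omega)
        residual = max(
            abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n)
        )
        if residual < tol:
            return x, iteration
    return x, max_iter


if __name__ == "__main__":
    # Hypothetical symmetric, diagonally dominant test system.
    A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    b = [2.0, 4.0, 10.0]
    solution, sweeps = ssor_solve(A, b)
    print(solution, sweeps)
```

The forward sweep in row order is what gives the method its sequential dependence; a parallel version has to respect that ordering across partition boundaries.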

Code information: 3300 lines of source and 17 subroutines

Total Parallelization Time using ParaWise: Approximately 45 minutes.

User Time: Approximately 10 minutes.

The graphs show the LU benchmark run on an SGI Origin 2000 and a Cray T3D.

On the Origin 2000, good scalability is achieved on 32 processors, with the more scalable 2-D partitioned version showing performance comparable with that of the version created manually by the experts at NASA Ames.

On the Cray T3D, good scalability on 256 processors is demonstrated with a runtime 154 times faster than the serial runtime for the NPB2.3 LU code. This execution used CAPLib communications linked to the Cray SHMEM library.

NAS BT Code

The NAS-BT code (sometimes referred to as APPBT or NPB-BT) is one of the NAS Parallel Benchmarks (NPB). The code uses an implicit algorithm to compute a finite difference solution to the 3D compressible Navier-Stokes equations. The solution is based on a Beam-Warming approximate factorisation. The approximate factorisation decouples the three dimensions. This leads to three sets of regularly structured systems of linear equations. The resulting systems are block tridiagonal and are solved using the Thomas algorithm (Gaussian elimination of a banded system without pivoting).
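The Thomas algorithm can be sketched as follows for a scalar tridiagonal system. This is a simplification: NAS-BT solves block tridiagonal systems, in which each coefficient is a small matrix rather than a number. The function name and the array layout are assumptions made for this example and do not come from the benchmark source.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by Gaussian elimination without pivoting.

    Row i of the system reads
        lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i]
    lower[0] and upper[-1] lie outside the matrix and are ignored.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination: remove the sub-diagonal one row at a time.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[n - 1] = d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Hypothetical diagonally dominant system, so no pivoting is needed.
    lower = [0.0, -1.0, -1.0, -1.0]
    diag = [4.0, 4.0, 4.0, 4.0]
    upper = [-1.0, -1.0, -1.0, 0.0]
    rhs = [2.0, 4.0, 6.0, 13.0]
    print(thomas_solve(lower, diag, upper, rhs))
```

Because no pivoting is performed, the sketch is only safe for systems such as diagonally dominant ones, where the pivots cannot vanish.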

Code information: 4500 lines of source and 18 subroutines

Total Parallelization Time using ParaWise: Approximately 40 minutes.

User Time: Approximately 10 minutes.


The NAS-BT message passing code generated by ParaWise scales well on the SGI Origin 2000, with the 2-D partition version performing 34 times faster than the serial version on 64 processors. On the Cray T3D, the ParaWise-generated code shows excellent scalability, executing 215 times faster than the serial code on 256 processors.
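The speedup figures quoted on this page can be put on a common footing by converting them to parallel efficiency, that is, speedup divided by processor count. The short sketch below does this for the three speedups stated in the text; the function name and the labels are illustrative assumptions, and only the numbers come from the page.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / processors


# (label, speedup over serial, processor count) as quoted in the text.
QUOTED_RESULTS = [
    ("NAS-LU on Cray T3D", 154, 256),
    ("NAS-BT 2-D partition on Origin 2000", 34, 64),
    ("NAS-BT on Cray T3D", 215, 256),
]

if __name__ == "__main__":
    for label, speedup, processors in QUOTED_RESULTS:
        efficiency = parallel_efficiency(speedup, processors)
        print(f"{label}: {efficiency:.1%} of linear speedup")
```

This works out to roughly 60%, 53% and 84% of linear speedup respectively.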

NAS SP Code

The NAS-SP code (sometimes referred to as APPSP or NPB-SP) is one of the NAS Parallel Benchmarks (NPB). The code uses an implicit algorithm to compute a finite difference solution to the 3D compressible Navier-Stokes equations. The solution is based on a Beam-Warming approximate factorization. The approximate factorization decouples the three dimensions. This leads to three sets of regularly structured systems of linear equations. The resulting systems are scalar penta-diagonal and are solved using the Thomas algorithm (Gaussian elimination of a banded system without pivoting). Parallelizing NAS-SP took less time than parallelizing NAS-LU (approximately 35 minutes versus 45 minutes).
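For the penta-diagonal case, the same elimination idea removes two sub-diagonals instead of one. The following is a minimal sketch, assuming a scalar system stored as five bands; the function name and band layout are hypothetical and are not taken from the NAS-SP source.

```python
def penta_solve(e, a, d, c, f, rhs):
    """Solve a penta-diagonal system by banded elimination without pivoting.

    Row i of the system reads
        e[i]*x[i-2] + a[i]*x[i-1] + d[i]*x[i] + c[i]*x[i+1] + f[i]*x[i+2] = rhs[i]
    Entries that fall outside the matrix should be supplied as 0.
    """
    n = len(d)
    # Copy the bands that are modified so the caller's data is untouched.
    a, d, c, b = list(a), list(d), list(c), list(rhs)
    if n == 1:
        return [b[0] / d[0]]

    # Forward elimination: clear both sub-diagonal entries in column i.
    for i in range(n - 1):
        m = a[i + 1] / d[i]          # multiplier for row i+1
        d[i + 1] -= m * c[i]
        c[i + 1] -= m * f[i]
        b[i + 1] -= m * b[i]
        if i + 2 < n:
            m = e[i + 2] / d[i]      # multiplier for row i+2
            a[i + 2] -= m * c[i]
            d[i + 2] -= m * f[i]
            b[i + 2] -= m * b[i]

    # Back substitution on the remaining upper triangular band.
    x = [0.0] * n
    x[n - 1] = b[n - 1] / d[n - 1]
    x[n - 2] = (b[n - 2] - c[n - 2] * x[n - 1]) / d[n - 2]
    for i in range(n - 3, -1, -1):
        x[i] = (b[i] - c[i] * x[i + 1] - f[i] * x[i + 2]) / d[i]
    return x


if __name__ == "__main__":
    # Hypothetical diagonally dominant 5x5 system.
    e = [0.0, 0.0, 1.0, 1.0, 1.0]
    a = [0.0, -2.0, -2.0, -2.0, -2.0]
    d = [7.0, 7.0, 7.0, 7.0, 7.0]
    c = [-2.0, -2.0, -2.0, -2.0, 0.0]
    f = [1.0, 1.0, 1.0, 0.0, 0.0]
    rhs = [6.0, 10.0, 15.0, 14.0, 30.0]
    print(penta_solve(e, a, d, c, f, rhs))
```

As with the tridiagonal case, skipping pivoting is only safe when the pivots stay away from zero, for example in a diagonally dominant system.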

Code information: 3500 lines of source and 25 subroutines

Total Parallelization Time using ParaWise: Approximately 35 minutes.

User Time: Approximately 10 minutes.


The NAS-SP message passing code generated by ParaWise scales well on the SGI Origin 2000, achieving a speedup only a little below that of the manual parallelisation by the experts at NASA Ames.