Key to graphs:
NPB2.3: NASA manual MPI parallelisation.
directive: OpenMP directives inserted by hand.
SGIpfa: Directives inserted by SGIpfa compiler.
CAPMPI: ParaWise generated message passing.
CAPdir: OpenMP directives inserted automatically by ParaWise (CAPO).
Graphs: NASLU, NASSP, NASBT, Dual Quad-Core Intel Xeon.

NAS Benchmarks LU, SP and BT
NASLU
The NASLU code (sometimes referred to as APPLU or NPBLU) is one of the NAS Parallel Benchmarks (NPB). The code is a lower-upper diagonal (LU) CFD application benchmark. Despite the name, it does not perform an LU factorisation; instead it implements a symmetric successive over-relaxation (SSOR) numerical scheme to solve a regular-sparse, block lower and upper triangular system. The code consists of 3300 lines of Fortran source in 17 subroutines.
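To make the SSOR idea concrete, the following is a minimal sketch in Python, not the NPB Fortran code. It assumes a small dense scalar system rather than the block-structured system the benchmark solves, and the function name `ssor_solve`, the relaxation factor and the tolerance are illustrative choices. Each iteration performs a forward (lower triangular) sweep followed by a backward (upper triangular) sweep.

```python
def ssor_solve(A, b, omega=1.2, tol=1e-10, max_iter=500):
    """Solve A x = b with symmetric successive over-relaxation.

    Illustrative only: A is a small dense list-of-lists matrix with a
    non-zero diagonal; convergence is assumed (e.g. A symmetric positive
    definite and 0 < omega < 2).
    """
    n = len(b)
    x = [0.0] * n

    def relax(i):
        # One relaxed Gauss-Seidel update of unknown i using the latest values.
        sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]

    for iteration in range(1, max_iter + 1):
        for i in range(n):            # forward sweep (lower triangular part)
            relax(i)
        for i in reversed(range(n)):  # backward sweep (upper triangular part)
            relax(i)
        residual = max(
            abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n)
        )
        if residual < tol:
            return x, iteration
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
    b = [2.0, 4.0, 10.0]
    solution, iterations = ssor_solve(A, b)
    print(solution, iterations)
```

The two sweeps each carry a dependence from one unknown to the next, which is the kind of loop structure that cannot simply be marked as a fully independent parallel loop.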
The total parallelisation time using ParaWise/CAPO was approximately 30 minutes, of which user interaction took around 10 minutes. The OpenMP code contained 11 parallel regions with 23 parallel loops (including 5 reduction loops), along with 2 pipelines.
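The source does not describe how the two pipelines are implemented, so the following is only a sketch of the general idea behind pipelining a loop nest with a carried dependence. It assumes each thread owns one block of columns and that the work item (thread t, row r) must wait for (t-1, r) and (t, r-1); the function name `pipeline_schedule` is hypothetical. The sketch lists which work items could run concurrently at each step.

```python
def pipeline_schedule(n_rows, n_threads):
    """Return, per time step, the (thread, row) pairs that may run together.

    Assumption: item (t, r) depends on (t - 1, r) and (t, r - 1), so thread t
    can start row r at step t + r at the earliest.
    """
    steps = []
    for step in range(n_rows + n_threads - 1):
        active = [
            (thread, step - thread)
            for thread in range(n_threads)
            if 0 <= step - thread < n_rows
        ]
        steps.append(active)
    return steps


if __name__ == "__main__":
    for step, active in enumerate(pipeline_schedule(n_rows=4, n_threads=3)):
        print(step, active)
```

The schedule needs n_rows + n_threads - 1 steps, so the start-up and wind-down cost of a pipeline grows with the number of threads, whereas a fully independent parallel loop has no such cost.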
The results were provided by Dr H. Jin at NASA Ames. The runtime graph for an SGI Origin compares the manual and ParaWise/CAPO parallelisations on up to 30 processors. As the graph shows, the ParaWise-generated OpenMP code outperforms the manual OpenMP code and scales well up to 30 processors. The ParaWise-generated message passing code and the manually produced NPB2.3 version both outperform OpenMP for this application code.
NASSP
The NASSP code
(sometimes referred to as APPSP or NPBSP) is one of the NAS Parallel Benchmarks (NPB). The code uses an implicit algorithm to compute a finite difference solution to the 3D compressible Navier-Stokes equations. The solution is based on a Beam-Warming approximate factorisation, which decouples the three dimensions and leads to three sets of regularly structured systems of linear equations. The resulting systems are scalar pentadiagonal and are solved using the Thomas algorithm (Gaussian elimination of a banded system without pivoting). The code consists of 3500 lines of Fortran source in 25 subroutines. The total parallelisation time was approximately 20 minutes, of which user interaction took around 10 minutes. The generated OpenMP code contained 17 parallel regions with 78 parallel loops.
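As an illustration of Gaussian elimination without pivoting on a banded system, here is a minimal Python sketch. It is not taken from the benchmark: it stores the matrix densely for readability, the names `solve_banded_no_pivot` and `pentadiagonal_matrix` are hypothetical, and it assumes the matrix is diagonally dominant so that skipping pivoting is safe. A half bandwidth of 2 gives the scalar pentadiagonal case.

```python
def pentadiagonal_matrix(n, main, off1, off2):
    """Build an n x n constant-coefficient pentadiagonal test matrix."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = main
        for offset, value in ((1, off1), (2, off2)):
            if i + offset < n:
                A[i][i + offset] = value
                A[i + offset][i] = value
    return A


def solve_banded_no_pivot(A, b, half_bandwidth):
    """Solve A x = b, touching only entries inside the band.

    Assumes non-zero pivots (e.g. a diagonally dominant matrix).
    """
    n = len(b)
    A = [row[:] for row in A]  # work on copies, leave the inputs untouched
    b = b[:]
    p = half_bandwidth
    # Forward elimination: only the p rows below the pivot are affected.
    for k in range(n - 1):
        last = min(k + p, n - 1)
        for i in range(k + 1, last + 1):
            factor = A[i][k] / A[k][k]
            for j in range(k, last + 1):
                A[i][j] -= factor * A[k][j]
            b[i] -= factor * b[k]
    # Back substitution: each row has at most p entries right of the diagonal.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        last = min(i + p, n - 1)
        partial = sum(A[i][j] * x[j] for j in range(i + 1, last + 1))
        x[i] = (b[i] - partial) / A[i][i]
    return x


if __name__ == "__main__":
    A = pentadiagonal_matrix(6, 6.0, -1.0, 0.5)
    x_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    b = [sum(A[i][j] * x_true[j] for j in range(6)) for i in range(6)]
    print(solve_banded_no_pivot(A, b, half_bandwidth=2))
```

Because the approximate factorisation decouples the three dimensions, many such line solves are independent of one another, which is one plausible source of the parallel loops reported above.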
NASBT
The NASBT code (sometimes referred to as APPBT or NPBBT) is one of the NAS Parallel Benchmarks (NPB). The code uses an implicit algorithm to compute a finite difference solution to the 3D compressible Navier-Stokes equations. The solution is based on a Beam-Warming approximate factorisation, which decouples the three dimensions and leads to three sets of regularly structured systems of linear equations. The resulting systems are block tridiagonal and are solved using the Thomas algorithm (Gaussian elimination of a banded system without pivoting). Code information: The code consists of 4500 lines of Fortran source in 18 subroutines. The automatically generated OpenMP version has 16 parallel regions containing 53 parallel loops. The parallelisation was completed in around an hour, with approximately 10 minutes of user interaction.
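The block tridiagonal case follows the same elimination pattern as the scalar Thomas algorithm, with small matrices in place of scalars. The sketch below is an assumption-laden illustration, not the benchmark code: it uses 2 x 2 blocks for brevity, plain Python lists, and hypothetical helper names (`mat_mul`, `mat_sub`, `solve_dense`, `block_thomas`). Each block is assumed well conditioned so that no pivoting is needed.

```python
def mat_mul(X, Y):
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def mat_sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(len(X[0]))] for i in range(len(X))]


def solve_dense(M, R):
    """Solve M X = R for a small block M by Gauss-Jordan without pivoting."""
    m = len(M)
    aug = [list(M[i]) + list(R[i]) for i in range(m)]
    for k in range(m):
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        for i in range(m):
            if i != k:
                factor = aug[i][k]
                aug[i] = [vi - factor * vk for vi, vk in zip(aug[i], aug[k])]
    return [row[m:] for row in aug]


def block_thomas(lower, diag, upper, rhs):
    """Solve a block tridiagonal system.

    lower[i], diag[i], upper[i] are the m x m blocks of block row i
    (lower[0] and upper[-1] are ignored); rhs[i] is an m x 1 column.
    """
    n = len(diag)
    c_prime = [None] * n
    d_prime = [None] * n
    if n > 1:
        c_prime[0] = solve_dense(diag[0], upper[0])
    d_prime[0] = solve_dense(diag[0], rhs[0])
    # Forward elimination of the sub-diagonal blocks.
    for i in range(1, n):
        denom = mat_sub(diag[i], mat_mul(lower[i], c_prime[i - 1]))
        if i < n - 1:
            c_prime[i] = solve_dense(denom, upper[i])
        d_prime[i] = solve_dense(
            denom, mat_sub(rhs[i], mat_mul(lower[i], d_prime[i - 1]))
        )
    # Back substitution.
    x = [None] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = mat_sub(d_prime[i], mat_mul(c_prime[i], x[i + 1]))
    return x
```

A caller would pass one list of blocks per diagonal and receive the solution as a list of m x 1 columns, one per block row.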
The SGI results were provided by Dr H. Jin at NASA Ames. As with NASSP, both the OpenMP code and the message passing version generated by ParaWise scale well, exhibiting very similar scalability to the manual OpenMP version and the NPB2.3 MPI version. When run on 102 processors, the OpenMP version generated by ParaWise ran over 80 times faster than the serial version.
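The quoted speedup can be related to parallel efficiency with a one-line calculation. The sketch below uses the usual definition (speedup divided by processor count) and takes 80 as a lower bound, since the text says "over 80 times"; the function name `parallel_efficiency` is illustrative.

```python
def parallel_efficiency(speedup, processors):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / processors


if __name__ == "__main__":
    # Lower bound from the text: at least 80x on 102 processors.
    print(f"{parallel_efficiency(80, 102):.2f}")
```

On these figures the efficiency is at least roughly 78%.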