Parallel Software Products

Please select sample OpenMP parallelisations of Fortran code performed by ParaWise from the menu below:

Key To Graphs :

NPB2.3 : NASA manual MPI parallelisation.

directive : OpenMP directives inserted by hand.

SGI-pfa : Directives inserted by SGI-pfa compiler.

CAP-MPI : ParaWise generated message passing.

CAP-dir : OpenMP directives inserted automatically by ParaWise (CAPO)


NAS-LU : LU runtimes

NAS-SP : SP runtimes, SP speedup

NAS-BT : BT runtime, BT speedup

Dual Quad-Core Intel Xeon : OpenMP results on Intel Quad-Core



NAS Benchmarks LU, SP and BT


NAS-LU

The NAS-LU code (sometimes referred to as APPLU or NPB-LU) is one of the NAS Parallel Benchmarks (NPB). The code is a lower-upper (LU) CFD application benchmark. Despite the name, it does not perform an LU factorization; instead it implements a symmetric successive over-relaxation (SSOR) numerical scheme to solve a regular-sparse, block lower and upper triangular system.
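To illustrate the SSOR idea mentioned above, the following is a minimal sketch on a small dense scalar system, not the block system solved in NAS-LU. The function name `ssor_solve` and the parameters `omega` and `sweeps` are assumptions made for this illustration. Each iteration performs a forward relaxation sweep followed by the same sweep in reverse order, which is what makes the scheme symmetric.

```python
def ssor_solve(a, b, omega=1.2, sweeps=50):
    """Approximate the solution of a x = b with SSOR iterations.

    a is a square matrix given as a list of rows, b is the right-hand side.
    omega is the relaxation factor (0 < omega < 2); sweeps is the number
    of forward-plus-backward iterations. Both defaults are illustrative.
    """
    n = len(b)
    x = [0.0] * n

    def relax(i):
        # Sum of off-diagonal terms using the most recent values of x.
        sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (1.0 - omega) * x[i] + omega * (b[i] - sigma) / a[i][i]

    for _ in range(sweeps):
        # Forward sweep: rows visited in increasing order.
        for i in range(n):
            relax(i)
        # Backward sweep: rows visited in decreasing order.
        for i in reversed(range(n)):
            relax(i)
    return x


# Small symmetric, diagonally dominant example system.
matrix = [[4.0, -1.0, 0.0],
          [-1.0, 4.0, -1.0],
          [0.0, -1.0, 4.0]]
rhs = [2.0, 4.0, 10.0]
solution = ssor_solve(matrix, rhs)
```

In the sweeps, each row update depends on values updated earlier in the same sweep, which is the kind of dependence that leads to pipelined rather than fully independent parallel loops.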

The code consists of 3300 lines of Fortran source in 17 subroutines. The total parallelisation time using ParaWise/CAPO was approximately 30 minutes, of which user interaction took around 10 minutes. The OpenMP code contained 11 parallel regions with 23 parallel loops (including 5 reduction loops), along with 2 pipelines.

The results were provided by Dr H. Jin at NASA Ames. The runtime graph for an SGI Origin compares the manual and ParaWise/CAPO parallelisations on up to 30 processors.

As can be seen in the graph, the ParaWise-generated OpenMP code outperforms the manual OpenMP code, scaling well onto 30 processors. The ParaWise-generated message passing code and the manually produced NPB2.3 version both outperform OpenMP for this application code.





NAS-SP

The NAS-SP code (sometimes referred to as APPSP or NPB-SP) is one of the NAS Parallel Benchmarks (NPB). The code uses an implicit algorithm to compute a finite difference solution to the 3D compressible Navier-Stokes equations. The solution is based on a Beam-Warming approximate factorization. The approximate factorization decouples the three dimensions, which leads to three sets of regularly structured systems of linear equations. The resulting equations are scalar penta-diagonal and are solved using the Thomas algorithm (Gaussian elimination without pivoting, applied to a banded system).
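As an illustration of banded Gaussian elimination without pivoting, here is a minimal sketch of a scalar penta-diagonal solve. It is not taken from the NAS-SP source; the function name `solve_pentadiagonal` and the storage convention (five lists of length n, with entries that fall outside the matrix ignored) are assumptions made for this example. Skipping pivoting is only safe when the pivots stay away from zero, for example when the matrix is diagonally dominant.

```python
def solve_pentadiagonal(lower2, lower1, diag, upper1, upper2, rhs):
    """Solve a penta-diagonal system by elimination without pivoting.

    Row i holds lower2[i] (column i-2), lower1[i] (column i-1), diag[i],
    upper1[i] (column i+1) and upper2[i] (column i+2). Entries that would
    lie outside the matrix are never read in a way that affects the result.
    """
    n = len(diag)
    # Work on copies so the caller's lists are left untouched.
    l1, d = list(lower1), list(diag)
    u1, r = list(upper1), list(rhs)

    # Forward elimination: remove the two sub-diagonal entries in column k.
    for k in range(n - 1):
        m = l1[k + 1] / d[k]
        d[k + 1] -= m * u1[k]
        r[k + 1] -= m * r[k]
        if k + 2 < n:
            u1[k + 1] -= m * upper2[k]
            m = lower2[k + 2] / d[k]
            l1[k + 2] -= m * u1[k]
            d[k + 2] -= m * upper2[k]
            r[k + 2] -= m * r[k]

    # Back substitution with the two remaining super-diagonals.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = r[i]
        if i + 1 < n:
            s -= u1[i] * x[i + 1]
        if i + 2 < n:
            s -= upper2[i] * x[i + 2]
        x[i] = s / d[i]
    return x
```

Each such solve runs along one grid line, and because the factorisation decouples the dimensions, the solves for different grid lines are independent of one another.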

The code consists of 3500 lines of Fortran source in 25 subroutines. The total parallelisation time was approximately 20 minutes, of which user interaction took around 10 minutes. The generated OpenMP code contained 17 parallel regions with 78 parallel loops.


The results were provided by Dr H. Jin at NASA Ames on an SGI Origin using up to 30 processors and an SGI Origin 3000 using up to 102 processors.


For the SP code, the ParaWise-generated OpenMP and message passing versions scaled well, comparably with the manual OpenMP parallelisation and the NPB2.3 manual MPI parallelisation. The second graph shows that the OpenMP version automatically generated by ParaWise ran over 70 times faster than the serial code on 102 processors.
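To put the speedup figure in context, the short sketch below converts it into a parallel efficiency (speedup divided by processor count). The helper names `speedup` and `parallel_efficiency` are assumptions for this illustration, and since the text only states "over 70 times", the value 70 is treated as a lower bound.

```python
def speedup(serial_time, parallel_time):
    """Speedup is the serial runtime divided by the parallel runtime."""
    return serial_time / parallel_time


def parallel_efficiency(speedup_value, processors):
    """Efficiency is the fraction of ideal (linear) speedup achieved."""
    return speedup_value / processors


# Lower bound quoted for NAS-SP: over 70x on 102 processors.
sp_efficiency = parallel_efficiency(70.0, 102)
print(f"NAS-SP efficiency on 102 processors: at least {sp_efficiency:.1%}")
```

On these figures, the automatically generated OpenMP code retains roughly two thirds or more of ideal linear scaling at 102 processors.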









NAS-BT

The NAS-BT code (sometimes referred to as APPBT or NPB-BT) is one of the NAS Parallel Benchmarks (NPB). The code uses an implicit algorithm to compute a finite difference solution to the 3D compressible Navier-Stokes equations. The solution is based on a Beam-Warming approximate factorisation. The approximate factorisation decouples the three dimensions, which leads to three sets of regularly structured systems of linear equations. The resulting equations are block tridiagonal and are solved using the Thomas algorithm (Gaussian elimination without pivoting, applied to a banded system).
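The block form of the Thomas algorithm can be sketched as follows. This is a minimal illustration rather than the NAS-BT implementation: the names `solve_block_tridiagonal`, `solve_dense`, `matmul` and `matsub` are assumptions, blocks are plain nested lists, and right-hand-side blocks are stored as single-column matrices. The structure mirrors the scalar algorithm, with divisions by the pivot replaced by small dense solves on the pivot block.

```python
def matmul(p, q):
    """Product of two small dense matrices stored as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def matsub(p, q):
    """Element-wise difference p - q of two equally sized matrices."""
    return [[p[i][j] - q[i][j] for j in range(len(p[0]))]
            for i in range(len(p))]


def solve_dense(m, r):
    """Solve m X = r by Gauss-Jordan elimination without pivoting."""
    n = len(m)
    aug = [list(m[i]) + list(r[i]) for i in range(n)]
    for k in range(n):
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        for i in range(n):
            if i != k:
                f = aug[i][k]
                aug[i] = [vi - f * vk for vi, vk in zip(aug[i], aug[k])]
    return [row[n:] for row in aug]


def solve_block_tridiagonal(lower, diag, upper, rhs):
    """Block Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c_prime = [None] * n
    d_prime = [None] * n
    # Forward elimination over block rows.
    for i in range(n):
        if i == 0:
            pivot_block, r = diag[0], rhs[0]
        else:
            pivot_block = matsub(diag[i], matmul(lower[i], c_prime[i - 1]))
            r = matsub(rhs[i], matmul(lower[i], d_prime[i - 1]))
        if i < n - 1:
            c_prime[i] = solve_dense(pivot_block, upper[i])
        d_prime[i] = solve_dense(pivot_block, r)
    # Back substitution over block rows.
    x = [None] * n
    x[n - 1] = d_prime[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = matsub(d_prime[i], matmul(c_prime[i], x[i + 1]))
    return x
```

The sketch accepts any block size; the example systems it was checked against use 2x2 blocks purely to keep the data small.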

Code information: The code consists of 4500 lines of Fortran source in 18 subroutines. The automatically generated OpenMP version has 16 parallel regions containing 53 parallel loops. The parallelisation was completed in around an hour, with approximately 10 minutes of user interaction.

The SGI results were provided by Dr H. Jin at NASA Ames. As with NAS-SP, both the OpenMP code generated by ParaWise and the message passing version generated by ParaWise scale well, exhibiting very similar scalability to the manual OpenMP version and the NPB2.3 MPI version. When run on 102 processors, the OpenMP version generated by ParaWise ran over 80 times faster than the serial version.

Results on a dual quad-core Intel Xeon system also show good scalability up to 8 processor cores.