The NAS-SP code (sometimes referred to as APPSP or NPB-SP) is one of the NAS Parallel Benchmarks (NPB). The code uses an implicit algorithm to compute a finite-difference solution to the 3D compressible Navier-Stokes equations. The solution is based on a Beam-Warming approximate factorization, which decouples the three dimensions and leads to three sets of regularly structured systems of linear equations. The resulting systems are scalar penta-diagonal and are solved using the Thomas algorithm (Gaussian elimination of a banded system without pivoting). Parallelizing NAS-SP took notably less time than parallelizing NAS-LU.
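As an illustration of the banded solve described above, the following is a minimal Python sketch of Gaussian elimination without pivoting on a scalar penta-diagonal system. It is not taken from the benchmark source. The function names `solve_pentadiagonal` and `apply_pentadiagonal`, the band names (`a2`, `a1`, `d`, `c1`, `c2`) and the small test matrix are all assumptions made for this example. The sketch also assumes that every pivot stays non-zero, as it does for a diagonally dominant matrix.

```python
def solve_pentadiagonal(a2, a1, d, c1, c2, rhs):
    """Solve a scalar penta-diagonal system by Gaussian elimination, no pivoting.

    Row i of the system reads
        a2[i]*x[i-2] + a1[i]*x[i-1] + d[i]*x[i] + c1[i]*x[i+1] + c2[i]*x[i+2] = rhs[i]
    Band entries that fall outside the matrix are ignored.
    Assumption: every pivot d[i] stays non-zero (e.g. diagonal dominance).
    """
    n = len(d)
    # Work on copies so the caller's bands are left untouched.
    a1, d, c1, rhs = list(a1), list(d), list(c1), list(rhs)

    # Forward elimination: row i clears column i in rows i+1 and i+2.
    for i in range(n - 1):
        has_col = i + 2 < n  # column i+2 (and row i+2) exists
        m = a1[i + 1] / d[i]
        d[i + 1] -= m * c1[i]
        if has_col:
            c1[i + 1] -= m * c2[i]
        rhs[i + 1] -= m * rhs[i]
        if has_col:
            m = a2[i + 2] / d[i]
            a1[i + 2] -= m * c1[i]
            d[i + 2] -= m * c2[i]
            rhs[i + 2] -= m * rhs[i]

    # Back substitution on the remaining upper-triangular band.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = rhs[i]
        if i + 1 < n:
            s -= c1[i] * x[i + 1]
        if i + 2 < n:
            s -= c2[i] * x[i + 2]
        x[i] = s / d[i]
    return x


def apply_pentadiagonal(a2, a1, d, c1, c2, x):
    """Multiply the banded matrix by x (used here only to check the solver)."""
    n = len(d)
    y = []
    for i in range(n):
        s = d[i] * x[i]
        if i >= 2:
            s += a2[i] * x[i - 2]
        if i >= 1:
            s += a1[i] * x[i - 1]
        if i + 1 < n:
            s += c1[i] * x[i + 1]
        if i + 2 < n:
            s += c2[i] * x[i + 2]
        y.append(s)
    return y


# Hypothetical, diagonally dominant test system with a known solution.
n = 6
a2, a1, d = [1.0] * n, [-2.0] * n, [10.0] * n
c1, c2 = [-2.0] * n, [1.0] * n
x_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rhs = apply_pentadiagonal(a2, a1, d, c1, c2, x_true)
x = solve_pentadiagonal(a2, a1, d, c1, c2, rhs)
max_error = max(abs(u - v) for u, v in zip(x, x_true))
```

Skipping the pivot search keeps the elimination pattern fixed and regular. The price is that the matrix must be well conditioned enough for the pivots to stay away from zero.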

Code information: 3500 lines of source and 25 subroutines

Total parallelization time using ParaWise: approximately 35 minutes.

User Time: Approximately 10 minutes.

Results

- SGI Origin 2000
- Cray T3D
- IBM SP2
- Transtech Paramid

SGI Origin 2000

              ParaWise NPB2.3    ParaWise NPB2.3    NASA Manual NPB2.2
              1-D Partition      2-D Partition      2-D Partition
Processors    Speed Up           Speed Up           Speed Up
4             3.10               3.60               3.97
9             8.72               7.75               10.61
16            12.72              13.02              18.50
25            14.14              19.07              29.61
36            -                  21.36              28.84
49            -                  31.13              37.41

Results for NAS-SP (Ver. NPB2.3) on a 64x64x64 (Class A) grid on the SGI Origin for ParaWise one-dimensional and two-dimensional partitioning, and NASA manual two-dimensional parallelization (NPB2.2).
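To make the difference between the one-dimensional and two-dimensional partitions more tangible, here is a small Python sketch of a block decomposition of the 64x64x64 grid. It does not show how ParaWise itself distributes the arrays. The helper names `block_sizes` and `local_block`, and the choice of which grid dimensions are split, are assumptions made for illustration.

```python
def block_sizes(n, parts):
    """Split n grid points into `parts` contiguous blocks, as evenly as possible."""
    base, extra = divmod(n, parts)
    # The first `extra` blocks get one additional point.
    return [base + 1 if k < extra else base for k in range(parts)]


def local_block(grid, topology):
    """Largest per-processor block for a grid split over a processor topology.

    grid     = (nx, ny, nz) grid points
    topology = (px, py, pz) processors along each dimension (1 = not split)
    """
    return tuple(max(block_sizes(n, p)) for n, p in zip(grid, topology))


grid = (64, 64, 64)
# Hypothetical layouts for 16 processors: split one dimension, or two.
one_d = local_block(grid, (16, 1, 1))
two_d = local_block(grid, (4, 4, 1))
# 49 processors as a 7x7 topology: 64 is not divisible by 7, so blocks are uneven.
uneven = block_sizes(64, 7)
```

With 16 processors, the one-dimensional layout gives each processor a 4x64x64 slab, while the two-dimensional layout gives a 16x16x64 block. The processor counts in the table (4, 9, 16, 25, 36, 49) are all perfect squares, which fits a square two-dimensional processor topology.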

Cray T3D

ParaWise NPB4.3
2-D Partition (64x64x64)

Processors    Time (secs)    Speed Up
16(4x4)       227            -
32(4x8)       65.17          3.48
64(8x8)       34.23          6.63

Results for NAS-SP (Rev. 4.3) on a 64x64x64 (Class A) grid on the Cray T3D for a ParaWise two-dimensional partitioning.
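The Cray T3D table has no single-processor time, and its Speed Up values match the ratio of the 16-processor time to each measured time. The short Python sketch below reproduces that arithmetic. The variable names are illustrative, and the choice of the 16-processor run as baseline is inferred from the numbers rather than stated in the text.

```python
# Measured times in seconds, keyed by processor count (from the table).
times = {16: 227.0, 32: 65.17, 64: 34.23}

baseline = 16  # assumption: speed-up is measured against the 16-processor run
speed_up = {p: times[baseline] / t for p, t in times.items()}

for p in sorted(speed_up):
    print(f"{p:3d} processors: speed-up {speed_up[p]:.2f} relative to {baseline}")
```

On this reading, doubling the processor count from 16 to 32 cuts the run time by more than a factor of two. The source does not give a reason for that.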

IBM SP2

              ParaWise NPB2.3             ParaWise NPB4.3
              2-D Partition (64x64x64)    2-D Partition (64x64x64)
Processors    Speed Up                    Speed Up
4(2x2)        3.6                         3.9
16(4x4)       11.3                        13.4

Results for NAS-SP (Rev. 4.3 and Ver. NPB2.3) on a 64x64x64 (Class A) grid on the IBM SP2 for a ParaWise two-dimensional partitioning.

Transtech Paramid

1-D Partition (32x32x32)

              Synchronous    Overlapping calc and comm
Processors    Speed Up       Speed Up
1             -              -
2             1.61           1.73
4             2.18           2.83
8             2.81           4.22
12            2.99           4.75
16            3.19           5.30
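The effect of overlapping calculation and communication on the Transtech Paramid can be read more directly as a ratio between the two columns, together with the parallel efficiency (speed-up divided by processor count). The Python sketch below works only from the tabulated values. The variable names are illustrative.

```python
# Speed-ups from the Transtech Paramid table (1-D partition, 32x32x32 grid).
processors = [2, 4, 8, 12, 16]
synchronous = [1.61, 2.18, 2.81, 2.99, 3.19]
overlapping = [1.73, 2.83, 4.22, 4.75, 5.30]

gain = {}        # overlapped speed-up divided by synchronous speed-up
efficiency = {}  # (synchronous, overlapping) parallel efficiency

for p, s_sync, s_over in zip(processors, synchronous, overlapping):
    gain[p] = s_over / s_sync
    efficiency[p] = (s_sync / p, s_over / p)
    print(f"{p:2d} processors: gain {gain[p]:.2f}, "
          f"efficiency {efficiency[p][0]:.2f} -> {efficiency[p][1]:.2f}")
```

At 16 processors the overlapped speed-up is about 1.66 times the synchronous one, and the gain increases with the processor count across the table.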