The NAS-LU code (sometimes referred to as APPLU or NPB-LU) is one of the NAS Parallel Benchmarks (NPB). It is a lower-upper (LU) CFD application benchmark. Despite the name, it does not perform an LU factorization; instead it implements a symmetric successive over-relaxation (SSOR) numerical scheme to solve a regular-sparse, block lower and upper triangular system. The parallel code is generic: to run on a different platform, the code is transferred (via ftp), then compiled and linked with the appropriate version of CAPLib. The original serial code undergoes a relatively small number of changes to generate a parallel version. This can be seen in the 1-D partition parallel code, the 1-D partition reduced memory parallel code, the 1-D partition overlapping communication parallel code and the 2-D partition parallel code.
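To make the SSOR idea concrete, here is a minimal sketch of one SSOR iteration (a forward relaxation sweep followed by a backward one) on a small scalar system. It is not the benchmark's block solver: the tridiagonal test matrix, the relaxation factor `omega = 1.2` and the function names `ssor_sweep` and `ssor_solve` are all assumptions chosen for illustration.

```python
def ssor_sweep(A, b, x, omega):
    """One SSOR iteration: a forward SOR sweep followed by a backward SOR sweep."""
    n = len(b)
    for order in (range(n), range(n - 1, -1, -1)):
        for i in order:
            # Off-diagonal contribution, using the most recently updated values of x.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / A[i][i]
            # Relax between the old value and the Gauss-Seidel update.
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
    return x


def ssor_solve(A, b, omega=1.2, tol=1e-10, max_iter=500):
    """Iterate SSOR from a zero guess until the max-norm residual drops below tol."""
    n = len(b)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        ssor_sweep(A, b, x, omega)
        residual = max(
            abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n)
        )
        if residual < tol:
            return x, iteration
    return x, max_iter


# Hypothetical test problem: tridiagonal (-1, 2, -1) matrix with a unit right-hand side.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, iters = ssor_solve(A, b)
print(iters, [round(v, 6) for v in x])
```

In the forward sweep each unknown depends on already-updated unknowns with lower indices, and in the backward sweep on those with higher indices, which is where the lower and upper triangular structure mentioned above comes from.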

Code information: 3300 lines of source and 17 subroutines

Total parallelization time using ParaWise: approximately 45 minutes.

User Time: Approximately 10 minutes.

  Results

- SGI Origin 2000
- Cray T3D
- Transtech Paramid
- Parsys SN9500
- Parsytec GC/PP

        SGI Origin 2000

| Processors | ParaWise NPB2.3, 1-D Partition (Speed Up) | ParaWise NPB2.3, 2-D Partition (Speed Up) | NASA Manual NPB2.2, 2-D Partition (Speed Up) |
|---|---|---|---|
| 2 | 2.04 | 2.14 | 2.19 |
| 4 | 4.05 | 4.41 | 5.06 |
| 8 | 8.65 | 8.99 | 10.97 |
| 16 | 17.10 | 22.37 | 23.56 |
| 32 | 26.26 | 37.93 | 44.77 |

Results for NAS-LU (Ver. NPB2.3) on a 64x64x64 (Class A) grid on the SGI Origin 2000 for ParaWise one-dimensional and two-dimensional partitioning, and NASA manual two-dimensional parallelization (NPB2.2).

 

| Processors | ParaWise NPB2.3, 1-D Partition (Speed Up) | ParaWise NPB2.3, 2-D Partition (Speed Up) | NASA Manual NPB2.2, 2-D Partition (Speed Up) |
|---|---|---|---|
| 2 | 2.00 | 2.00 | 2.00 |
| 4 | 4.22 | 4.03 | 5.07 |
| 8 | 8.32 | 8.56 | 11.02 |
| 16 | 14.59 | 16.57 | 19.00 |
| 32 | 25.67 | 37.65 | 36.28 |

Results for NAS-LU (Ver. NPB2.3) on a 102x102x102 (Class B) grid on the SGI Origin 2000 for ParaWise one-dimensional and two-dimensional partitioning, and NASA manual two-dimensional parallelization (NPB2.2).

        Cray T3D

ParaWise NPB2.3, 1-D Partition (64x64x64)

| Processors | Time (secs) | Speed Up |
|---|---|---|
| 1 | 10059 * | - |
| 8 | 1446 | 6.96 |
| 16 | 754.41 | 13.33 |
| 32 | 404.98 | 24.84 |
| 64 | 230.50 | 43.63 |

ParaWise NPB2.3, 2-D Partition (64x64x64)

| Processors | Time (secs) | Speed Up |
|---|---|---|
| 1 | 10059 * | - |
| 64 (8x8) | 186.9 | 53.82 |
| 128 (8x16) | 109.63 | 91.75 |
| 256 (16x16) | 65.17 | 154.35 |

* lower bound estimate for serial time (i.e. speed up is also lower bound)
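The processor layouts listed for the Cray T3D runs (for example 64 processors as a 1-D line or as an 8x8 grid) can be compared with a small sizing sketch. It assumes a simple block partition with a one-point-deep halo and an arbitrary choice of which grid axes are partitioned. The source does not state how ParaWise lays out the data, so the helper names and the model are hypothetical, and the result is a rough surface count rather than a performance model.

```python
def block_sizes(n, parts):
    """Split n grid planes into `parts` contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    return [base + 1 if p < extra else base for p in range(parts)]


def largest_subgrid(grid, proc_grid):
    """Largest local block (points per dimension) under a block partition.

    grid      -- global grid size, e.g. (64, 64, 64)
    proc_grid -- processors along each dimension; 1 means that axis is not partitioned
    """
    return tuple(max(block_sizes(n, p)) for n, p in zip(grid, proc_grid))


def halo_points(subgrid, proc_grid):
    """Points exchanged by an interior processor (two neighbours per partitioned
    axis), assuming a one-point-deep halo on each shared face."""
    nx, ny, nz = subgrid
    face = {0: ny * nz, 1: nx * nz, 2: nx * ny}
    return sum(2 * face[d] for d in range(3) if proc_grid[d] > 1)


grid = (64, 64, 64)
layouts = [
    ("1-D, 64 processors", (64, 1, 1)),
    ("2-D, 64 processors (8x8)", (8, 8, 1)),
    ("2-D, 128 processors (8x16)", (8, 16, 1)),
]
for label, procs in layouts:
    sub = largest_subgrid(grid, procs)
    print(label, "local block", sub, "halo points", halo_points(sub, procs))
```

Under these assumptions, a 1-D partition of a 64-point axis cannot usefully go beyond 64 processors, while the 2-D layouts keep the same number of local points per processor with a smaller face area to exchange.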

 

ParaWise NPB2.3, 1-D Partition (32x32x32)

| Processors | Time (secs) | Speed Up |
|---|---|---|
| 1 | 227 | - |
| 4 | 65.17 | 3.48 |
| 8 | 34.23 | 6.63 |
| 16 | 18.52 | 12.25 |
| 32 | 10.56 | 21.50 |

Speed Up Graph for NAS-LU for 32x32x32 and 64x64x64 problems on the Cray T3D.
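The speed up figures in these tables are consistent with dividing the single-processor time by the parallel time. The short sketch below recomputes them for the Cray T3D 32x32x32 run and adds parallel efficiency (speed up divided by processor count), which the tables do not report. The function names are illustrative, and the last printed digit may differ slightly from the table depending on rounding.

```python
def speed_up(serial_time, parallel_time):
    """Speed up relative to the single-processor run."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, processors):
    """Fraction of ideal (linear) speed up achieved on `processors` processors."""
    return speed_up(serial_time, parallel_time) / processors


# (processors, time in seconds) taken from the Cray T3D 32x32x32 table.
serial = 227.0
runs = [(4, 65.17), (8, 34.23), (16, 18.52), (32, 10.56)]

for p, t in runs:
    s = speed_up(serial, t)
    e = efficiency(serial, t, p)
    print(f"{p:2d} processors: speed up {s:5.2f}, efficiency {e:.0%}")
```

Where the serial time is only a lower-bound estimate (the entries marked with an asterisk), both the speed up and the efficiency computed this way are lower bounds as well.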

       Transtech Paramid

1-D Partition (32x32x32)

| Processors | Synchronous Speed Up | Overlapping calc and comm Speed Up |
|---|---|---|
| 1 | - | - |
| 2 | 1.76 | 1.87 |
| 4 | 2.98 | 3.54 |
| 8 | 4.63 | 6.37 |
| 12 | 5.41 | 7.94 |
| 16 | 6.40 | 10.00 |

      Speed Up Graph for NAS-LU for a 32x32x32 problem on the Transtech Paramid.

      Timing Graph for NAS-LU for a 32x32x32 problem on the Transtech Paramid.

        Parsys SN9500

1-D Partition (24x24x24)

| Processors | Synchronous Speed Up | Overlapping calc and comm Speed Up |
|---|---|---|
| 1 | - | - |
| 2 | 1.83 | 1.83 |
| 3 | 2.69 | 2.69 |
| 4 | 3.51 | 3.54 |
| 5 | 4.18 | 4.23 |
| 6 | 5.07 | 5.12 |

Speed Up Graph for NAS-LU for a 24x24x24 problem on the Parsys SN9500.

Timing Graph for NAS-LU for a 24x24x24 problem on the Parsys SN9500.

        Parsytec GC/PP

1-D Partition (64x64x64)

| Processors | Time (secs) | Speed Up |
|---|---|---|
| 1 | 9367.2 * | - |
| 8 | 1301.4 | 7.1 |
| 10 | 1162.6 | 8.1 |
| 11 | 1050.6 | 8.9 |
| 13 | 942.6 | 9.9 |
| 16 | 815.7 | 11.5 |
| 22 | 697.2 | 13.4 |
| 32 | 584.9 | 16.0 |

     * lower bound estimate for serial time (i.e. speed up is also lower bound)

1-D Partition (32x32x32)

| Processors | Time (secs) | Speed Up |
|---|---|---|
| 1 | 174.1 | - |
| 2 | 95.4 | 1.8 |
| 4 | 59.3 | 2.9 |
| 8 | 38.4 | 4.5 |
| 16 | 29.2 | 6.0 |

    Speed Up Graph for NAS-LU for 32x32x32 and 64x64x64 problem sizes on the Parsytec GC/PP