Parallel Software Products




Figure: Section of a 2D Structured Mesh Partition onto a 4x4 Grid of Processors



Figure: An Unstructured Mesh Partition onto 3 Processors





Figure: Flowchart for ParaWise Message Passing Parallelisation



Figure: Sample ParaWise Message Passing Speedup (message passing results on 256 processors for NAS-BT)


Video: Using ParaWise for message passing parallelization

Parallel Message Passing source code generated by ParaWise is portable to any system, using either efficient machine-specific communications or MPI.





Parallel Message Passing source code that can effectively exploit tens or hundreds of processors.




ParaWise Version 4.0

Download a free trial version of ParaWise 4.0 now for the Message Passing and OpenMP parallelization of Fortran code and, new in this version, the OpenMP parallelization of C code.








The ParaWise Automatic Parallelization Environment


ParaWise, the Computer Aided Parallelization Environment, is a semi-automatic parallelization tool. It has been developed primarily to aid the parallelization of existing serial FORTRAN 77, FORTRAN 90 or FORTRAN 95 Computational Mechanics software. ParaWise analyzes the serial code and, with user interaction, generates either a parallel code containing Message Passing library calls or one containing OpenMP Shared Memory directives.

Message Passing Parallel Code Generation for Multi-Processor Systems

ParaWise simplifies message passing programming by generating parallel FORTRAN 77, FORTRAN 90 or FORTRAN 95 code that stays very similar to the original serial code, but with communication calls inserted and modifications made that allow it to run efficiently on parallel computing systems from supercomputers to multicore processors, exploiting the concurrency detected by the dependence analysis. The message passing code generated by ParaWise can be executed on any distributed or shared memory system that supports interprocessor communication.

Two data partition strategies are available for distributing data and the associated computations across a processor topology: structured mesh partitioning and unstructured mesh partitioning (as can be required for Finite Element codes and similar applications).

A two dimensional data partition of a structured mesh is shown in the first figure above for part of a 4x4 grid of processors. The "CAP_" variables specify the range of data owned and assigned on each processor. The ParaWise parallelisation aims to keep the vast majority of communications localised between neighbouring processors, as shown by the pale grey overlap areas in the diagram, which represent data used by a processor but owned by a neighbour. The grey sections and the arrows show communication with neighbouring processors where, for example, processor 5 can access the values owned by processor 6 and also the value shown in dark grey owned by processor 2 (if the computation on processor 5 requires it).
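
To make this concrete, the sketch below shows the kind of code a one dimensional block partition produces: owner-computes loop bounds derived from "CAP_"-style range variables plus a halo exchange with the two neighbouring processors. This is only an illustration, written with raw MPI calls and assuming the number of processors divides n evenly; the code ParaWise actually generates uses CAPLib communication calls and handles the general case.

    ! A minimal sketch of the generated pattern: owner-computes loop
    ! bounds plus a one-element halo exchange.  cap_l/cap_h mirror the
    ! "CAP_" range variables described above; raw MPI is used here for
    ! illustration where ParaWise emits CAPLib calls.
    program halo_sketch
      use mpi
      implicit none
      integer, parameter :: n = 16
      integer :: ierr, rank, nproc, cap_l, cap_h, chunk, i
      integer :: left, right
      real :: u(0:n+1), unew(0:n+1)

      call mpi_init(ierr)
      call mpi_comm_rank(mpi_comm_world, rank, ierr)
      call mpi_comm_size(mpi_comm_world, nproc, ierr)

      ! Owned range of this processor (block partition of 1..n,
      ! assuming nproc divides n evenly).
      chunk = n / nproc
      cap_l = rank*chunk + 1
      cap_h = cap_l + chunk - 1
      left  = rank - 1
      right = rank + 1
      if (left  < 0)      left  = mpi_proc_null
      if (right >= nproc) right = mpi_proc_null

      u = real(rank)
      unew = 0.0

      ! Halo exchange: send owned boundary values to each neighbour,
      ! receive that neighbour's boundary values into the overlap.
      call mpi_sendrecv(u(cap_l),   1, mpi_real, left,  0, &
                        u(cap_h+1), 1, mpi_real, right, 0, &
                        mpi_comm_world, mpi_status_ignore, ierr)
      call mpi_sendrecv(u(cap_h),   1, mpi_real, right, 1, &
                        u(cap_l-1), 1, mpi_real, left,  1, &
                        mpi_comm_world, mpi_status_ignore, ierr)

      ! Owner-computes loop: the original bounds 2..n-1 intersected
      ! with the owned range [cap_l, cap_h].
      do i = max(2, cap_l), min(n-1, cap_h)
        unew(i) = 0.5*(u(i-1) + u(i+1))
      end do

      call mpi_finalize(ierr)
    end program halo_sketch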

The second figure above shows a small unstructured mesh (such as a finite element mesh) partitioned between 3 processors. The triangles drawn with solid lines are owned by the respective processor and those shown with dotted edges are the overlap elements on each processor owned by a neighbour processor. Similarly, the solid circles are nodes owned by the processor and the empty circles are overlap nodes owned by neighbour processors. The interface between the application code data structures and the CAPLib communication utilities is provided by inspector loops added to the parallel code. These inspector loops mimic the relevant loops in the application code where data on other processors is required, and build lists of the required data using the CAPLib utilities. This is used, for example, to automatically compute the data partition and to transmit all data owned by one processor but needed on another in a single buffered communication.
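
The sketch below illustrates the inspector loop idea for the triangle mesh described above. All names are hypothetical stand-ins rather than the identifiers ParaWise or CAPLib actually generate: the loop mimics the application's element loop and records every node that an owned element references but another processor owns.

    ! Illustrative inspector loop (hypothetical names): builds the list
    ! of overlap nodes that this processor must fetch from neighbours.
    subroutine inspect_overlap(nelem_owned, elem_node, node_owner, &
                               myrank, request_list, nreq)
      implicit none
      integer, intent(in)  :: nelem_owned, myrank
      integer, intent(in)  :: elem_node(3, nelem_owned) ! triangle connectivity
      integer, intent(in)  :: node_owner(*)             ! owning processor per node
      integer, intent(out) :: request_list(*), nreq
      integer :: e, k, node

      nreq = 0
      do e = 1, nelem_owned
        do k = 1, 3                          ! three nodes per triangle
          node = elem_node(k, e)
          if (node_owner(node) /= myrank) then
            nreq = nreq + 1                  ! overlap node owned elsewhere
            request_list(nreq) = node
          end if
        end do
      end do
      ! After duplicate removal, request_list drives a single buffered
      ! communication per neighbour processor.
    end subroutine inspect_overlap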

The stages involved in both Structured and Unstructured mesh parallelisations are:

Data Partitioning - Based on a single user selection of an index of an array in a routine, the ParaWise partition inheritance algorithm calculates a comprehensive set of arrays to be partitioned in all routines of the application code.
Execution Mask Addition - Determine the statements, loops, routines etc. that need only be computed on a subset of processors (typically, just the processor that "owns" the array element being set in the data partition), as sketched in the example after this list.
Communication Request Calculation and Migration - Determine data used on a processor that may be owned by another processor in the data partition. The communication requests then migrate out of loops and into calling routines, where many requests are usually placed at the same point in the application code.
Communication Request Merging - Communication requests placed at the same location are merged to reduce the number of communications required and avoid repeated communications of the same data.
Communication Generation - Merged communication requests are generated into the parallel source code, and loops are distributed based on the earlier execution control masks.
Parallel Source Code Generation or Partition of an Additional Dimension - At this point, the parallel code can be generated, or another dimension partition can be added to the parallelisation by repeating the process from the Data Partitioning phase.
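
As an illustration of the Execution Mask Addition stage, the sketch below shows an owner-computes mask on a simple assignment. The cap_l/cap_h range variables are illustrative stand-ins for the generated "CAP_" variables, and the Communication Generation stage would later fold such a mask into the loop bounds themselves.

    ! Illustrative execution mask: only the processor whose owned
    ! range [cap_l, cap_h] contains i executes the assignment.
    subroutine masked_update(n, cap_l, cap_h, a, b, c)
      implicit none
      integer, intent(in) :: n, cap_l, cap_h
      real, intent(inout) :: a(n)
      real, intent(in)    :: b(n), c(n)
      integer :: i

      do i = 1, n
        if (i >= cap_l .and. i <= cap_h) then   ! owner-computes test
          a(i) = b(i) + c(i)
        end if
      end do
    end subroutine masked_update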

Using the ParaWise browsers, information is provided at every stage to help the user understand what has been done. The browsers also allow the user to make modifications at any stage to improve the parallel code produced.

ParaWise Communications Browser

Figure: The ParaWise Communications Browser window for Message Passing

The communications browser from the ParaWise toolkit, shown above, lists the communications that have been generated by ParaWise. Selecting a communication displays the list of program statements that requested the data transferred in that communication. In the case shown, 81 statements from a number of routines issued requests for the overlap/halo of array u; all of these requests migrated to an earlier point in the code execution and were then merged so that only a single communication was needed. Details of the execution control masks on the requesting statements, and of any data dependencies that caused the communication, can be displayed and investigated to determine whether an optimization is possible to avoid the need for the communication. The ParaWise Expert Assistant tool within ParaWise can perform this optimization process, needing only to ask simple questions about your program.
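
The effect of request migration and merging can be pictured with the schematic sketch below (the routine names are hypothetical). Each sweep routine originally requested the halo of u just before using it; once the requests migrate into the calling routine and merge, one communication per time step satisfies all of them. ParaWise only performs such a merge when its dependence analysis proves the halo data is not redefined between the original request points.

    ! Before: each sweep routine requests the halo of u separately.
    !   do step = 1, nsteps
    !     call x_sweep(u)    ! requests halo of u
    !     call y_sweep(u)    ! requests halo of u again
    !     call z_sweep(u)    ! and again
    !   end do
    !
    ! After migration and merging: a single exchange per time step.
    do step = 1, nsteps
      call cap_exchange_halo(u)   ! one merged halo communication
      call x_sweep(u)
      call y_sweep(u)
      call z_sweep(u)
    end do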

Running Message Passing Parallel Code Generated by ParaWise

The parallel code generated by ParaWise uses communication calls from the CAPLib communication library. The code can be compiled using a CAPLib utility and can use either efficient machine-specific interprocessor communication calls or the generic MPI library. To ensure portability to any system, we also provide the source code for a thin layer interface between CAPLib and MPI.
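
As a sketch of what such a thin layer can look like, the routines below map send and receive utilities directly onto MPI calls. The names and argument lists are illustrative only; the interface actually shipped with ParaWise will differ.

    ! Illustrative thin wrappers from CAPLib-style utilities onto MPI
    ! (hypothetical names; not the actual shipped interface).
    subroutine cap_send_real(buf, n, dest)
      use mpi
      implicit none
      integer, intent(in) :: n, dest
      real, intent(in)    :: buf(n)
      integer :: ierr
      call mpi_send(buf, n, mpi_real, dest, 0, mpi_comm_world, ierr)
    end subroutine cap_send_real

    subroutine cap_recv_real(buf, n, src)
      use mpi
      implicit none
      integer, intent(in)  :: n, src
      real, intent(out)    :: buf(n)
      integer :: ierr
      call mpi_recv(buf, n, mpi_real, src, 0, mpi_comm_world, &
                    mpi_status_ignore, ierr)
    end subroutine cap_recv_real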

Results for Message Passing Parallelisations

You probably want to know if ParaWise works! Take a look at just some of the results obtained by users of ParaWise. There are examples of performance results for parallel code generated by ParaWise with message passing communication calls, as well as some results for a hybrid mix of Message Passing and OpenMP.

More Details

For more information on ParaWise go to Further Details.  

Documentation

If you want to read more about ParaWise then go to the Documentation pages.

Obtaining ParaWise

If after looking at our web page you decide you like what you see and would like to use ParaWise then go to our Downloads page.

Contact Us

If you have any further questions then please do contact us.