Parallel Software Products


[Figure: ParaWise OpenMP schematic - the OpenMP parallelization process in ParaWise/CAPO]

[Figure: Routine cloning code snippet]

[Figure: OpenMP code snippet]

[Figure: Sample ParaWise OpenMP speedup - ParaWise results with 64 OpenMP threads for Sea code]

[Video: Using ParaWise for OpenMP parallelization]


ParaWise Version 4.0

Download a free trial version of ParaWise-4.0 now for the Message Passing and OpenMP parallelization of Fortran code, and now also for the OpenMP parallelization of C code.

The ParaWise Automatic Parallelization Environment

ParaWise, the Computer Aided Parallelization Toolkit (previously known as CAPTools), is an automatic parallelization tool. It has been developed primarily to aid in the parallelization of existing serial C, FORTRAN 77, FORTRAN 90 or FORTRAN 95 software. ParaWise analyzes the serial code and, with minimal user interaction, generates code that has been adapted to parallel form using either Message Passing library calls or OpenMP Shared Memory directives.

OpenMP Parallel Code Generation For Multi Core and Multi Thread Systems

ParaWise has also been extended to simplify programming with OpenMP, using its accurate dependence information to perform parallelization for shared memory systems, including multi-core CPUs, by changing the sequential code through the addition of OpenMP directives. This enables a sequential program to be migrated to parallel systems, from supercomputers to multicore processors, and to exploit their full power effectively. The initial version was developed by NASA Ames (in collaboration with PSP) to generate Shared Memory directive code using OpenMP (see the ParaWise-based Automatic Parallelizer using OpenMP (CAPO) page). This has subsequently been significantly extended at PSP, and generation of OpenMP directives for C source code has now also been incorporated.

The ParaWise OpenMP directives generation algorithm performs the basic functions of detecting concurrency in DO loops in Fortran (adding the !$OMP DO directive) and for loops in C (adding the #pragma omp for directive), along with determining and generating clauses for those variables that must be private, shared, reduction, etc. Numerous techniques have been developed to help convert serial code into efficient and scalable parallel OpenMP code:

Interprocedural Operation - Loops containing call sites can be proven parallel and the parallelism exploited. Parallel regions can also be migrated into calling routines and merged where possible, allowing a parallel region to contain numerous parallel loops in numerous routines, thereby reducing the runtime overheads of initiating parallel regions.
Avoid Unnecessary Thread Synchronizations - Analysis of data accesses following a parallel loop detects if an expensive barrier synchronization is needed at the end of that loop, automatically adding a nowait clause if no barrier is required.
Pipeline Operations Detected and Exploited - Pipelines are transformed using OpenMP directives to take advantage of parallelism in recurrence relations.
Extensive algorithms to Detect and Exploit Reduction Operations - Reduction operations are detected by examining data assignments in the relevant loop(s) and any routines called within those loops. Common practices, such as determination of the maximum value in an array along with the location in the array of that maximum are handled in the generated parallel code using OpenMP critical sections.
Detect and Exploit Threadprivate Data - Variables that can be threadprivate throughout an application code are detected and the individual variable or a Fortran common block handled via a threadprivate directive.
Transformation to Pass Private Data into Routines as Needed - Variables passed to called routines via common blocks in Fortran or via routine hosts in C or Fortran are automatically added to call argument lists to enable private copies to be used in the called routine.
Automatic Routine Cloning - When a routine is called from within a parallel loop, from a parallel region but outside any parallel loop, or from outside a parallel region, or when an argument is shared in one call but private in another, several copies of that routine are automatically made if the directives required in each case differ.

 

Directives Browsers

[Figure: The ParaWise Directives Browser window for OpenMP]

The initially generated parallel version of the application code can then be run on a single thread and on multiple threads. Performance data can then be imported into ParaWise and used to provide speedup and wastage statistics for the loops in the application code. The wastage statistic is particularly useful in focusing any tuning as it indicates the loops where the most significant improvement in parallel performance can be achieved. The performance data also provides a measure of imbalance between the runtime of iterations of a parallel loop where, for example, a dynamic loop schedule may be advantageous.

The Directives browser explains the causes of loops remaining serial in terms of your application code variables. Often the cause is a variable that is assigned in one loop iteration and potentially used in a later iteration (known as a true dependence), where proving that no such dependence exists can enable parallel execution. In other cases, the cause is a variable used or set in one iteration that is potentially set in a later iteration (known as an anti or output dependence), where the variable is also set before the loop or used after it. Here it may be that the variable is never actually reset in a later iteration, allowing it to be shared without inhibiting parallelism. Alternatively, the variable may be workspace that is set and used in each iteration, where any setting before the loop provides no values used inside it, and any usage after the loop uses no values set inside it. Such a variable can be made private, again not inhibiting parallelism.

Results for OpenMP

You probably want to know whether ParaWise works! Take a look at just some of the results that have been obtained by users of ParaWise. There are examples of performance results for parallel code generated by ParaWise, for both C code with OpenMP directives and Fortran code with OpenMP directives. There are also some results for a hybrid mix of Message Passing and OpenMP.

More Details

For more information on ParaWise go to Further Details.  

Documentation

If you want to read more about ParaWise then go to the Documentation pages.

Obtaining ParaWise

If after looking at our web page you decide you like what you see and would like to use ParaWise then go to our Downloads page.

Contact Us

If you have any further questions then please do contact us.