NAS Feature Story: Raising the Parallel Bar: New Tool Speeds NASA Codes

NAS Division researchers have raised the bar for parallel processing software by developing CAPO, an automated tool that helps speed up and simplify the tedious process of parallelizing NASA’s large serial codes.

For most of us, cloud watching is a good way to decide whether to grab an umbrella, or to imagine birds, whales, and elephants in the sky. For NASA researchers, studying clouds helps shed light on phenomena such as air-sea interactions and global climate changes. To create better cloud models, researchers at NASA Goddard Space Flight Center, Greenbelt, Md., are using a computer code optimized by NAS Division researchers Henry Jin and Gabriele Jost.

Jin and Jost have worked with Goddard scientists Wei-Kuo Tao, Dan Johnson, and Chung-Lin Shie to improve the three-dimensional Goddard Cumulus Ensemble (GCEM3D) code. To do this, the NAS researchers applied their tool, the Computer-Aided Parallelizer and Optimizer (CAPO), to the GCEM3D code. CAPO is designed to take advantage of shared-memory parallel computers, and it automates the labor-intensive steps of parallelizing serial code. "I think the CAPO tool is very useful, especially for model runs demanding lots of processors and massive amounts of memory," says Shie.

Parallelizing the code enabled the Goddard scientists to run larger cloud simulations much faster than before. "The idea is that if it takes x hours to run a simulation on a single processor, it will only take a fraction of x to run it on multiple processors," explains Jost.

Steps to Parallelization

Figure: CAPO process flow chart.

CAPO goes through several steps to parallelize a serial code (see the CAPO process flow chart figure). First, the tool analyzes data dependencies to determine how different variables depend on one another. Next, it performs a loop-level analysis, examining the repeated sequences of instructions in the code. Users are then taken through a series of graphical user interfaces that illustrate the parallelization process. "The tool takes away the tedious and error-prone work of parallelizing code, allowing the user to focus on optimization of critical parts of the code, all through a single interface," explains Jost.

Dependence analysis, the core element of the CAPO software, determines the relationships between variables within a serial code. This process is usually time-consuming because of the complex structure of large codes containing many subroutines.
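To make the idea concrete, here is a minimal Python sketch of the kind of question a dependence analysis asks about a single loop. It is an illustration only, not CAPO's algorithm: the function name `carried_dependence_distances` and the offset-based description of the loop are hypothetical, and a real analysis must also cope with subroutine calls, branches, and subscripts that are not simple constants.

```python
def carried_dependence_distances(write_offset, read_offsets, n_iters):
    """Find loop-carried dependences in a loop of the (hypothetical) form

        for i in range(n_iters):
            a[i + write_offset] = f(a[i + r] for r in read_offsets)

    An element written in iteration i1 is read in iteration i2 when
    i1 + write_offset == i2 + r, i.e. when i2 - i1 == write_offset - r.
    A nonzero distance smaller than the trip count means two different
    iterations touch the same element, so they cannot simply run in
    any order.
    """
    distances = []
    for r in read_offsets:
        d = write_offset - r          # iteration gap between write and read
        if d != 0 and abs(d) < n_iters:
            distances.append(d)
    return distances


if __name__ == "__main__":
    # a[i] = 2 * a[i]: every iteration touches only its own element.
    print(carried_dependence_distances(0, [0], 100))    # no carried dependence
    # a[i] = a[i - 1] + 1: each iteration needs the previous result.
    print(carried_dependence_distances(0, [-1], 100))   # one dependence, distance 1
```

An empty result means the iterations are independent as far as this one array is concerned; any nonzero distance flags a loop that needs closer inspection before it can be split across processors.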

Following the dependence analysis, the loops within the code are examined. "CAPO examines the loops for potential data dependencies that might prohibit parallelization. If you have a loop which iterates over some repeated sections of the code, you can actually break this loop into individual pieces, such that you can run them concurrently on the processors," explains Jin. "That’s how you get the speed-up in code performance."
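The loop splitting Jin describes can be sketched in a few lines of Python. This is an illustration of the general idea, not CAPO output: the helper names `split_range` and `parallel_map_loop` are hypothetical, and Python threads are used only to show the decomposition (in standard CPython they generally will not speed up CPU-bound code).

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, n_workers):
    """Split range(n) into at most n_workers contiguous (start, stop) chunks."""
    base, extra = divmod(n, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers each take one additional iteration.
        stop = start + base + (1 if w < extra else 0)
        if stop > start:
            chunks.append((start, stop))
        start = stop
    return chunks


def parallel_map_loop(body, data, n_workers=4):
    """Run `out[i] = body(data[i])` with the iterations divided among workers.

    This is valid only because each iteration reads and writes its own
    element, so no iteration depends on the result of another.
    """
    out = [None] * len(data)

    def run_chunk(bounds):
        start, stop = bounds
        for i in range(start, stop):
            out[i] = body(data[i])

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(run_chunk, split_range(len(data), n_workers)))
    return out


if __name__ == "__main__":
    data = list(range(10))
    serial = [x * x for x in data]
    assert parallel_map_loop(lambda x: x * x, data) == serial
```

A loop such as `a[i] = a[i - 1] + 1` could not be handled this way, because each chunk would need values computed by the chunk before it.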

Figure: Loop iteration timelines, showing how CAPO improves the level of parallelization.

Once the loop-level analysis is complete, users are guided through a series of built-in graphical user interfaces. These interfaces enable users to view all instances where the code did not parallelize. The more obstacles a user is able to remove, the higher the level of parallelization that can be achieved (see the loop iteration timelines figure). "The user guides the tool, but the tool also guides the user; it’s an interaction," explains Jost.

Proof is in the Numbers

Figure: CAPO performance comparison.

Before CAPO was applied to GCEM3D, the cloud modeling code could run only very small cases, scaling to at most four processors on a PC. After applying CAPO and making some adjustments, Jin and Jost achieved a factor of 12 speed-up when running a test case on 16 processors of an SGI Origin 3000 (see the performance comparison figure). Using larger cases, the code scaled up to 64 CPUs.
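As a quick check on what these numbers mean, the short Python sketch below computes the parallel efficiency implied by a speed-up of 12 on 16 processors, and the serial fraction that Amdahl's law would imply if it were the only limiting factor. The Amdahl's-law reading is an assumption added here for illustration; the article does not say what limited the scaling, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, n_procs):
    """Fraction of the ideal (linear) speed-up actually achieved."""
    return speedup / n_procs


def amdahl_speedup(serial_fraction, n_procs):
    """Amdahl's law: speed-up when a fixed fraction of the work stays serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)


def implied_serial_fraction(speedup, n_procs):
    """Invert Amdahl's law to get the serial fraction matching a measured
    speed-up. Assumption: the serial fraction is the only source of loss."""
    return (n_procs / speedup - 1.0) / (n_procs - 1.0)


if __name__ == "__main__":
    s, p = 12, 16  # figures reported for the GCEM3D test case
    print(f"efficiency: {parallel_efficiency(s, p):.2f}")
    print(f"implied serial fraction: {implied_serial_fraction(s, p):.4f}")
```

For the reported test case this gives an efficiency of 0.75 and, under the Amdahl assumption, a serial fraction of roughly 2 percent.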

The newly optimized GCEM3D code enabled Goddard researchers to increase the resolution of their case studies. They successfully ran a large test case of 1,026-by-1,026-by-34, using more than seven gigabytes of memory—a new feat using this application. "Our goal of cloud modeling not only aims to better understand the microphysical and dynamical processes of the cloud system itself, but also to improve their representation for large-scale applications, such as studies on the precipitating convective system, air-sea interactions and cloud-aerosol interactions, as well as the global change in climate and hydrology," explains Shie. CAPO enables NASA Goddard scientists to achieve these research goals much faster.

After the success with their cloud modeling code, researchers at Goddard are now interested in applying the CAPO tool to other serial codes. "I will apply CAPO to other codes in the future because of the substantial improvement in model performance due to computational efficiency and memory extension," says Shie.

Tao and Shie visited NASA Ames in September 2002 to learn more about CAPO, and Jin and Jost visited Goddard to demonstrate the tool to a group of researchers. The CAPO team aims to transfer knowledge of the tool so that individual users can apply it to their own codes. For decades, NASA scientists have been generating serial codes that now need to be parallelized. Thanks to CAPO, those codes can now run more efficiently on the agency’s shared-memory parallel systems, such as those at the NAS Facility.


CAPO was first developed in 1998, inspired by a collaboration with the creators of CAPTools at the University of Greenwich in the United Kingdom. CAPTools (now marketed as ParaWise) is software that generates code to run on distributed-memory machines.

Look for the full story, "Parallelization—the Key to Faster Codes, Higher Fidelity Simulations," by NAS staff writer Holly A. Amundson, in the Winter 2003 issue of Gridpoints.
