Multicore computing results are increasingly determined by the rate at which operands are fed to the cores and decreasingly by the speed or the number of cores. SDRAM and flash memory bottlenecks often slow down computations, but disk-drive speeds can also determine the rate at which supercomputers deliver results.

Indeed, in high-performance computing (HPC) applications, “time to results” can be directly related to the rate at which numerical datasets, such as climate measurements, are read from disk, as well as the rate at which simulation results, such as 3D physics arrays, are stored to disk drives.

Titanic Computing

Today’s supercomputers have immense computing power. Supercomputers are ranked according to a “flops” metric that’s calculated using the rather dated Linpack benchmark [1]. Every six months, www.top500.org updates the list of the most powerful supercomputers. In November 2012, the fastest supercomputer in the world was the Titan supercomputer at Oak Ridge National Laboratory in Tennessee, clocking in at 17.6 Pflops (17.6 × 10^15 floating-point operations per second).

Titan uses AMD Opteron x86 processors and Nvidia GPUs to achieve this prodigious calculation rate. Unfortunately, a supercomputer’s Linpack rating rarely predicts the performance of everyday “codes,” as supercomputing programmers and users call their application-specific software.

Climate and physics simulations are two application codes that require long calculation times and that generate and consume multi-petabyte datasets (1 petabyte = 10^15 bytes). It’s not uncommon for these simulations to run for many months using hundreds of thousands of cores and to consume terabytes of RAM and petabytes of disk storage. The supercomputing community is actively evaluating various techniques, including compression, to reduce RAM, flash, and disk bottlenecks in supercomputing.

Climate simulations have been in the news since the late 1990s because they improve society’s understanding of climate change. Climate scientists typically begin a simulation at a historical point where atmospheric, oceanic, and land temperatures were once measured. These simulations combine multiple abstracted, mathematical 3D models of the Earth’s air, ocean, land, and ice conditions and advance the simulation one timestep at a time.

Depending on the purpose of the climate simulation, timestep sizes may range from minutes (weather forecasts having a daily or weekly duration) to months (climate simulations having 100-year durations). The output of every Nth timestep is saved for post-simulation analysis and visualization, where N may range from 1 to 100.
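
The every-Nth-timestep output rule can be sketched in a few lines of Python. This is a minimal illustration, not code from any climate model: the function names and the example of 1,200 monthly timesteps (a 100-year run) are assumptions chosen only to show how N controls the number of snapshots written to disk.

```python
def saved_timesteps(total_steps: int, n: int) -> list[int]:
    """Return the 1-based timestep indices whose output is written to disk.

    Hypothetical helper: every n-th timestep is saved, as described in the text.
    """
    if total_steps < 0 or n < 1:
        raise ValueError("total_steps must be >= 0 and n must be >= 1")
    return list(range(n, total_steps + 1, n))


def snapshots_written(total_steps: int, n: int) -> int:
    """Count the snapshots a run produces when every n-th timestep is saved."""
    return len(saved_timesteps(total_steps, n))


if __name__ == "__main__":
    # Assumed example: 100 simulated years with one timestep per month.
    steps = 100 * 12
    for n in (1, 10, 100):
        print(f"N={n:3d}: {snapshots_written(steps, n)} snapshots saved")
```

With these assumed numbers, moving from N = 1 to N = 100 cuts the number of stored snapshots from 1,200 to 12, which is why the choice of N is a direct lever on disk traffic.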

The beautiful, complex color weather maps we see on television and on the Internet are the outcome of such climate simulations, as are the decadal distribution and density maps of carbon dioxide, methane, water, clouds, and ice that are consistent with global warming.

Scientists are improving climate models in two ways: higher grid density and more complex boundary interactions. Since atmospheric and oceanic grids have both spatial (x-y-z) and temporal (t) dimensions, higher grid density directly increases the size of intermediate timestep results.

A twofold increase per dimension in 3D spatial grid density creates intermediate datasets that are eight times larger, exacerbating the I/O challenges of climate simulations. Similarly, improving the accuracy of boundary interactions (such as between the air and ocean, or between ice and water) often requires higher grid densities at the boundaries. More complex models result in longer climate simulation times and more data to be captured and visualized.
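
The eightfold growth follows directly from the cell count of a 3D grid. The sketch below works through the arithmetic; the grid dimensions, the single variable per cell, and the 4-byte (single-precision) value size are illustrative assumptions, not figures from any particular climate model.

```python
def snapshot_bytes(nx: int, ny: int, nz: int,
                   variables: int = 1, bytes_per_value: int = 4) -> int:
    """Size in bytes of one timestep's output on an nx x ny x nz grid."""
    return nx * ny * nz * variables * bytes_per_value


def refine(shape: tuple, factor: int = 2) -> tuple:
    """Multiply the number of grid points along every spatial dimension."""
    return tuple(dim * factor for dim in shape)


if __name__ == "__main__":
    coarse = (360, 180, 50)   # assumed longitude x latitude x vertical levels
    fine = refine(coarse)     # twice the density in x, y, and z
    ratio = snapshot_bytes(*fine) / snapshot_bytes(*coarse)
    print(f"coarse grid: {snapshot_bytes(*coarse):,} bytes per snapshot")
    print(f"fine grid:   {snapshot_bytes(*fine):,} bytes per snapshot")
    print(f"growth:      {ratio:.0f}x")  # 2 * 2 * 2 = 8
```

If the timestep is also halved and the same fraction of timesteps is saved, the total output volume doubles again, to 16 times the original.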

In The Labs

One of Samplify’s climate simulation collaborators participated in the latest Coupled Model Intercomparison Project Phase 5 (CMIP5) [2] long-term climate simulation. This group struggled to analyze the climate results it generated for CMIP5, which were 10 times larger than its CMIP4 datasets five years earlier. This group’s problem occurred not in the generation of climate simulation data, but rather in the post-simulation data analysis and visualization phase, where slow disk reads threatened the group’s timely delivery of well-studied CMIP5 results.

Lawrence Livermore National Laboratory (LLNL) operates a special research building called the National Ignition Facility (NIF) [3] that evaluates next-generation fusion energy. At NIF, the world’s most powerful laser (actually a complex of 192 high-energy lasers) impinges on a tiny, gold-plated cylinder called a “hohlraum” (the German word for “hollow space”), which is about the size of a pencil eraser and in which the fusion reaction occurs.

Because of the immense amounts of power and months of preparation required for each NIF experiment, it is significantly less expensive for LLNL to perform multiple hohlraum physics simulations in a supercomputer than it is to perform a single NIF experiment. LLNL’s dedicated NIF physicists perform considerably more NIF simulations than experiments, but the complexity of laser-plasma interactions within the hohlraum requires the interaction of several physics simulations.

As with climate simulations, grid density and timestep size are key NIF simulation parameters, and LLNL physicists always want smaller grid and timestep spacing because better resolution provides additional physics insights. Simulating NIF’s multi-physics interactions generates petabyte-scale datasets over months-long simulations using LLNL’s Sequoia, the world’s second-fastest supercomputer.

Samplify collaborated with both Deutsches Klimarechenzentrum (DKRZ; German Climate Computing Centre) and LLNL in experiments demonstrating that lossy compression of climate and physics datasets maintains the simulation results while reducing the memory and disk I/O bottlenecks by factors of four and six. The table summarizes the results of compressing climate [4] and physics [5] datasets using the APAX universal numerical encoder. These two papers are the first to document that compression preserves simulation results for large, complex HPC calculations while significantly reducing memory and disk bottlenecks.
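
To see why compression ratios of four and six matter for I/O-bound work, consider a back-of-the-envelope estimate of disk read time. The sketch below is a simplified model, not a description of how APAX works: it assumes that read time is simply the stored size divided by sustained bandwidth, ignores decompression cost, and uses a hypothetical 1-petabyte dataset and a hypothetical 10-GB/s file system.

```python
def read_time_seconds(dataset_bytes: float,
                      bandwidth_bytes_per_s: float,
                      compression_ratio: float = 1.0) -> float:
    """Estimate the time to read a dataset stored at a given compression ratio.

    Simplified model (an assumption): time = stored size / sustained bandwidth,
    with no allowance for decompression or seek overhead.
    """
    if dataset_bytes < 0 or bandwidth_bytes_per_s <= 0 or compression_ratio <= 0:
        raise ValueError("size must be >= 0; bandwidth and ratio must be > 0")
    stored_bytes = dataset_bytes / compression_ratio
    return stored_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    PETABYTE = 10**15          # hypothetical dataset size in bytes
    BANDWIDTH = 10 * 10**9     # hypothetical sustained read rate, 10 GB/s
    for ratio in (1, 4, 6):
        hours = read_time_seconds(PETABYTE, BANDWIDTH, ratio) / 3600
        print(f"{ratio}:1 compression: about {hours:.1f} hours to read")
```

Under these assumptions the read time scales inversely with the compression ratio, so the benefit holds only as long as decompression keeps pace with the disk.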


1. Frequently Asked Questions on the Linpack Benchmark and Top500,

2. CMIP – Coupled Model Intercomparison Project Phase 5 – Overview,

3. The National Ignition Facility: Ushering in a New Age for Science,

4. International Supercomputing Conference Presentation Details,

5. Submitted to

Al Wegener, CTO and founder of Samplify, earned an MSCS from Stanford University and a BSEE from Bucknell University.
