HPC And “Big Data” Apps Tap Floating-Point Number Compression

Advice on sharing big FP arrays among multiple cores.

Date Posted: January 09, 2012 11:39 AM
Author: Al Wegener

Text and integer numbers in the form of audio, speech, image, and video files have been the target of innumerable compression algorithms. Floating-point numbers, though, have drawn the proverbial short straw when it comes to compression research.

With the rise of high-performance computing (HPC) and so-called “big data” applications in seismology, physics, meteorology, and genomics, floating-point values are becoming more prevalent. Big data is the popular term for databases that hold many terabytes (10^12 bytes) of data, often in numerical form.

In HPC, big data is processed, searched, summarized, and visualized by thousands of microprocessor cores. Unlike many business and library text databases, HPC datasets contain numerical data—integer and floating-point values. The most common scientific datatype is the 32-bit floating-point number. 

With the explosion of mobile devices and ubiquitous sensing, companies and governments are collecting more real-world data than ever, including satellite tracking of illicit activity, climate metrics, astronomy, energy exploration, and drug discovery.

While sensor data is first captured in integer form as the output of an analog-to-digital converter, it is commonly converted to floating-point form, simply because floats have a much wider dynamic range than ints and thus are easier to work with in computation.

Floating-Point Values

Floating-point values comprise three fields: a sign bit, some exponent bits, and significand or mantissa bits (see the figure). In the 1970s, microprocessor vendors developed proprietary floating-point formats that led to incompatibilities in data representation, causing the same scientific source code (typically written in FORTRAN) to generate different results on different processors.

The IEEE resolved this incompatibility issue in 1985 by ratifying the IEEE-754 standard, which defined a common floating-point format that most processor vendors implemented. The IEEE-754 standard specifies 32-bit floats with 1 sign bit, 8 exponent bits, and 23 mantissa bits.
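The 1/8/23 field split described above can be inspected directly. The following is a minimal Python sketch, not part of the original article; the helper names `float32_fields` and `describe` are hypothetical and chosen only for illustration.

```python
import struct

def float32_fields(x):
    """Round x to IEEE-754 single precision and return (sign, exponent, mantissa).

    The exponent is returned in its stored (biased) form and the mantissa
    is the raw 23-bit fraction field, without the implicit leading 1.
    """
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits
    mantissa = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, mantissa

def describe(x):
    """Format the three fields as binary strings separated by spaces."""
    sign, exponent, mantissa = float32_fields(x)
    return f"{sign:01b} {exponent:08b} {mantissa:023b}"

if __name__ == "__main__":
    for value in (1.0, -2.5, 0.15625):
        print(f"{value:>8}: {describe(value)}")
```

For example, 1.0 is stored with sign 0, biased exponent 127, and an all-zero mantissa field.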

Floating-point values in the IEEE-754 format are hard to compress because the mantissa bits follow a rather unusual statistical distribution described by Benford’s law. In 1938 Frank Benford, a physicist at General Electric, noticed that numbers drawn from many real-world data sets were much more likely to begin with 1, 2, or 3 than with 8 or 9. Benford’s law explains why floating-point mantissas (the bulk of floating-point bits) are hard to compress, primarily because they follow a broad, skewed distribution that exhibits no discernible patterns.
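To make the skew concrete: Benford's law is usually stated as P(d) = log10(1 + 1/d) for a leading decimal digit d. The Python sketch below is an illustration added here, not something from the article. It compares that formula with the leading digits of the first 1,000 powers of 2, a sequence known to follow the law. The function names are hypothetical.

```python
import math
from collections import Counter

def benford_probability(d):
    """Probability that the leading decimal digit equals d (1..9) under Benford's law."""
    return math.log10(1 + 1 / d)

def leading_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

def observed_frequencies(numbers):
    """Fraction of the inputs that start with each digit 1..9."""
    numbers = list(numbers)
    counts = Counter(leading_digit(n) for n in numbers)
    return {d: counts[d] / len(numbers) for d in range(1, 10)}

if __name__ == "__main__":
    observed = observed_frequencies(2 ** k for k in range(1, 1001))
    for d in range(1, 10):
        print(f"digit {d}: predicted {benford_probability(d):.3f}, "
              f"observed {observed[d]:.3f}")
```

Roughly 30% of the values start with 1 while fewer than 5% start with 9, so the distribution is far from uniform yet offers no simple repeating pattern.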

Floating-Point Compression

Peter Lindstrom and Martin Isenburg, who were working at Lawrence Livermore Labs in 2006, published a paper about the lossless compression of scientific floating-point values, including unstructured meshes, point sets, images, and voxel grids. Rather than aiming to achieve the highest lossless compression ratio, Lindstrom and Isenburg designed a software compression algorithm that would operate at the I/O rates of that time.

Their design goal was important, since HPC won’t generate results more quickly if the compression algorithm doesn’t operate at I/O rates. Their algorithm predicts each new floating-point number using a Lorenzo predictor and then entropy-encodes the difference between the predicted and actual values using an integer variant of arithmetic coding. The Lindstrom/Isenburg algorithm achieved an average lossless compression ratio of 1.5:1 at a rate of 20 Mbytes/s (5 Mfloats/s).
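The predict-then-encode structure can be sketched in a few lines of Python. This is not the Lindstrom/Isenburg implementation. It makes three simplifying assumptions: the data is one-dimensional, so the predictor is simply the previous sample (what a Lorenzo predictor reduces to in 1D); residuals are integer differences of order-preserving bit patterns; and `zlib` stands in for the integer arithmetic coder. All function names are hypothetical.

```python
import math
import struct
import zlib

def float_to_ordered_int(x):
    """Map a float32 bit pattern to an unsigned int whose order matches numeric order."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Negative values: flip all bits. Non-negative values: set the top bit.
    return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000

def ordered_int_to_float(u):
    """Inverse of float_to_ordered_int."""
    bits = u & 0x7FFFFFFF if u & 0x80000000 else u ^ 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def compress(values):
    """Predict each sample from the previous one and compress the residuals."""
    prev = 0
    residuals = []
    for x in values:
        u = float_to_ordered_int(x)
        residuals.append((u - prev) & 0xFFFFFFFF)  # difference modulo 2**32
        prev = u
    raw = struct.pack(f"<{len(residuals)}I", *residuals)
    return zlib.compress(raw)  # stand-in for the entropy coder (assumption)

def decompress(blob):
    """Rebuild the float32 samples exactly from the compressed residuals."""
    raw = zlib.decompress(blob)
    residuals = struct.unpack(f"<{len(raw) // 4}I", raw)
    prev = 0
    out = []
    for r in residuals:
        prev = (prev + r) & 0xFFFFFFFF
        out.append(ordered_int_to_float(prev))
    return out

if __name__ == "__main__":
    samples = [math.sin(0.001 * i) for i in range(10000)]
    original = struct.pack(f"<{len(samples)}f", *samples)
    blob = compress(samples)
    restored = struct.pack(f"<{len(samples)}f", *decompress(blob))
    print("lossless:", restored == original)
    print("raw bytes:", len(original), "compressed bytes:", len(blob))
```

The round trip is bit-exact for float32 data. The ratio obtained depends entirely on how smooth the input is and should not be compared with the 1.5:1 average cited above.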

Improvements To Floating-Point Compression

Today, multicore chip designers at companies like Intel, Nvidia, IBM, and ARM are aware that their multicore designs are hitting the memory wall (see “The Memory Wall Is Ending Multicore Scaling” at electronicdesign.com). Memory, bus, and disk bandwidth limitations significantly reduce the benefits of multiple compute cores.

If floating-point compression and decompression are to keep up with today’s Gbyte/s I/O rates, compression algorithms that reduce multicore I/O bottlenecks will have to be significantly accelerated in software or implemented in hardware.

If hardware acceleration were to provide compress-decompress functions, the compress-decompress block would ideally accept both floating-point and integer values and would support fast lossless compression, as well as lossy compression options where users specify the desired compression ratio or the decompressed data quality. With these improvements, numerical compression could flexibly relieve the I/O bottlenecks that degrade the throughput of many multicore applications.
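One simple way to picture a user-specified quality setting is mantissa truncation: keep only the top k of the 23 mantissa bits, which bounds the relative error of normal values by 2^-k. The Python sketch below is an illustrative assumption, not a description of any particular hardware block or product. The function name and the `keep_bits` parameter are hypothetical.

```python
import struct

def truncate_mantissa(x, keep_bits):
    """Zero the low (23 - keep_bits) mantissa bits of x viewed as a float32.

    For normal (non-denormal) values the relative error introduced by the
    truncation is below 2 ** -keep_bits.
    """
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Mask that clears the discarded low-order mantissa bits.
    mask = 0xFFFFFFFF & ~((1 << (23 - keep_bits)) - 1)
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

if __name__ == "__main__":
    x = 3.14159
    for k in (4, 8, 16, 23):
        y = truncate_mantissa(x, k)
        print(f"keep {k:2d} bits: {y!r}  relative error {abs(y - x) / x:.2e}")
```

The zeroed low-order bits then compress well under a lossless coder, so a lower `keep_bits` trades accuracy for a higher compression ratio.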

  • Ben Myers
    Feb 16, 2012

    Overall, an interesting and potentially useful article, but I have to question the assertion "In the 1970s, microprocessor vendors developed proprietary floating-point formats..." Who made microprocessors in the 1970s? Intel's 4004 was circa 1971, without floating point. The Intel 8087, companion to the 8086, may or may not have made its debut at the same time as the 8086 in 1978.

    Back in the '70s, great huge mainframes wandered the landscape, and yes, they had different instruction word lengths and floating point formats. I programmed a number of them, in assembly language, even. DEC's original minicomputers can hardly be called microprocessors either.

    So who made a MICROprocessor with floating point in the 1970s?

    My other comment is that it is not a new phenomenon that multi-core CPUs are bottlenecked by memory, bus and disk bandwidth. I did a lot of work on the GE (later Honeywell) 600 and 6000 systems, which had standard configurations with up to four processors (each in cabinets!). Hardware monitors measured CPU utilization (busy doing real work) for each processor and the percentage utilization decreased steadily from 100% for the first on downward for each processor. Nevertheless, GE Nuclear ordered a 6-processor system to do its nuclear computations. It was the only six processor Honeywell 6080 ever built. And with core memory, too, because core memory had a faster cycle time than the newer CMOS memory.

    "Those who cannot remember the past are condemned to repeat it" - Santayana