Compression algorithms for speech, audio, still images, and video are quite complicated and, more importantly, nearly always lossy. Thus, samples often change dramatically once they’re decompressed. Yet designers can craft a lossless compressor for samples from any analog-to-digital converter (ADC).

ADCs provide the same number of bits for all samples. For example, a 16-bit ADC always delivers 16 bits per sample. But that fixed number of bits may not always be required to represent the waveform captured by the ADC. The lossless compression algorithm outlined below represents ADC samples using a varying number of bits per sample.

In this design, the lossless compressor handles an integer stream of baseband ADC samples. Though simple, the algorithm is effective at compressing many baseband signals without loss. (The Matlab code for the compression algorithm is available with the online version of this article at www.electronicdesign.com.)

The Fundamentals

Compression algorithms contain two fundamental blocks: a redundancy remover that reduces the numerical range of each sample, and a bit packer that packs the remaining bits after the redundant bits are removed. A simple, effective, and lossless way to remove redundancies is with a series of sample-by-sample difference operations.

For instance, consider the 100 samples shown in the figure (blue curve). Taking the first and second derivative of the blue curve, the red and black curves, respectively, we can see that the black curve has the smallest signal excursion. Obviously, that’s the redundancy remover we want to use. The Matlab software examines the input signal stream and its first and second derivatives (which are implemented as simple integer subtractors) and chooses the result that requires the least amount of bits.

Bit packing is a lossless operation that packs together the output bits of the redundancy remover. While many bit-packing algorithms exist, the compressor here uses a technique called block floating-point (BFP) encoding. BFP generates one exponent per N redundancy-remover output samples.

For most ADC waveforms, N = 4 is most effective. (It’s also a power of two.) Each exponent indicates how many bits are required to encode the sample with the largest magnitude within each group of four samples. The scheme encodes all four of the group’s samples using that many bits

Importantly, BFP exponents are highly correlated. Therefore, instead of using four bits per exponent (assuming 16-bit input samples) to indicate the bits required for each sample in the group, our BFP bit packer encodes the exponent differences rather than the exponents themselves to achieve additional compression.

A Huffman code (detailed in the Matlab code listing) is then used to generate a 1-bit token for an exponent difference of 0, a 3-bit token for an exponent difference of +1 or –1, a 4-bit token for an exponent difference of +2 or –2, and an 8-bit token for all other exponent differences (up to ±16). Because small exponent differences are most common, this exponent encoding technique has a very low overhead of about two bits per four samples, or 0.5 bits per sample.

Implemented in a DSP chip or microprocessor, this simple compressor requires about 50 instructions per sample. However, lossless compression ratios fall between 1.3:1 and 2:1 on baseband signals. As with all lossless algorithms of this type, the amount of compression varies with the information content of the signal. The Matlab code also reports the algorithm’s total compression.

Al Wegener, CTO and founder of Samplify Systems, earned a BSEE from Bucknell University and an MSCS from Stanford University.