Electronic Design

  
Reprints     Printer-Friendly    Email this Article    RSS        Font Size     What's This?


[Design Application]
Optimizing Code, The SHARC Versus The Minnow (Part I): The Minnow's View
A classic choice is whether to add to legacy code, add a DSP CPU to the system, or go for an entirely new system.

Contributing Author  |   ED Online ID #4740  |   September 18, 2000


This article is the first of a two-part series. Part II is scheduled to appear in the Oct. 16 issue.—ED

Not long after arriving at a new job, most developers begin hoping that their new employers will switch them from the initial bug-fixing mode. Developers anticipate the fun and pleasure involved in being asked to add a significant amount of new functionality to an existing embedded system. Then, the standard dilemma arises. Adding to code in the current system will mean that legacy code and existing hardware layouts can be reused. This reduces development cost, and the revised system will pass through the door in short order. But will the new system have the required performance?

The cost side of embedded-system development means that a processor with classic CISC-processor characteristics powers many systems. The processor is nothing fancy. It's just something with about the capabilities of a 68K microcontroller, which is something that can comfortably handle the current requirements with a minimum of fuss and cost. Soon, an upgrade is needed!

I consulted on an upgrade to a handheld system initially designed to sample pressure and volume characteristics of a reciprocating gas compressor. The customer wanted to add a fast Fourier transform algorithm to provide immediate frequency information about unwanted vibrations within the compressor without waiting to analyze the data back at the base. As a consultant, I was asked if the new functionality could be added to the existing system, or if a co-processor/new system were necessary.

This two-part article examines the design decisions required when answering questions about placing DSP algorithms on an existing CISC processor system. A typical DSP algorithm with its loops, various memory accesses, additions, and multiplications is shown in Listing 1. This algorithm calculates the instantaneous and average power associated with the complex-valued frequency components from some measurement.

This article addresses the kind of performance that can be obtained from a standard CISC processor with limited DSP capability. It also discusses whether or not there's a need to code directly in assembly code for maximum performance, or if an optimizing compiler will speed development. The second article will address what DSP CPU features offer the maximum performance advantage for different algorithms, how to code for speed—hand code or DSP compiler—and if techniques exist to really maximize the speed of today's highly pipelined processors.

The first consideration in adding new material to legacy code shouldn't be about speed, but rather, will the code work? Though not obvious, several design decisions have already been made which could mean that the DSP algorithm given in Listing 1 won't necessarily work. Even the design decision to use 16-bit precision for 8-bit data values may not be sound. One major characteristic of most DSP algorithms is a loop, along with multiple memory accesses and repeated multiplications and additions. It doesn't take a very large loop before operations on 8-bit values might overflow the 16-bit number representation. The two key DSP-related problems are in the words might overflow and the not very obvious fact that "C doesn't care."

Take, for example, a simple DSP algorithm designed to provide the average of a long sequence of 8-bit values. The addition of a very long series of alternating large positive (0x7F) and large negative (0x80) values from a high-frequency input would cause no problem. Consider, however, a situation where the data stream consists of the same values, but with a low-frequency signal so that the input comes in bursts of 512 similar values. The first 256 large, 8-bit negative values (0x80 in 8 bits stored as 0xFF80 in 16 bits) give the result of 0x8000. This result can be stored in 16 bits. Working with 512 negative values, though, requires the addition of the 16-bit values 0x8000 and 0x8000. The result can't be stored in 16 bits, but requires 17. The sum of the 512 input values is stored incorrectly as 0x0000 with an overflow flag set.

Some processors do have specific signed and and unsigned addition operations, although the 68K processor doesn't. The MIPS processor has ADDU (unsigned) and ADDS (signed), and the AMD 29K processors have ADD, ADDU, and ADDS. The ADDS instruction is designed to throw an exception when overflow occurs within a number representation. The exception handler must then correct the problem. For the averaging problem, the algorithm might be repeated with all of the input values scaled down by a factor of 2, and the final average scaled up to compensate.

Do you remember ever writing an arithmetic exception handler when developing "C" code? You probably didn't, as typical "C" compilers make use of the standard, nonexception-throwing ADD instruction. The typical MIPS compiler adds insult to injury by using the unsigned ADDU operation even when signed (short int) values are being handled. This means that the "C"-compiler-generated code simply ignores the overflow problem.

This hidden overflow defect is the reason why many standard "C" algorithms sometimes fail in an unpredictable fashion. It's particularly a problem with DSP algorithms. DSP processors have an architecture where there's a 50- to 80-bit accumulator register for use in conjunction with the multiplier to reduce overflow problems. But typically, there isn't a "C" syntax extension that lets the programmer specify access of this high-precision register.

To simplify this article, we shall assume that the input signals considered won't cause any overflows. Readers interested in a tutorial and techniques to minimize the overflow problem, and the related quantizing errors, while maximizing code speed can refer to "Are you damaging your data through a lack of bit cushions?" by M. Smith and L.E. Turner, to be published on the web at www.circuitcellar.com/.


<-- prev. page     [1] 2 3 4     next page -->

Reprints   Printer-Friendly  Email this Article  RSS    Font Size   What's This?


  • Network-On-Chip Tools Arrive for The Masses
  • Tackling System Design Challenges Through Early Verification
  • ESL Tools Take Center Stage As Designers Move Up
  • Parasitic Extraction Tool Targets Next-Generation Custom ICs
  • Synopsys Jumps Into ESL-Synthesis Pool
  • Verify Control Systems Before Committing To Hardware
  • You're Using How Many FPGAs?
  • Tool Up For The FPGA Blitz
    1) Build A Smart Battery Charger Using A Single-Transistor Circuit
    (177 views today)
    2) Hot Hands For Some Cool Rock: Motion Sensing Meets Audio Engineering
    (168 views today)
    3) What's All This Transimpedance Amplifier Stuff, Anyhow? (Part 1)
    (72 views today)
    4) GPS-Derived Grandmaster Clock Delivers Ultra-Precise Time And Frequency Sync
    (72 views today)
    5) Downconverting Mixers Lower Power Consumption While Improving Performance
    (56 views today)
    ALL TOP 20



    POST YOUR COMMENTS HERE
    Name:

    Email:
    Your Comments:

    Enter the text from the image below


    Please refresh the page if you have trouble reading this text.

    Search Electronic Design
         
      
     
    Web Seminar
    Sponsored By:
    Title: Read Pacing: A Performance Enhancing Feature of PCI Express Gen 2 Switch Devices
    Speakers: 
    Date: 07/01/08
    Register: 

    Electronic Design Europe Electronic Design China EEPN Power Electronics Auto Electronics Microwaves & RF
    Mobile Dev & Design Schematics Find Power Products Military Electronics EE Events Related Resources