Optimizing Code, The SHARC Versus The Minnow (Part I): The Minnow's View

A classic choice is whether to add to legacy code, add a DSP CPU to the system, or go for an entirely new system.

Date Posted: September 18, 2000 12:00 AM

The optimizer recognized the common expression associated with multiple access to the same memory location. But it didn't recognize the other common expressions inside of the loop.

To be fair to this developer, the errors introduced into Listing 3 arose by a change in the development process associated with using unfamiliar index operations. The developer's standard process employs auto-incrementing addressing modes where offsets are automatically handled! Nevertheless, the experience still leads to the same object lesson. It makes sense to work with the compiler. Optimizing, or checking against, the output of the compiler offers a number of advantages to the tired, overworked developer.

It turned out that writing "C" code with explicit array indexing and turning on the SDS compiler's optimizer doesn't generate assembly language instructions that use the faster indirect auto-incrementing addressing mode (Listing 5). The "C" code must instead be rewritten to explicitly use pointer-incrementing operations before the compiler output uses the faster instructions (Listing 6). Listing 7, from the SDS compiler, demonstrates three things: the automatic generation of the faster do-while form of the for loop; the use of the faster auto-incrementing indirect addressing mode, activated by writing the "C" code in a specific way; and the use of registers to hold common expressions, activated by the format of the "C" code.
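The two source styles can be sketched as follows. This is an illustration only, not a reproduction of Listings 5 and 6: the summing kernel, the function names, and the 16-bit sample type are assumptions chosen to keep the example short. Whether a particular compiler maps the second form onto auto-incrementing instructions still has to be checked in its assembly output.

```c
#include <stddef.h>

/* Array-index style: every access is written as base plus index.
 * (Hypothetical kernel; the article's own algorithm is not shown here.) */
long sum_indexed(const short *data, size_t count)
{
    long total = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        total += data[i];
    }
    return total;
}

/* Pointer-increment style: the read and the address update are written
 * as one operation, *p++, which is the shape that corresponds to the
 * 68K post-increment addressing mode. */
long sum_pointer(const short *data, size_t count)
{
    long total = 0;
    const short *p = data;
    const short *end = data + count;

    while (p != end) {
        total += *p++;
    }
    return total;
}
```

Both functions return the same result; only the way the memory access is expressed differs, which is the point of comparing the compiler output for each.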

Over the last few years, SDS has joined with Diab-Data, which has in the last couple of months begun operating under the banner of Wind River. Recently, I was given the opportunity to compare the SDS compiler with the Diab-Data 4.3f compiler.

The assembly code generated by the Diab-Data compiler for Listing 1 is very different from that generated by the SDS compiler. With no optimizations activated, the code generated for every memory access has the form:

Move starting address into register
Move loop counter into register
Change loop counter into array offset
Add offset to register to form address
Access memory
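Written out in "C" rather than assembly, that five-step pattern looks roughly like the sketch below. The function name and the 16-bit element type are assumptions made for illustration; the actual compiler output is 68K assembly and is not reproduced here.

```c
#include <stddef.h>

/* One array read, spelled out the way the unoptimized code performs it.
 * Each statement stands in for one of the five steps listed above. */
short load_element_unoptimized(const short *array, size_t loop_counter)
{
    const char *address = (const char *)array; /* starting address into a register */
    size_t offset = loop_counter;              /* loop counter into a register     */

    offset = offset * sizeof(short);           /* counter becomes a byte offset    */
    address = address + offset;                /* add offset to form the address   */
    return *(const short *)address;            /* access memory                    */
}
```

Repeating all five steps for every access is what makes the unoptimized code slow, and it is also why each step is easy to identify and remove by hand.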

The nonoptimized code generated makes use of the simplest instructions, and to say that it's grossly inefficient is being polite. Still, these low-level instructions are far easier to optimize than the more complex instructions that are generated by the SDS compiler.

With the optimizing option activated, the Diab-Data compiler will automatically generate code from Listing 1 to use the fast auto-incrementing instructions without the necessity of "persuasion" by rewriting the "C" code in the form of Listing 6 with explicit indexed addressing. The optimizer also placed the loop variable into a register.

The "C"-code format given to the Diab-Data compiler, though, still affects the speed of the optimized code. The optimizer switched to the faster "downcounting" loop when the "C" code used explicit indexed addressing rather than direct array indexing.
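The difference between an upcounting and a downcounting loop can be sketched in "C" as shown here. The function names and the summing kernel are hypothetical. The usual reason a downcounting loop is faster on a 68K-class processor is that a count running down to zero can be tested as part of the decrement, instead of needing a separate compare against an upper limit.

```c
#include <stddef.h>

/* Upcounting: the loop test compares the counter against a limit. */
long sum_upcounting(const short *data, size_t count)
{
    long total = 0;
    const short *p = data;
    size_t i;

    for (i = 0; i < count; i++) {
        total += *p++;
    }
    return total;
}

/* Downcounting: the loop test only asks whether the counter
 * has reached zero after being decremented. */
long sum_downcounting(const short *data, size_t count)
{
    long total = 0;
    const short *p = data;
    size_t remaining;

    for (remaining = count; remaining != 0; remaining--) {
        total += *p++;
    }
    return total;
}
```

The downcounting form works here because the loop body never needs the value of the counter itself, only the number of passes.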

Was It Worth The Effort?
You, the reader, have probably spent 30 minutes reading this article, and perhaps it would take another couple of hours to become familiar enough to comfortably apply the hand-code optimizing techniques. If the new code is only a small part of the overall program code, is it worth spending the 200 minutes of your valuable development time? A comparison of the time it takes to execute a program on any processor can be obtained from the formula:

Execution time = (instructions/program) × (average cycles/instruction) × (time/cycle)

Developing the optimized code involves minimizing the product of these ratios, not just minimizing one of them. The time/cycle ratio is fixed for all listings executed on one processor. Table 1 illustrates the total cycle time for a number of commonly used 68K instructions using four-cycle external memory. Faster access times are available when internal memory or caches are taken into account. But the overall DSP algorithm time may become nondeterministic when cached values have to be written back to external memory. Choosing instructions involving fewer clock cycles per instruction might require additional instructions to be added, negating the speed gain.
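As a quick worked example, the formula can be evaluated directly. The sketch below assumes an 8-MHz clock, the figure this article uses for its basic 68K-class processor; the instruction counts and average cycle counts are invented purely for illustration and are not taken from Table 1 or Table 2.

```c
/* Execution time = instructions x (average cycles/instruction) x (time/cycle),
 * where time/cycle is 1 / clock frequency. */
double execution_time_seconds(unsigned long instructions,
                              double avg_cycles_per_instruction,
                              double clock_hz)
{
    return (double)instructions * avg_cycles_per_instruction / clock_hz;
}

/* Hypothetical version A: 1000 instructions averaging 10 cycles each. */
double example_version_a(void)
{
    return execution_time_seconds(1000UL, 10.0, 8.0e6); /* 10,000 cycles */
}

/* Hypothetical version B: cheaper instructions (8 cycles on average),
 * but 1500 of them are needed to do the same work. */
double example_version_b(void)
{
    return execution_time_seconds(1500UL, 8.0, 8.0e6);  /* 12,000 cycles */
}
```

With these made-up numbers, version A takes 1.25 ms and version B takes 1.5 ms, so the version with the lower cycles-per-instruction figure is the slower of the two.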

A comparison of the execution times for the loops in the listings is given in Table 2. In conclusion, if you change how you write your "C" code so that it uses pointers to access memory, then on a basic 8-MHz CISC processor with the characteristics of a 68K, the run time of the main body of the code is essentially the same whether the code is hand- or compiler-generated. The optimizing would become more worthwhile if a higher-speed hardware multiplier were available. In that case, the time to handle the loop and memory operations becomes more critical.

Further hand-optimization of the SDS compiler-generated code of Listing 7 is possible with regard to loop overhead. The downcounting loop using registered variables scores over the compiler-generated code. If the loop body were larger, however, then the effect of the loop overhead would become insignificant. The difference between hand- and compiler-generated loops effectively disappears with the Diab-Data compiler.

If your original embedded system needs the occasional non-time-critical DSP-code section, just go with the optimized compiler output, written in an improved format to activate the time-saving instructions. On the other hand, if your DSP algorithm must process the last block of data before the new block overwrites it, then each cycle counts, and knowing every last optimization technique is important.

One technique to fine-tune your "68K optimization skills" is to take a look at the output from various optimizing compilers. For example, the Diab-Data compiler has three options: Standard Optimization, Aggressive Optimization, and Feedback Optimization.

One of the optimizing options will perform multiple passes through the code to recognize further speed improvements made possible by earlier optimizations. This trades a little extra compile time for future run-time savings.

It's worth finding out whether the Feedback Optimization option is just "hype" for your application or whether it produces something really useful. Paraphrasing the literature, the integrated run-time analysis tools make use of code profiling to "suggest" to the optimizer which of the various optimizing techniques for speeding the code is the most appropriate for "your" embedded application.

In the next article, "The SHARC's Byte," we will look at this same DSP algorithm implemented on Analog Devices' ADSP-21061 SHARC processor. The change from the CISC 68K to the 21K "super" Harvard architecture, with what are effectively superscalar RISC-like instructions, raises a whole new series of optimization issues.
