Electronic Design

  


[Technology Report]
Harness Today's DSPs: Propel Tomorrow's Designs
Resource-rich configurable processors perform billions of operations/s to handle the most demanding DSP algorithms.

Dave Bursky  |   ED Online ID #6979  |   December 18, 2003


Designers always crave greater computational throughput in their DSP applications. More throughput equals richer DSP functionality, whether it's performing more exacting calculations to deliver better filtering or imaging or handling multiple tasks to eliminate additional components. To eke out as much performance as possible, new DSP architectures are coming equipped with configurable arrays of compute engines and blocks of memory.

In contrast, many existing commodity DSP chips employ some form of Harvard or very-long-instruction-word (VLIW) architecture, resulting in a general-purpose fixed-architecture solution. Most of their performance comes from raw clock speed and the use of multiple multiplier-accumulators, which usually operate as a single-instruction/multiple-data (SIMD) compute array. But today's chips, with clock speeds of 600 MHz and higher, have reached a performance plateau of several gigaoperations per second (GOPS).

Fixed-architecture array processors, such as the vector accelerator on the AltiVec PowerPC processor from Motorola and Intel's digital media processors (the MXP5800 and MXP5400), are another option. These devices are well suited to deal with large arrays of sequential data. Yet their fixed architectures don't offer the flexibility of configurable processor arrays, which allow software to control the processor interconnections and data flow. FPGAs, at the other end of the spectrum, offer total flexibility but at the expense of individual-element performance. (For more, DRILL DEEPER 6980 at www.elecdesign.com to see the "Configure Your Own Custom DSP Solution" sidebar.)

Over the last few years, the ability to create large arrays of processing elements via software has improved dramatically as designers employ more advanced process technologies. By tailoring the array resources to the algorithm via software, you'll see an aggregate compute throughput at least an order of magnitude higher than current Harvard and VLIW architectures—and at the same or even lower clock speeds. This will provide room for perhaps yet another order-of-magnitude increase in performance as clock rates rise.

Applications range widely for these configurable DSP chips: software-defined radios, flexible cellular basestations, antenna beam-forming control, and high-throughput image- and voice-processing systems. To facilitate their development and implementation, though, software tools must be easy to use and robust enough to handle complex algorithms.

In this emerging area, many companies, mostly on the smaller side, offer a broad choice of architectural approaches that deliver throughputs of 20 GOPS and higher without pushing silicon clock speeds beyond 500 MHz. Some of the offerings come as intellectual property (IP), which designers can incorporate into an ASIC solution. Also, a few companies have "standard" silicon products that designers can use as an OEM product or in a prototype for a "proof-of-concept" implementation before embedding the IP into a custom solution.

FLEXIBLE ARRAY PROCESSING
PACT XPP Technologies is one of the first to come up with a configurable array solution. The company dubbed its architecture the "extreme processing platform." Though PACT expects to license the technology to companies that want to embed it in a custom chip, its XPP64-A silicon architecture was developed to show the capabilities. Included on-chip are 64 ALUs/processing array elements (ALU-PAEs), 16 RAM-PAEs, four I/O interface ports, a configuration manager with a 1.4-Mbit cache memory that can hold several configurations, and built-in debugging support via a JTAG IEEE 1149 interface (Fig. 1).

The configuration manager is a specialized microcontroller that supervises the array's configuration. Its operating system manages the array resources and allows several configurations to be loaded onto the array. Configuration sequencing is performed on shared resources without deadlocks.

In all, about 51 million transistors are interconnected using six levels of copper. The combined throughput of the ALU-PAEs hits 4096 million multiply-accumulates (MACs) per second when clocked at only 64 MHz, leaving plenty of room for improved performance as the clock frequency increases.
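
The 4096-million figure follows directly from the array size and clock rate. The short Python sketch below reproduces that arithmetic. It assumes each ALU-PAE completes one MAC per clock cycle, which the article does not state outright but which is consistent with the quoted totals; the function and constant names are illustrative only.

```python
# Peak MAC rate of the XPP64-A array, reconstructed from the article's numbers.
# Assumption: each ALU-PAE completes one MAC per clock cycle.

NUM_ALU_PAES = 64        # ALU-PAEs on the XPP64-A, per the article
CLOCK_HZ = 64_000_000    # 64-MHz clock, per the article


def peak_macs_per_second(num_paes: int, clock_hz: int, macs_per_cycle: int = 1) -> int:
    """Return the aggregate peak MAC rate of an array of identical elements."""
    return num_paes * clock_hz * macs_per_cycle


peak = peak_macs_per_second(NUM_ALU_PAES, CLOCK_HZ)
print(peak // 1_000_000, "million MACs/s")
```

Because the rate scales linearly with clock frequency under this assumption, doubling the clock to 128 MHz would double the peak figure.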

The application software is defined by dynamic reconfiguration of operations and connections within the processor array. This eliminates the overhead associated with program sequencers and decoding logic. Each ALU-PAE block contains an eight-by-eight array of compute elements. Every element is made up of three sub-blocks.

A two-input, two-output ALU performs the main computations. A Back register provides routing-path control in the vertical direction and includes a simpler ALU that can be used for addition, barrel shifts, and normalization tasks. Finally, a Forward register also provides routing paths in the vertical direction. A specialized ALU in this register offers data-stream control, such as multiplexing and swapping.

The RAM-PAEs are similar to the ALU-PAEs, except that the main ALU is replaced by a dual-port 512-word by 24-bit storage array that can also double as a FIFO memory. A packet-based communications scheme is used between the ALU-PAEs and the RAM-PAEs. The RAM generates a data packet after an address packet is received at the read input. Writing to the RAM requires two packets, one with the address and the other containing the data word to be written.
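
The packet exchange described above can be modeled in a few lines of Python. This is only a behavioral sketch, not PACT's implementation: the `Packet` and `RamPae` names, the address-before-data ordering on the write side, and the error handling are all assumptions made for illustration. Only the 512-word by 24-bit size and the one-packet read and two-packet write come from the article.

```python
from dataclasses import dataclass

WORDS = 512                  # storage depth, per the article
WORD_MASK = (1 << 24) - 1    # 24-bit words, per the article


@dataclass(frozen=True)
class Packet:
    kind: str    # "addr" or "data" (hypothetical tagging scheme)
    value: int


class RamPae:
    """Behavioral model of a RAM-PAE's packet interface (illustrative only)."""

    def __init__(self) -> None:
        self.mem = [0] * WORDS
        self._pending_addr = None  # address waiting for its data packet

    def read_port(self, pkt: Packet) -> Packet:
        # One address packet at the read input yields one data packet.
        if pkt.kind != "addr" or not 0 <= pkt.value < WORDS:
            raise ValueError("read port expects an in-range address packet")
        return Packet("data", self.mem[pkt.value])

    def write_port(self, pkt: Packet) -> None:
        # A write takes two packets. Assumed order: address first, then data.
        if pkt.kind == "addr":
            if not 0 <= pkt.value < WORDS:
                raise ValueError("address out of range")
            self._pending_addr = pkt.value
        elif pkt.kind == "data":
            if self._pending_addr is None:
                raise ValueError("data packet arrived without an address")
            self.mem[self._pending_addr] = pkt.value & WORD_MASK
            self._pending_addr = None
        else:
            raise ValueError("unknown packet kind")


ram = RamPae()
ram.write_port(Packet("addr", 5))
ram.write_port(Packet("data", 0xABCDEF))
print(ram.read_port(Packet("addr", 5)))
```

The FIFO mode mentioned in the article is left out of this sketch, since the text gives no detail on how it is selected or addressed.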

In between the rows of ALU-PAEs are the data channels. These constitute a communications network that allows point-to-point and point-to-multipoint connections from outputs to inputs of ALU-PAEs, RAM-PAEs, and the I/O ports.

In one particular setup, PACT is partnering with QuickLogic Corp. to combine its configurable array with QuickLogic's QuickMIPS highly integrated system-on-a-chip platform. The result, an XPP prototyping platform, targets network infrastructure and digital consumer applications. Combining QuickMIPS and XPP delivers a high-performance and flexible system platform that can adapt to changing communication protocols and application demands.

Many other architectures developed by the roster of companies in this arena use a variation of the same basic architectural theme: All of the chips basically contain an array of compute engines, some data memory, and a control processor. The "magic" is in the way the blocks can be interconnected and controlled.

