Electronic Design

  


[Technology Report]
Programmable Media Processors Deliver Flexible Solution
Resource-rich multimedia engines handle MPEG-2, MPEG-4, and other video-processing tasks for entertainment and handheld systems.

Dave Bursky  |   ED Online ID #3399  |   May 12, 2003


Low-cost dedicated and programmable video engines deliver the performance and flexibility needed to handle the plethora of video encoding and decoding standards behind the expanding video capabilities of consumer and business applications. Most high-end general-purpose CPUs, such as the Intel P4 or the Sun UltraSparc, can handle media encoding or decoding. But they're too expensive and power-hungry for consumer and portable system applications, like set-top boxes, DVD/personal video recorders, Internet appliances, cell phones, and more. Low cost is paramount when designing many of these applications.

That alone rules out CPUs initially targeted at desktop computers, because the total semiconductor bill of materials must typically be kept to less than about $70. Often, joining an inexpensive CPU that handles the user interface with either an application-specific IC or a programmable high-speed DSP will deliver the performance needed to handle one or more video streams. Such a combination, along with the necessary memories and other support circuits, can keep the cost within the desired range.

Various video algorithms require differing amounts of computational throughput. Most have an inverse relationship between the bit rate and the amount of horsepower required to process the data. Typically, the lower the bit rate, the more processing power it takes to either encode or decode the image and maintain the desired image quality.

Thus, a typical MPEG-2 decode algorithm might demand about 300 MIPS from a DSP engine, while an MPEG-4 decode function takes slightly more because its algorithms are a bit more complex. Similarly, processing requirements for algorithms such as Microsoft's Windows Media Video could hit about 500 MIPS, because that format uses a more complex compression/decompression algorithm.
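To make these figures concrete, the short sketch below estimates how many decode streams would fit within a given DSP budget. The per-codec loads are the approximate numbers quoted above. The 1000-MIPS budget and the `headroom` parameter are hypothetical values chosen for illustration, and a peak MIPS rating is rarely sustained in practice, so treat the result as an upper bound rather than a design figure.

```python
# Approximate decode loads quoted in the article, in MIPS.
DECODE_MIPS = {
    "MPEG-2": 300,
    "Windows Media Video": 500,
}


def max_streams(available_mips: float, codec: str, headroom: float = 0.0) -> int:
    """Return how many streams of `codec` fit within `available_mips`.

    `headroom` is the fraction of the DSP reserved for other work.
    It is a hypothetical parameter, not something the article specifies.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in the range [0, 1)")
    usable = available_mips * (1.0 - headroom)
    return int(usable // DECODE_MIPS[codec])


if __name__ == "__main__":
    budget = 1000  # hypothetical DSP budget in MIPS
    for codec in DECODE_MIPS:
        print(codec, max_streams(budget, codec, headroom=0.25))
```

With a quarter of the budget held back, the hypothetical 1000-MIPS part would carry two MPEG-2 streams or a single Windows Media Video stream.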

One way to tackle the problem with a generic DSP solution is to program these algorithms to execute on a DSP chip, such as the Blackfin or TigerSharc chips from Analog Devices or low-cost versions of the TMS320C5500 or C62/64/6700 families from Texas Instruments (TI). These chips pack an array of compute resources, including multiple ALUs and multipliers. They also shoehorn in system resources like multichannel DMA controllers and significant amounts of on-chip cache memory. (For more information about Analog Devices' Blackfin family, see "Cost-Savvy DSP Chip Trio Keeps Performance High," Electronic Design, March 31, 2003, page 42.)

TI offers a wide variety of VLIW-based (very-long-instruction-word) DSP chips that range from under $10 to over $500 in 1000-unit lots. Low cost and high throughput are both key design parameters, so let's look at the types of resources available on the cheapest members: the fixed-point TMS320C6204 and 620, as well as the fixed- and floating-point TMS320C6211B and 6711B.

Using its VelociTI VLIW-based C6000 series CPU to control twin datapaths, the TMS320C6211/12 or 6711/12 can execute up to eight 32-bit instructions per cycle (Fig. 1). The 6211/12 and 6411 families handle fixed-point calculations, while the 6711/12 supports both fixed- and floating-point computations.
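Peak throughput follows directly from this issue width: clock rate multiplied by operations issued per cycle. The sketch below applies that arithmetic to the clock rates and peak ratings the article quotes for these parts (167, 200, and 300 MHz). The function names are illustrative only, and the six floating-point operations per cycle is an inference from the quoted 1200-MFLOPS rating, not a figure stated in the text.

```python
def peak_mops(clock_mhz: float, ops_per_cycle: int) -> float:
    """Peak millions of operations per second: clock (MHz) x ops per cycle."""
    return clock_mhz * ops_per_cycle


def implied_ops_per_cycle(quoted_peak_mops: float, clock_mhz: float) -> float:
    """Work backward from a quoted peak rating to operations per cycle."""
    return quoted_peak_mops / clock_mhz


if __name__ == "__main__":
    # Eight instructions per cycle at 300 MHz (the 6411's quoted clock).
    print(peak_mops(300, 8))
    # Eight instructions per cycle at 167 MHz (the 6211's quoted clock).
    print(peak_mops(167, 8))
    # Quoted 1200 MFLOPS at 200 MHz implies this many FLOPs per cycle.
    print(implied_ops_per_cycle(1200, 200))
```

The 300-MHz case reproduces the quoted 2400 MIPS exactly. The 167-MHz case gives 1336 rather than the quoted 1333, which would be consistent with a nominal clock of about 166.7 MHz rounded up to 167.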

Each datapath includes two ALUs (one floating point and one fixed point) as well as other blocks that perform data addressing and other functions. When clocked at its top speed of 200 MHz, the 6711/12 processors can deliver a peak throughput of 1200 MFLOPS, while the 6211 offers a peak throughput of 1333 MIPS when clocked at 167 MHz. The latest addition to the family, the TMS320C6411, ups the clock rate to 300 MHz and can deliver a throughput of 2400 MIPS. But it sells for more than double the price of the 6211B. To achieve the higher throughput, the 6411 incorporates an extension to the VelociTI VLIW architecture that allows each ALU to support single 32-bit, dual 16-bit, or quad 8-bit arithmetic operations during each clock cycle. That dramatically increases the number of computations that can be done, especially on pixel-type data sets.
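The idea behind quad 8-bit operation can be illustrated in software. The sketch below adds four unsigned 8-bit pixel values packed into one 32-bit word, with each lane wrapping independently. It is only an emulation of the concept using a well-known bit-masking technique; it is not TI's hardware implementation, and the helper names (`pack4`, `unpack4`, `add4x8`) are made up for this example.

```python
MASK32 = 0xFFFFFFFF
LOW7 = 0x7F7F7F7F   # low seven bits of each 8-bit lane
HIGH1 = 0x80808080  # top bit of each 8-bit lane


def pack4(p0: int, p1: int, p2: int, p3: int) -> int:
    """Pack four 8-bit values into one 32-bit word, p0 in the lowest byte."""
    return (p0 & 0xFF) | ((p1 & 0xFF) << 8) | ((p2 & 0xFF) << 16) | ((p3 & 0xFF) << 24)


def unpack4(word: int) -> tuple:
    """Split a 32-bit word back into its four 8-bit lanes."""
    return tuple((word >> (8 * i)) & 0xFF for i in range(4))


def add4x8(a: int, b: int) -> int:
    """Add four packed 8-bit lanes at once, wrapping modulo 256 per lane."""
    a &= MASK32
    b &= MASK32
    # Add the low seven bits of every lane; any carry lands in that lane's
    # own top bit, so it cannot spill into the neighboring lane.
    low = (a & LOW7) + (b & LOW7)
    # Fold in the original top bits with XOR, discarding the lane carry-out.
    return (low ^ ((a ^ b) & HIGH1)) & MASK32


if __name__ == "__main__":
    x = pack4(10, 200, 255, 0)
    y = pack4(20, 100, 1, 0)
    print(unpack4(add4x8(x, y)))
```

One word-wide operation here does the work of four separate byte additions, which is why packed arithmetic pays off on pixel data.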

Taking aim at the high end of the DSP space, TI has also just released 720-MHz versions of its TMS320C6416, 15, and 14 DSP chips—a 20% speed improvement versus the company's best previous devices. The most highly integrated chip, the 6416, includes on-chip Viterbi and Turbo-code coprocessors to improve its ability to handle more channels in 3G wireless basestations or implement adaptive antenna array processing while providing eight time slots for GSM/GPRS/EDGE modems. All three DSP chips include 1 Mbyte of on-chip high-speed memory and high-speed peripherals that accelerate applications and the processing of real-time data.

VLIW architectures are also employed by two other more dedicated chips, both aimed at media processing. Now in its second generation, the BSP-15 media processor from Equator delivers an equivalent throughput of over 10 Goperations/s (Fig. 2). It achieves such throughput by combining a VLIW controller that packs four integer ALUs, two 64-bit single-instruction/multiple-data (SIMD) ALUs, and two 128-bit SIMD ALUs with dedicated coprocessor blocks that perform variable-length encoding/decoding, video filtering, audio processing, and so forth.



