Programmable Media Processors Deliver Flexible Solution

Resource-rich multimedia engines handle MPEG-2, MPEG-4, and other video-processing tasks for entertainment and handheld systems.


Dave Bursky

May 12, 2003


Low-cost dedicated and programmable video engines deliver the performance and flexibility needed to handle the plethora of encoding and decoding standards behind the expanding video capabilities of consumer and business applications. Most high-end general-purpose CPUs, such as the Intel P4 or the Sun UltraSparc, can handle media encoding or decoding. But they're too expensive and power-hungry for consumer and portable system applications, like set-top boxes, DVD/personal video recorders, Internet appliances, cell phones, and more. Low cost is paramount when designing many of these applications.

That alone rules out CPUs initially targeted at desktop computers, because the total semiconductor bill of materials must typically be kept below about $70. Often, pairing an inexpensive CPU, which handles the user interface, with either an application-specific IC or a programmable high-speed DSP will deliver the performance needed to handle one or more video streams. Such a combination, along with the necessary memories and other support circuits, can keep the cost within the desired range.

Various video algorithms require differing amounts of computational throughput. Most have an inverse relationship between the bit rate and the amount of horsepower required to process the data. Typically, the lower the bit rate, the more processing power it takes to either encode or decode the image and maintain the desired image quality.

Thus, a typical MPEG-2 decode algorithm might demand about 300 MIPS from a DSP engine, while an MPEG-4 decode function takes slightly more because its algorithms are a bit more complex. Similarly, processing requirements for algorithms such as Microsoft's Windows Media Video could hit about 500 MIPS, because that format uses a more complex compression/decompression algorithm.
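
As a rough illustration of how these per-stream figures turn into a sizing estimate, the sketch below divides a DSP's instruction budget by the approximate decode cost quoted above. The function name, the 1000-MIPS budget in the usage note, and the `usable_fraction` derating factor are hypothetical. Peak MIPS ratings and algorithm MIPS demands are not strictly comparable across architectures, so treat the result as a first-pass estimate only.

```python
# Approximate decode costs quoted in the text, in MIPS per stream.
DECODE_MIPS = {
    "MPEG-2": 300,
    "Windows Media Video": 500,
}


def streams_supported(dsp_mips: float, codec: str,
                      usable_fraction: float = 1.0) -> int:
    """Return how many streams of `codec` fit in a DSP's instruction budget.

    `usable_fraction` is a hypothetical derating factor: the share of the
    rated throughput assumed to be left for the codec after control code,
    memory stalls, and other overhead.
    """
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    return int(dsp_mips * usable_fraction // DECODE_MIPS[codec])
```

With a hypothetical 1000-MIPS budget, `streams_supported(1000, "MPEG-2")` returns 3, and derating the same budget by half leaves room for a single MPEG-2 stream.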

One way to tackle the problem is a generic DSP solution: programming these algorithms to execute on a DSP chip, such as the Blackfin or TigerSharc chips from Analog Devices or low-cost versions of the TMS320C5500 or C62/64/6700 families from Texas Instruments (TI). These chips pack an array of compute resources, including multiple ALUs and multipliers. They also shoehorn in system resources like multichannel DMA controllers and significant amounts of on-chip cache memory. (For more information about Analog Devices' Blackfin family, see "Cost-Savvy DSP Chip Trio Keeps Performance High," Electronic Design, March 31, 2003, page 42.)

TI offers a wide variety of VLIW-based (very-long-instruction-word) DSP chips that range from under $10 to over $500 in 1000-unit lots. Low cost and high throughput are key design parameters, so let's look at the types of resources available on the cheapest members: the fixed-point TMS320C6204 and 620, as well as the fixed- and floating-point TMS320C6211B and 6711B.

Using its VelociTI VLIW-based C6000 series CPU to control twin datapaths, the TMS320C6211/12 or 6711/12 can execute up to eight 32-bit instructions per cycle (Fig. 1). The 6211/12 and 6411 families handle fixed-point calculations, while the 6711/12 supports both fixed- and floating-point computations.
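
The peak ratings quoted for these parts follow from simple arithmetic: issue width multiplied by clock rate. The sketch below is only that arithmetic, with hypothetical function names. It uses the eight-instructions-per-cycle figure from the text and the clock rates the article gives for individual devices.

```python
ISSUE_WIDTH = 8  # up to eight 32-bit instructions per cycle, per the text


def peak_mips(instructions_per_cycle: int, clock_mhz: float) -> float:
    """Peak instruction rate in MIPS: issue width times clock rate in MHz."""
    return instructions_per_cycle * clock_mhz


def implied_ops_per_cycle(peak_rate: float, clock_mhz: float) -> float:
    """Work backward from a quoted peak rate to operations per clock cycle."""
    return peak_rate / clock_mhz


if __name__ == "__main__":
    print(peak_mips(ISSUE_WIDTH, 300))       # eight-issue core at 300 MHz
    print(implied_ops_per_cycle(1200, 200))  # 1200 MFLOPS at 200 MHz
```

At 300 MHz the product is 2400 MIPS, matching the figure the article quotes for the 6411. At a nominal 167 MHz it is 1336, slightly above the quoted 1333 MIPS, which would be consistent with a clock closer to 166.7 MHz (an inference, not a figure stated in the article). Working backward, 1200 MFLOPS at 200 MHz implies six floating-point operations per cycle.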

Each datapath includes two ALUs (one floating point and one fixed point) as well as other blocks that perform data addressing and other functions. When clocked at its top speed of 200 MHz, the 6711/12 processors can deliver a peak throughput of 1200 MFLOPS, while the 6211 offers a peak throughput of 1333 MIPS when clocked at 167 MHz. The latest addition to the family, the TMS320C6411, ups the clock rate to 300 MHz and can deliver a throughput of 2400 MIPS. But it sells for more than double the price of the 6211B. To achieve the higher throughput, the 6411 incorporates an extension to the VelociTI VLIW architecture that allows each ALU to support single 32-bit, dual 16-bit, or quad 8-bit arithmetic operations during each clock cycle. That dramatically increases the number of computations that can be done, especially on pixel-type data sets.
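
To make the quad 8-bit mode concrete, here is a software sketch of what one packed addition computes: four independent 8-bit sums carried in a single 32-bit word. This emulates the idea in Python and is not TI's instruction set. The helper names are hypothetical, and wraparound (modulo 256) per lane is an assumption chosen for simplicity.

```python
MASK32 = 0xFFFFFFFF
LOW7 = 0x7F7F7F7F   # low seven bits of each byte lane
HIGH1 = 0x80808080  # top bit of each byte lane


def pack4(lanes):
    """Pack four 0..255 values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def add4_u8(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes at once, wrapping within each lane."""
    a &= MASK32
    b &= MASK32
    # Add the low seven bits of every lane; no sum can carry past bit 7,
    # so nothing spills into the neighboring lane.
    low = (a & LOW7) + (b & LOW7)
    # Bit 7 of each lane is a7 XOR b7 XOR (carry into bit 7, already in low).
    return (low ^ ((a ^ b) & HIGH1)) & MASK32
```

For example, `unpack4(add4_u8(pack4([10, 200, 255, 0]), pack4([20, 100, 1, 0])))` gives `[30, 44, 0, 0]`. Each lane wraps on its own, and the overflow in the third lane does not disturb the fourth. A processor with packed-arithmetic support performs the equivalent work in a single ALU operation, which is where the gain on pixel data comes from.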

Taking aim at the high end of the DSP space, TI has also just released 720-MHz versions of its TMS320C6416, 15, and 14 DSP chips—a 20% speed improvement versus the company's best previous devices. The most highly integrated chip, the 6416, includes on-chip Viterbi and Turbo-code coprocessors to improve its ability to handle more channels in 3G wireless basestations or implement adaptive antenna array processing while providing eight time slots for GSM/GPRS/EDGE modems. All three DSP chips include 1 Mbyte of on-chip high-speed memory and high-speed peripherals that accelerate applications and the processing of real-time data.

VLIW architectures are also employed by two other more dedicated chips, both aimed at media processing. Now in its second generation, the BSP-15 media processor from Equator delivers an equivalent throughput of over 10 Goperations/s (Fig. 2). It achieves such throughput by combining a VLIW controller that packs four integer ALUs, two 64-bit single-instruction/multiple-data (SIMD) ALUs, and two 128-bit SIMD ALUs, along with dedicated coprocessor blocks that perform variable length encoding/decoding, video filtering, audio processing, and so forth.
