Electronic Design

  


[Product Innovation]
Audio/Video Processor Fits Applications From Palmtops To Home Theaters
A scalable, modular architecture delivers the performance and flexibility needed to satisfy a wide range of media applications.

Dave Bursky  |   ED Online ID #1252  |   March 6, 2000


A growing share of system resources is devoted to the man-machine interface, which is implemented by graphics, audio, and video signal processing. At the same time, these signal-processing tasks demand additional computing performance. A number of media processors have been released for computer or set-top-box solutions, but they've typically been designed to meet one specific performance target. A media explosion is taking place, though, with many new classes of systems, from small-format handheld platforms to high-end home-theater systems, requiring audio and video capabilities.

The solution is a multiprocessor that makes it easy to develop applications that can run on a wide range of platforms. On the low end, this processor must be able to deliver the audio, video, and graphics at a cost appropriate for personal digital assistants (PDAs). Yet it has to provide outstanding performance for home theaters at the high end.

Tackling this range of requirements is the DVine architecture developed by Silicon Magic. It provides a modular, scalable processor that leverages embedded DRAM to deliver performance levels many times those of RISC or CISC architectures. The company will offer it both as an off-the-shelf chip for evaluation and as a core that companies can license to craft custom solutions. The first implementation is designed to be fabricated on a 0.18-µm, five-layer CMOS process that allows input clocks of 100 MHz. The implementation integrates large amounts of embedded DRAM with the logic.

DVine, which stands for DRAM vector engine, is based on a symmetrical multiprocessor approach with single-instruction/multiple-data (SIMD) extensions. That combination allows it to deliver compute throughputs higher than those of CISC or RISC processors, or even specialized media engines. DVine is also programmed with well-understood techniques that employ C-language constructs, so anyone familiar with C programming can craft application software.

The architecture consists of two main modular sections. A compute module contains both scalar and vector processors. The memory-interface unit (MIU) ties banks of embedded DRAM into 128-bit-wide buses that connect everything (Fig. 1). Aside from that pair of blocks, DVine includes an external bus-interface unit (XBIU) that connects the chip to the host system. A data-flow controller (DFC) coordinates the movement of data between the MIUs and the compute modules.

Depending on the amount of horsepower needed, designers can combine multiple compute and memory modules on the same chip. To perform HDTV decoding, which requires MPEG-2 decoding at MP@HL resolution, a designer can use 11 compute modules and eight memory modules. A decoder for a DVD player performs MPEG-2 MP@ML decoding. It requires two compute modules and two memory modules, while a video phone that employs an MPEG-4 video codec algorithm needs just two compute modules and one memory module. Other systems, like an MP3 recorder/player that uses an MPEG-1 layer III codec or a digital camera that performs JPEG image coding, require just one of each.
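The module counts quoted above can be collected into a small lookup table. The sketch below is only an illustration of how the configurations compare. The dictionary layout and the function name `modules_for` are hypothetical and not part of any Silicon Magic tool.

```python
# Module counts as quoted in the article: (compute modules, memory modules).
# The data structure and helper name are illustrative assumptions.
CONFIGS = {
    "HDTV decoder (MPEG-2 MP@HL)": (11, 8),
    "DVD player (MPEG-2 MP@ML)": (2, 2),
    "Video phone (MPEG-4)": (2, 1),
    "MP3 recorder/player (MPEG-1 layer III)": (1, 1),
    "Digital camera (JPEG)": (1, 1),
}


def modules_for(application):
    """Return (compute_modules, memory_modules) for a named application."""
    return CONFIGS[application]


for name, (compute, memory) in CONFIGS.items():
    print(f"{name}: {compute} compute, {memory} memory")
```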

With the combination of scalar and vector processors in the compute module, a single module can perform both setup and control of the vector computations. The vector engine rips through the computations needed for the audio, graphics, and video algorithms. The scalar engine is a RISC processor with a MIPS-like architecture that executes a single-issue, in-order instruction stream. With an input clock speed of 100 MHz, the processor delivers a throughput of 200 MIPS.

Inside the processor is a five-stage, pipelined, 32-bit data path and separate paths for instruction and data flows. The designers even added special registers to aid in processor-to-processor communications. In the compute module, a block of fast static RAM serves as an instruction cache. A register file that's shared between the scalar and vector processors allows the two units to exchange data easily.

To tie into the wide internal buses, the compute block includes a data-communications-channel controller (DCC) and a direct-memory-access (DMA) controller. The DCC is implemented with a multi-channel crossbar bus that can connect any computing module to any memory-interface unit or to the external bus-interface unit. With the wide buses, the DCC delivers an overall memory bandwidth of 3.2 Gbytes/s.
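As a rough check on the bandwidth figure, the sketch below works backward from the quoted 3.2 Gbytes/s and the 128-bit bus width. It assumes decimal gigabytes and one full-width transfer at a time. The article does not state the internal bus clock, so the resulting transfer rate is an inference, not a published specification.

```python
BUS_WIDTH_BITS = 128                          # from the article
QUOTED_BANDWIDTH_BYTES_PER_S = 3_200_000_000  # 3.2 Gbytes/s, assuming decimal units
INPUT_CLOCK_HZ = 100_000_000                  # 100-MHz input clock, from the article

# Bytes moved by one full-width bus transfer.
bytes_per_transfer = BUS_WIDTH_BITS // 8

# Transfers per second needed to reach the quoted bandwidth.
transfers_per_second = QUOTED_BANDWIDTH_BYTES_PER_S // bytes_per_transfer

# How many transfers that implies per input-clock cycle.
transfers_per_input_clock = transfers_per_second // INPUT_CLOCK_HZ

print(bytes_per_transfer, transfers_per_second, transfers_per_input_clock)
```

Under these assumptions, the quoted bandwidth corresponds to two 16-byte transfers per input-clock cycle, which would be consistent with an internal clock running at twice the input rate.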

The companion vector engine is also a single-issue, in-order execution processor. It employs a 16-byte vector width and supports variable vector lengths, achieving a raw throughput of 6.4 GOPS. In the vector unit, 16 identical 16-bit data paths speed the computations. The unit can execute zero-overhead loop iterations and perform horizontal data swapping to rapidly manipulate data structures.
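The 6.4-GOPS figure can be broken down per data path. The sketch below assumes an internal clock of 200 MHz, inferred from the 200 MIPS quoted for the single-issue scalar engine. The article gives only the 100-MHz input clock, so that value is an assumption.

```python
LANES = 16                        # 16 identical data paths, from the article
QUOTED_OPS_PER_S = 6_400_000_000  # 6.4 GOPS, from the article
ASSUMED_CLOCK_HZ = 200_000_000    # assumption: inferred from 200 MIPS, single issue

# Operations each data path must complete per second.
ops_per_lane_per_s = QUOTED_OPS_PER_S // LANES

# Operations each data path must complete per (assumed) clock cycle.
ops_per_lane_per_cycle = ops_per_lane_per_s // ASSUMED_CLOCK_HZ

print(ops_per_lane_per_s, ops_per_lane_per_cycle)
```

Two operations per data path per cycle would be consistent with the dual-execution-unit design the article attributes to the vector processor, though the article does not spell out how the raw figure was derived.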

The 16-channel SIMD vector processor is based on a dual-execution-unit architecture, rather than the more common multiplier-accumulator (MAC) approach. This arrangement offers a very efficient structure for performing motion estimation, which is a key element in image-processing algorithms. At the same time, it doesn't preclude the implementation of MAC functions.

Each compute element in the vector engine processes one data sample in parallel with the other elements, so the processor can perform up to 16 operations at once. It can thereby deliver the high throughput necessary to handle applications such as motion estimation, motion compensation, quantization, filtering, scaling, and discrete cosine transforms.
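To illustrate why a 16-lane engine suits motion estimation, here is a minimal sketch of a sum-of-absolute-differences (SAD) search over 16-sample windows. SAD is a common block-matching metric, but the article does not say which metric DVine uses. The function names are hypothetical, and plain Python only models the arithmetic, not the hardware's parallelism.

```python
LANES = 16  # one sample per compute element, as described in the article


def sad16(current, reference):
    """Sum of absolute differences over exactly 16 samples.

    On a 16-lane SIMD engine, the 16 subtract/absolute-value steps could
    run in parallel; here they are evaluated sequentially.
    """
    if len(current) != LANES or len(reference) != LANES:
        raise ValueError("both inputs must hold exactly 16 samples")
    return sum(abs(c - r) for c, r in zip(current, reference))


def best_offset(current, search_row):
    """Slide a 16-sample window along search_row; return (offset, sad) of the best match."""
    best = None
    for offset in range(len(search_row) - LANES + 1):
        cost = sad16(current, search_row[offset:offset + LANES])
        if best is None or cost < best[1]:
            best = (offset, cost)
    return best


# Usage: the block 0..15 is embedded in the search row at offset 3.
block = list(range(16))
row = [0, 0, 0] + list(range(16)) + [0, 0, 0]
print(best_offset(block, row))
```

Each candidate offset costs 16 independent difference operations followed by a reduction, which is the kind of workload that maps onto 16 identical data paths.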


