Video-Transcoding Prototype Implementation
Transcoding is also appropriate for sending DVD-originated data over an IP network, for example in a company training, video-on-demand, or video-broadcasting application. In this case, MPEG-2 would be the source video format and VC-1 would most likely be the target format. This section describes the implementation of a prototype of such a system using two TI TMS320C6455 DSPs.
Video transcoding addresses several technical needs, such as format conversion, bit-rate reduction, and temporal or spatial resolution reduction. Correspondingly, different intelligent video-transcoding schemes have been developed for each need. The guiding principle is to reuse as much of the information contained in the original incoming video stream as possible in order to reduce complexity.
For instance, motion vector (MV) mapping, discrete-cosine-transform (DCT) domain conversion, and residual re-estimation are popular techniques that significantly reduce the computational complexity of video transcoding.
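As one illustration of MV mapping, consider a 2:1 spatial downscale, where four source macroblocks collapse into one target macroblock. The following is a minimal sketch, assuming a component-wise median of the four source vectors followed by halving. The function name `map_mvs_2to1` and the grid layout are hypothetical and are not taken from the prototype described in this section.

```python
from statistics import median


def map_mvs_2to1(mv_grid):
    """Map source motion vectors onto a half-resolution macroblock grid.

    mv_grid is a list of rows; each row is a list of (dx, dy) tuples,
    one per source macroblock. Both dimensions are assumed to be even.
    Illustrative only: a real transcoder would also handle intra blocks,
    multiple reference frames, and sub-pel precision.
    """
    rows, cols = len(mv_grid), len(mv_grid[0])
    if rows % 2 or cols % 2:
        raise ValueError("grid dimensions must be even")

    mapped = []
    for r in range(0, rows, 2):
        mapped_row = []
        for c in range(0, cols, 2):
            # The four source macroblocks covered by one target macroblock.
            group = [mv_grid[r][c], mv_grid[r][c + 1],
                     mv_grid[r + 1][c], mv_grid[r + 1][c + 1]]
            # A component-wise median limits the pull of one outlier vector.
            dx = median(v[0] for v in group)
            dy = median(v[1] for v in group)
            # Halve the vector because the picture is half as wide and tall.
            mapped_row.append((dx / 2, dy / 2))
        mapped.append(mapped_row)
    return mapped


source_mvs = [[(4, 2), (4, 2)],
              [(6, 2), (40, -8)]]
print(map_mvs_2to1(source_mvs))
```

An encoder would typically treat the mapped vector as a starting point for a small refinement search rather than as the final result, which is where the complexity saving over a full motion search comes from.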
In addition, a simple and extensible transcoding architecture is desirable. Because different video-transcoding solutions tailor algorithms and architectures in various ways, and because there is no single standardized video-transcoding scheme, the programmability of a DSP like the C6455 fits this domain well.
In the remainder of this section, we propose a general video-transcoding architecture and prototype that can accommodate all kinds of transcoding schemes. To cover the different scenarios that video transcoding targets, we pick the simplest transcoding scheme: fully decode the incoming video stream and re-encode it subject to the new constraints.
This initial video-transcoding implementation does not reuse the information contained in the original incoming video stream; it demonstrates that the platform has the performance to handle the full complexity of decoding and re-encoding. However, the architecture and software infrastructure can be extended to leverage intelligent transcoding schemes (MV mapping, DCT domain conversion, etc.) to increase channel density and exploit potential quality optimizations. Many conventional and novel transcoding schemes can be implemented on this architecture thanks to its flexible hardware/software framework.
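The relationship between the full decode/re-encode baseline and the later "intelligent" extensions can be sketched as a cascade with an optional hint path. This is an illustration under stated assumptions, not the prototype's actual software: `DecodedFrame`, `transcode`, `toy_decode`, and `toy_encode` are hypothetical names, and the toy functions merely stand in for real MPEG-2 and VC-1 codecs.

```python
from dataclasses import dataclass, field


@dataclass
class DecodedFrame:
    pixels: bytes
    # Information recovered while decoding (for example source MVs).
    side_info: dict = field(default_factory=dict)


def transcode(stream, decode, encode, map_hints=None):
    """Cascaded transcoder: decode each unit, then re-encode it.

    With map_hints=None the encoder receives no hints, which corresponds
    to the full re-encode baseline. Supplying map_hints models an
    extension that reuses source-stream information.
    """
    for access_unit in stream:
        frame = decode(access_unit)
        hints = map_hints(frame.side_info) if map_hints else None
        yield encode(frame, hints)


# Toy stand-ins so the sketch runs; they are not real codecs.
def toy_decode(access_unit):
    return DecodedFrame(pixels=access_unit.upper(), side_info={"mv": (2, 0)})


def toy_encode(frame, hints):
    tag = b"+h" if hints else b""
    return frame.pixels[::-1] + tag


baseline = list(transcode([b"ab", b"cd"], toy_decode, toy_encode))
assisted = list(transcode([b"ab"], toy_decode, toy_encode,
                          map_hints=lambda info: info))
print(baseline, assisted)
```

Keeping the hint path optional means the baseline and any hint-assisted scheme share the same control flow, so a new scheme only has to supply a different `map_hints` function and an encoder that knows how to use its output.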
DSPs Are Crucial
High DSP computational performance, like that provided by the C6455 DSP, is a prerequisite for video encoding and decoding. Other features are also critical for video-infrastructure applications, and they fall into four primary areas:
Multiple powerful I/O options: System designers approach problems from different perspectives, so a DSP for video-infrastructure applications should provide several I/O options for board-level connectivity. As previously mentioned, an sRIO port is built in for interdevice communications. The high-throughput message-passing scheme used by sRIO achieves 95% utilization of the available data bandwidth. Other I/O options are a 1-Gbit/s Ethernet media access controller (EMAC), a 32-bit double-data-rate (DDR2-500) memory controller, and a 66-MHz Peripheral Component Interconnect (PCI) bus.
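To put these interface figures on a common scale, the short calculation below converts them to theoretical peak rates in Mbytes/s and compares them with one uncompressed standard-definition stream. Two inputs are assumptions rather than figures from this article: a 32-bit PCI bus width, and a 720 x 480, 4:2:0, 8-bit, 30 frames/s picture format. The sRIO link rate is not stated here, so it is left out.

```python
def bus_peak_mbytes_per_s(mega_transfers_per_s, bus_bits):
    """Theoretical peak rate of a parallel bus, ignoring all overhead."""
    return mega_transfers_per_s * bus_bits / 8


def raw_420_mbytes_per_s(width, height, fps):
    """Uncompressed 8-bit 4:2:0 video rate (1.5 bytes per pixel)."""
    return width * height * 1.5 * fps / 1e6


ddr2_peak = bus_peak_mbytes_per_s(500, 32)  # DDR2-500 on a 32-bit bus
pci_peak = bus_peak_mbytes_per_s(66, 32)    # assumes a 32-bit PCI bus
emac_peak = 1000 / 8                        # 1 Gbit/s expressed in Mbytes/s
sd_raw = raw_420_mbytes_per_s(720, 480, 30) # assumed SD picture format

for name, rate in [("DDR2-500", ddr2_peak), ("PCI", pci_peak),
                   ("EMAC", emac_peak), ("raw SD video", sd_raw)]:
    print(f"{name:>12}: {rate:8.1f} Mbytes/s")
```

These are ceilings, not sustained rates. Under the stated assumptions, a single raw SD stream is a small fraction of any of the three interfaces, which suggests that moving decoded frames between devices is not the first bottleneck for a small number of channels.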
Efficient on-chip data movement: In video-infrastructure applications, DSPs act as slave devices to the host processor. Ensuring high-throughput, low-latency, concurrent data transfers between masters and slaves is therefore important. The architectural consequence of these requirements is that peripherals, internal memory, and the DSP core are interconnected through an efficient switched central resource (SCR), like that in the C6455 DSP.
Dataflow streamlining is also important. Improvements are realized by employing 256-bit wide memory buses and an internal DMA (IDMA). The IDMA performs background data movement between the two levels of internal memory and to/from the peripheral bus.
Large on-chip memory: Compared to off-chip SDRAM, on-chip SRAM is much faster but, because of its implementation cost, much smaller. For a typical video application, the on-chip memory mainly serves two purposes. First, it stores code and data that are accessed frequently, such as variable-length-code (VLC) tables. Second, it holds temporary data that is swapped in before processing and swapped out afterward. Usually, the more on-chip memory available, the better the application performance. Up to 2 Mbytes of on-chip SRAM are available in a C6455 DSP, which helps boost video-application performance and makes it possible to handle multiple channels.
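The second purpose, swapping data in and out around processing, is commonly handled with two alternating ("ping-pong") buffers. The sketch below simulates that pattern sequentially in Python. The function name `process_with_ping_pong` and its parameters are hypothetical, and on real hardware the fetch step would be issued to a DMA engine so that it overlaps with processing instead of running inline.

```python
def process_with_ping_pong(external, chunk, work):
    """Process `external` chunk by chunk through two small buffers.

    external: bytes standing in for data held in off-chip memory.
    chunk:    size in bytes of each simulated on-chip buffer.
    work:     function applied to each chunk; returns bytes.
    """
    buffers = [bytearray(chunk), bytearray(chunk)]
    n_chunks = (len(external) + chunk - 1) // chunk
    out = bytearray()
    if n_chunks == 0:
        return bytes(out)

    def fetch(index, buf):
        # Copy one chunk from "off-chip" memory into an "on-chip" buffer.
        data = external[index * chunk:(index + 1) * chunk]
        buf[:len(data)] = data
        return len(data)

    filled = fetch(0, buffers[0])  # prime the first buffer
    for i in range(n_chunks):
        active, active_len = buffers[i % 2], filled
        if i + 1 < n_chunks:
            # Fill the idle buffer with the next chunk. A DMA engine
            # would do this in the background while `work` runs.
            filled = fetch(i + 1, buffers[(i + 1) % 2])
        out += work(bytes(active[:active_len]))
    return bytes(out)


print(process_with_ping_pong(b"abcdefg", 3, bytes.upper))
```

The on-chip cost of this pattern is two buffers of `chunk` bytes per stream, so the amount of on-chip SRAM bounds both the chunk size and the number of channels that can be buffered this way at once.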
Code compatibility: Backward code compatibility is important because a great deal of code was developed for video applications long before transcoding for video-infrastructure applications became commonplace. Rather than changing the instruction set, the better place to improve performance for critical signal-processing operations is the DSP core architecture.
For instance, the C6455 has two architectural innovations. The first is the introduction of a loop buffer, which can improve the efficiency of software-pipelined code in small loops. The other is the use of 16-bit versions of native 32-bit instructions, which significantly reduces code size and therefore lowers the program-cache miss rate.