[Engineering Essentials]
High-Def Video Brings Telepresence Into Focus
H.264 codecs implemented in ICs and DSPs turn high-end video conferencing over IP networks into reality.
Joseph Desposito
ED Online ID #18244
February 28, 2008
Copyright © 2006 Penton Media, Inc., All rights reserved. Printing of this document is for personal use only.
Imagine that you’re walking into a darkened conference
room. You switch on the lights and make
a few phone calls. All of a sudden, three of your
colleagues from across the globe appear at the
conference room table as if they were sitting there
in the dark all along. This represents the essence
of telepresence—an ultra-high-end video-conferencing
system.
These systems employ high-definition video on 50-in. or
larger flat-panel displays with audio designed to make all of the
participants’ voices seem like they’re coming straight from their
lips. And that’s not all. Typically, factors such as lighting and even
furniture are taken into account, with possibly half a conference
table in one room and the other half in the remote room.
A telepresence system like this could cost several hundred
thousand dollars, as is the case with the TelePresence 3000
from Cisco Systems. But viable alternatives exist at a variety of
price points from companies such as Hewlett-Packard, LifeSize
Communications, Polycom, Sony, Telanetix, and Vidyo.
Design engineers wanting to build telepresence and high-definition
video-conferencing systems, from high-end setups
down to those that might run on PCs and video phones, should
begin by surveying the hardware needed to implement these
systems. The latest H.264 codecs are a good starting point.
H.264 CODECS
The driving technology behind telepresence and high-definition
video conferencing is the H.264 video standard, which
provides over twice the compression ratio of MPEG-2. Several
companies make H.264 codecs, including Fujitsu Microelectronics
America, W&W Communications, and Mobilygen.
Fujitsu’s MB86H51 compresses and decompresses full high-definition
video (1920 dots by 1080 lines) in real time using the
H.264 format (Fig. 1). This is a single-chip implementation
for full HD H.264 High Profile Level 4.0 video processing
that incorporates embedded memory. It also compresses and
decompresses audio in real time by utilizing formats such as
the MPEG-1 Audio Layer.
The MB86H51 uses a proprietary algorithm that automatically
applies less compression to areas in the image where
compression artifacts are most noticeable to human vision, such
as human faces or slow-moving objects, and increased compression
to other areas. The effect is to maximize image quality for
those critical zones. This feature also makes it possible to reduce
image size to between one-half and one-third the size of the
MPEG-2 format with an equivalent level of image quality.
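Fujitsu's algorithm is proprietary, so the following is only a minimal sketch of the general idea of region-adaptive compression, not the chip's actual method. It assumes the encoder already has a per-macroblock mask marking visually sensitive areas and simply lowers the H.264 quantization parameter (QP) there. The base QP, the offsets, and the function names are all hypothetical.

```python
# Hypothetical sketch of region-adaptive quantization; NOT Fujitsu's algorithm.
# In H.264, a lower quantization parameter (QP) means finer quantization,
# i.e. less compression and fewer visible artifacts.

BASE_QP = 30            # assumed frame-level QP
SENSITIVE_OFFSET = -4   # assumed offset for faces and slow-moving areas
OTHER_OFFSET = 2        # assumed offset for all other areas
QP_MIN, QP_MAX = 0, 51  # legal QP range for 8-bit H.264 video


def macroblock_qp(is_sensitive, base_qp=BASE_QP):
    """Return the QP for one macroblock, clamped to the legal range."""
    offset = SENSITIVE_OFFSET if is_sensitive else OTHER_OFFSET
    return max(QP_MIN, min(QP_MAX, base_qp + offset))


def qp_map(sensitivity_mask):
    """Map a 2-D boolean mask (True = visually sensitive) to per-macroblock QPs."""
    return [[macroblock_qp(flag) for flag in row] for row in sensitivity_mask]


# A toy frame of 3 x 4 macroblocks with a "face" in the middle.
mask = [
    [False, False, False, False],
    [False, True, True, False],
    [False, False, False, False],
]

for row in qp_map(mask):
    print(row)
```

The sensitive macroblocks get a lower QP than their neighbors, so the bit budget shifts toward the regions where viewers are most likely to notice artifacts.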
“The advantage of our chip lies in our compression algorithm,”
says Davy Yoshida, director of Business Development
of Fujitsu Microelectronics America. “Comparing the compression
of MPEG-2 and H.264 is 2.5 times
the compression. So a 25-meg image will be 10
megs, at equal quality. But our chip can compress,
with very little depreciation, to a smaller
size, like 25 megs to 5 megs, and still show a
very good quality picture.”
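The arithmetic behind Yoshida's figures is easy to verify. The short sketch below uses only the numbers in the quote and assumes "megs" means megabytes; the function name is illustrative.

```python
def compressed_size(original_size, ratio):
    """Size after compressing by the given ratio (same units as the input)."""
    return original_size / ratio


MPEG2_SIZE_MB = 25.0  # the 25-meg MPEG-2 image from the quote

# Typical H.264 gain quoted in the article: 2.5 times the compression.
typical_h264_mb = compressed_size(MPEG2_SIZE_MB, 2.5)

# Fujitsu's claimed result: 25 megs down to 5 megs.
claimed_ratio = MPEG2_SIZE_MB / 5.0

print(f"Typical H.264 size: {typical_h264_mb:.1f} MB")
print(f"Claimed ratio vs. MPEG-2: {claimed_ratio:.1f}x")
```

In other words, the claimed 25-to-5 reduction corresponds to a 5:1 ratio relative to MPEG-2, twice the 2.5:1 figure quoted for H.264 in general.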
The chip also contains two blocks of
256-Mbit fast-cycle random access memory
(FCRAM) embedded on-chip. The chip measures only 15 mm squared and consumes just
750 mW. The MB86H51 comes in a 650-pin
FBGA package and began mass production
in July of last year, priced at $295 in sample
quantities. Fujitsu plans to develop a much
more cost-effective version of this codec, and
it may launch in the latter half of this year.
W&W Communications’ WW10K
H.264 HD codec chip set consists of the
WW10000BA single-chip encoder and the
WW10001BA single-chip decoder (Fig. 2).
The low encode-decode tandem delay as well as the ability to
encode and decode 1080p and 720p video at low bit rates suit the
WW10K chip set for high-definition video-conferencing and
telepresence applications.
The WW10K runs at 110 MHz in single-chip implementations
of the encoder and decoder. The WW10000BA encoder compresses
1080p or 720p HD video at half the bit rate of MPEG-2 HD
encoders, with 15% better peak signal-to-noise
ratio (PSNR). The WW10001BA decompresses the encoder’s bit
stream into quality 1080i/p or 720p HD video.
The chip set has an encode-decode tandem delay of less than 35
ms, or about one frame at 30 frames/s, delivering performance very
close to the H.264 Joint Model. It can handle up to four video
inputs simultaneously at different bit rates and resolutions, up to
1920 by 1088. This makes it possible to design systems that dedicate
one camera per participant or group of participants and one
display per participant or group of participants, delivering more
immersive and lifelike video communications experiences.
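To see why a 35-ms tandem delay amounts to about one frame, convert the frame rate to a frame period. This is a small sketch with illustrative function names, using only the figures quoted above.

```python
def frame_period_ms(frames_per_sec):
    """Duration of one frame in milliseconds."""
    return 1000.0 / frames_per_sec


def delay_in_frames(delay_ms, frames_per_sec):
    """Express a delay as a number of frame periods."""
    return delay_ms / frame_period_ms(frames_per_sec)


period = frame_period_ms(30)            # one frame at 30 frames/s
tandem_frames = delay_in_frames(35, 30)  # the quoted 35-ms upper bound

print(f"Frame period at 30 frames/s: {period:.1f} ms")
print(f"35 ms is {tandem_frames:.2f} frame periods")
```

At 30 frames/s, one frame lasts roughly 33.3 ms, so the 35-ms bound is only slightly more than a single frame period.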
The WW10000BA encoder also integrates an advanced context-
adaptive noise-reduction filter. This not only cuts noise in
the source video, but also improves the encoder’s compression
efficiency significantly, depending on the video content.
The WW10K H.264 HD codec chip set has been in mass
production since April 2007. A development kit is available with
HDMI, component video, Y/C and composite video inputs and
outputs, and PCI and 10/100BaseT Ethernet interfaces.
Mobilygen’s H.264 HD codec system-on-a-chip (SoC), the
MG3500, is a member of the company’s en-ViE platform (Fig. 3).
It can encode HD content, including 720p60, 1080p24, 1080p30,
or 1080i60 material. It can also be used to encode two
720p30 sources or encode and decode 720p30
content simultaneously.
The MG3500 supports H.264’s
Baseline, Main, and High Profiles up
to Level 4.1. Macro-Block Adaptive
Field/Frame (MBAFF) encoding in
the Main and High Profiles allows
the highest quality per bit of interlaced
material. It also supports IDE and CompactFlash
and extends the Ethernet MAC
capability to support Gigabit Ethernet.
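As a rough cross-check, the MG3500's listed encode modes can be compared with the Level 4.1 limits. The sketch below assumes the limits given in the H.264 standard's level table (245,760 macroblocks/s and 8,192 macroblocks per frame); those constants come from the standard, not from this article, and the function names are illustrative.

```python
import math

MB_SIZE = 16                        # H.264 macroblocks are 16 x 16 pixels
LEVEL_4_1_MAX_MB_PER_SEC = 245_760  # assumed from the H.264 level table
LEVEL_4_1_MAX_MB_PER_FRAME = 8_192  # assumed from the H.264 level table


def macroblocks_per_frame(width, height):
    """Macroblocks per frame; partial blocks are padded to a full block."""
    return math.ceil(width / MB_SIZE) * math.ceil(height / MB_SIZE)


def macroblock_rate(width, height, frames_per_sec, streams=1):
    """Total macroblocks per second for one or more identical streams."""
    return macroblocks_per_frame(width, height) * frames_per_sec * streams


# (width, height, frames/s, number of streams)
MODES = {
    "720p60": (1280, 720, 60, 1),
    "1080p24": (1920, 1080, 24, 1),
    "1080p30": (1920, 1080, 30, 1),
    "1080i60": (1920, 1080, 30, 1),  # 60 fields/s equals 30 frames/s
    "2 x 720p30": (1280, 720, 30, 2),
}

for name, (w, h, fps, n) in MODES.items():
    rate = macroblock_rate(w, h, fps, n)
    ok = rate <= LEVEL_4_1_MAX_MB_PER_SEC
    print(f"{name:>10}: {rate:>7} MB/s  within Level 4.1: {ok}")
```

Because 1080 is not a multiple of 16, a 1080-line frame is padded to 68 macroblock rows (1088 lines), giving 8,160 macroblocks per frame. Under these assumptions, every listed mode stays at or below the Level 4.1 macroblock rate.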
Last June, Mobilygen announced its en-ViE platform
of codec SoCs for the creation,
playback, and distribution of HD H.264
video. The en-ViE platform offers High Profile 1080i H.264 encoding at full 1920-by-1080 resolution with
both Context-Adaptive Binary Arithmetic Coding (CABAC) and
MBAFF to provide the highest-quality video at any given bit rate.
The MG3500’s ability to perform full-duplex 720p30 encoding
and decoding makes it possible to implement single-chip
video-conferencing systems. But two chips are needed for 1080i60
resolution—one for encode and one for decode. The real-time
transcoding of legacy HD MPEG-2 streams into H.264 helps
minimize video storage requirements and enables reliable video
streaming over wireless networks.
The SoCs integrate an ARM9 CPU dedicated to user applications.
A programmable multimedia engine supports all leading
audio formats, including AAC, MP3, G.7xx, and Dolby Digital.
Network connectivity is provided via integrated Gigabit Ethernet
and high-speed USB 2.0 OTG.
AES encryption and digital signature hardware provide secure
networking and storage. Most popular video storage devices are
supported, including USB, SD, MMC, CompactFlash, CE-ATA,
and IDE. In addition, the en-ViE SoCs support digital image
stabilization for use in IP cameras. The MG3500 HD codec costs
$30 in high volume.
Since latency can be an issue in telepresence systems, Mobilygen
has come up with a unique way of dealing with it.
“Typically, the requests we get are sub-100-ms
response,” says Brian Johnson, vice president of marketing
at Mobilygen.
“And we have the ability to encode at a slice
level, which is a fraction of a frame. By making
a slice smaller and smaller, you can minimize
latency. You don’t have to wait until you get a
whole frame before you start encoding,” says
Johnson. “There are a number of things like that
that you can do to meet the latency requirements. We
can get encode, decode, and buffering in between, in
under 50 ms.”
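Johnson's point about slices can be illustrated with a simplified model. The sketch assumes the encoder can start as soon as one slice's worth of video has arrived and that slices divide the frame evenly. It ignores encode time, buffering, and network delay, and it doesn't represent Mobilygen's actual pipeline. The function name is hypothetical.

```python
def capture_wait_ms(frames_per_sec, slices_per_frame):
    """Time spent waiting for one slice of video to arrive before encoding.

    Simplified model: the frame arrives at a constant rate and is split
    into equal slices, so one slice takes 1/slices_per_frame of a frame period.
    """
    frame_period = 1000.0 / frames_per_sec
    return frame_period / slices_per_frame


for slices in (1, 2, 4, 8):
    wait = capture_wait_ms(30, slices)
    print(f"{slices} slice(s) per frame: wait {wait:.1f} ms before encoding starts")
```

With a single slice per frame, the encoder waits a full 33.3 ms at 30 frames/s before it can begin. Splitting the frame into eight slices cuts that wait to about 4.2 ms in this model, which leaves more of the latency budget for decoding and buffering.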
The en-ViE platform features a complete
Linux development environment, substantially expanding upon Mobilygen’s existing
H.264 software. This includes production-ready
application programming interfaces
(APIs), drivers, optimized codec firmware,
example applications, and complete reference
designs to accelerate time-to-market.
DOING H.264 WITH DSPS
One of the premier telepresence systems
employs DSPs from Analog Devices to
perform the necessary encoding and
decoding of high-definition video, among
other things. Design engineers from Cisco
Systems used ADI’s Blackfin DSP as a
main ingredient of their TelePresence 1000
and 3000 systems.
Cisco’s codec is based on the H.264
video codec standard, and it delivers
best-in-class latency at resolutions up to
1080p30 in real time.
By minimizing the HD video encode/
decode latency, Cisco’s solution gives more
“latency budget” back to the network. This
enables swift deployment, with a minimum
of hardware and software upgrades for the
network to handle the new application.
With dual symmetric, 600-MHz, high-performance
Blackfin cores, Blackfin
ADSP-BF561 processors were the ideal
choice for Cisco’s exceedingly challenging
and complex video application. Cisco’s
video-codec functionality is distributed
across a multiprocessor farm of Blackfin
ADSP-BF561 processors, delivering more
than 0.5 tera-instructions/s of processing
muscle. Thus, the video subsystem performs
at truly best-in-class levels, enabling
practical solution deployment.
On another DSP front, Spirit DSP
recently announced the HD-enabled version
of its flagship product, the TeamSpirit
Voice&Video Engine. Available for both
PC and mobile platforms, the engine now
has new features to empower high-definition
video-conferencing terminals and
mobile devices with high-quality wideband
voice and seamless video.
The upgraded version of the engine features
wideband codecs (like GSM AMR-WB and
Spirit’s proprietary IP-MR) and Ultra-
Wideband codecs (like AAC LD audio) as
the most advanced audio technology for
high-end video telephony. In addition to
its video engine and software video codecs,
the TeamSpirit Engine easily integrates
with external hardware-accelerated video
codecs. It also increases the video quality
and robustness of the entire solution.
“Today, HD is the fast-moving trend,
leaving grainy and blurry video conferencing
behind. It allows users to see everyone in
the room, and the resolution is high enough
to detect even an eye roll. Full-duplex audio
allows people to talk at the same time without
muddying the sound,” says Slava Borilin,
vice president of Products & Marketing
at Spirit.
HIGH-DEF VIDEO SENSOR
OmniVision Technologies calls its OV9710
CameraChip the first true HD video sensor
for the mobile handset and notebook
PC markets. The OV9710 is a 1-Mpixel
CMOS sensor built with OmniVision’s
proprietary OmniPixel3 architecture,
which uses a 3- by 3-µm pixel, for optimal
low-light sensitivity and high-quality HD
video performance at 30 frames/s. It meets
all camera phone (1280 by 720) and PC
multimedia (1280 by 800) market requirements
in terms of performance, quality,
reliability, and power consumption.
This is the kind of sensor that can significantly
raise the bar for PC-type video-conferencing
applications. “With sensitivity ratings
of over 2300 mV per lux-second, we are
providing exceptional image quality at price
points that are attractive to high-volume
markets,” says Bruce Weyer, OmniVision’s
vice president of marketing.
Designed for use as a 0.25-in. HD video
camera, the OV9710 provides full-frame,
sub-sampled, or windowed 8/10-bit images
in raw RGB format via the digital video
port. The sensor delivers full-frame HD
video at 30 frames/s in WXGA (1280
by 800) or 60 frames/s in sub-sampled
WXGA (640 by 400) with complete user
control over image quality, formatting, and
output data transfer.
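The two output modes imply different raw data rates on the digital video port. The sketch below estimates them from the resolutions and frame rates quoted above, assuming 10-bit raw samples and ignoring blanking intervals and any interface overhead. The function names are illustrative.

```python
def pixel_rate(width, height, frames_per_sec):
    """Active pixels per second (blanking intervals ignored)."""
    return width * height * frames_per_sec


def raw_bit_rate_mbps(width, height, frames_per_sec, bits_per_pixel):
    """Raw data rate in Mbits/s for the active image area."""
    return pixel_rate(width, height, frames_per_sec) * bits_per_pixel / 1e6


full_wxga = raw_bit_rate_mbps(1280, 800, 30, 10)  # full-frame WXGA at 30 frames/s
sub_wxga = raw_bit_rate_mbps(640, 400, 60, 10)    # sub-sampled WXGA at 60 frames/s

print(f"Full-frame WXGA, 30 frames/s:  {full_wxga:.1f} Mbits/s")
print(f"Sub-sampled WXGA, 60 frames/s: {sub_wxga:.1f} Mbits/s")
```

Under these assumptions, the sub-sampled mode carries half the raw data of the full-frame mode even though it runs at twice the frame rate, because each frame holds one quarter as many pixels.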
The OV9710 incorporates image-processing
functions, including exposure control,
gain control, white balance, lens correction,
and defect pixel canceling. These
functions are also programmable through
the Serial Camera Control Bus interface.
Available in a variety of lead-free packaging
options, the OV9710 operates from
–30°C to 70°C. It’s currently available in
sample quantities. Volume shipping is
expected to begin in the second quarter of
this year.