Imagine that you’re walking into a darkened conference
room. You switch on the lights and make
a few phone calls. All of a sudden, three of your
colleagues from across the globe appear at the
conference room table as if they were sitting there
in the dark all along. This represents the essence
of telepresence—an ultra-high-end video-conferencing
system.
These systems employ high-definition video on 50-in. or
larger flat-panel displays with audio designed to make all of the
participants’ voices seem like they’re coming straight from their
lips. And that’s not all. Typically, factors such as lighting and even
furniture are taken into account, with possibly half a conference
table in one room and the other half in the remote room.
A telepresence system like this could cost several hundred
thousand dollars, as is the case with the TelePresence 3000
from Cisco Systems. But viable alternatives exist at a variety of
price points from companies such as Hewlett-Packard, LifeSize
Communications, Polycom, Sony, Telanetix, and Vidyo.
Design engineers wanting to build telepresence and high-definition
video-conferencing systems, from high-end setups
down to those that might run on PCs and video phones, should
begin by surveying the hardware needed to implement these
systems. The latest H.264 codecs are a good starting point.
H.264 CODECS
The driving technology behind telepresence and high-definition
video conferencing is the H.264 video standard, which
provides over twice the compression ratio of MPEG-2. Several
companies make H.264 codecs, including Fujitsu Microelectronics
America, W&W Communications, and Mobilygen.
Fujitsu’s MB86H51 compresses and decompresses full high-definition
video (1920 dots by 1080 lines) in real time using the
H.264 format (Fig. 1). This is a single-chip implementation
for full HD H.264 High Profile Level 4.0 video processing
that incorporates embedded memory. It also compresses and
decompresses audio in real time by utilizing formats such as
the MPEG-1 Audio Layer.
The MB86H51 uses a proprietary algorithm that automatically
applies less compression to areas in the image where
compression artifacts are most noticeable to human vision, such
as human faces or slow-moving objects, and increased compression
to other areas. The effect is to maximize image quality for
those critical zones. This feature also makes it possible to reduce
image size to between one-half and one-third the size of the
MPEG-2 format with an equivalent level of image quality.
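Fujitsu's algorithm is proprietary and its details are not given here, but the general idea of spending more bits on visually sensitive regions can be illustrated with a generic per-macroblock quantization map. The sketch below is only an assumption-laden illustration: the function name `roi_qp_map`, the importance scores, and the `base_qp` and `max_offset` defaults are all hypothetical. The one fixed fact it relies on is that H.264 quantization parameters (QP) run from 0 to 51 for 8-bit video, with lower values meaning lighter compression.

```python
# Generic region-of-interest quantization sketch.
# NOT Fujitsu's algorithm: names and defaults here are hypothetical.

QP_MIN, QP_MAX = 0, 51  # H.264 quantization parameter range for 8-bit video


def roi_qp_map(importance, base_qp=30, max_offset=6):
    """Turn per-macroblock importance scores (0.0 to 1.0) into QP values.

    Importance 1.0 (e.g. a face) lowers QP by max_offset (less compression),
    importance 0.0 raises it by max_offset (more compression),
    and importance 0.5 leaves base_qp unchanged.
    """
    qp_rows = []
    for row in importance:
        qp_row = []
        for weight in row:
            weight = min(max(weight, 0.0), 1.0)  # clamp bad inputs
            offset = round((0.5 - weight) * 2 * max_offset)
            qp = min(max(base_qp + offset, QP_MIN), QP_MAX)
            qp_row.append(qp)
        qp_rows.append(qp_row)
    return qp_rows


if __name__ == "__main__":
    # One row of three macroblocks: face, neutral, background.
    print(roi_qp_map([[1.0, 0.5, 0.0]]))
```

In a real encoder the importance scores would come from some form of face or motion analysis, which this sketch leaves out entirely.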
“The advantage of our chip lies in our compression algorithm,”
says Davy Yoshida, director of business development
at Fujitsu Microelectronics America. “Comparing the compression
of MPEG-2 and H.264 is 2.5 times
the compression. So a 25-meg image will be 10
megs, at equal quality. But our chip can compress,
with very little depreciation, to a smaller
size, like 25 megs to 5 megs, and still show a
very good quality picture.”
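The figures in the quote can be checked with simple arithmetic. The short sketch below assumes that "megs" means megabytes and that the 25-meg starting point is the MPEG-2 size; the helper names are hypothetical and exist only for illustration.

```python
def compressed_size(original_mb, ratio):
    """Size after applying an additional compression ratio."""
    return original_mb / ratio


def compression_ratio(original_mb, compressed_mb):
    """How many times smaller the compressed result is."""
    return original_mb / compressed_mb


if __name__ == "__main__":
    mpeg2_mb = 25  # assumed MPEG-2 size from the quote

    # Standard H.264 at 2.5 times the compression of MPEG-2.
    print(compressed_size(mpeg2_mb, 2.5))   # 10.0

    # The 25-to-5 figure quoted for the MB86H51.
    print(compression_ratio(mpeg2_mb, 5))   # 5.0
```

By this reading, the 10-meg result is 40% of the MPEG-2 size, which falls within the one-third to one-half range cited earlier, while the 5-meg result corresponds to five times the compression.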
The chip also contains two blocks of
256-Mbit fast-cycle random access memory
(FCRAM) embedded on-chip. The chip measures
only 15 mm square and consumes just
750 mW. The MB86H51 comes in a 650-pin
FBGA package and began mass production
in July of last year, priced at $295 in sample
quantities. Fujitsu plans to develop a much
more cost-effective version of this codec, and
it may launch in the latter half of this year.
W&W Communications’ WW10K
H.264 HD codec chip set consists of the
WW10000BA single-chip encoder and the
WW10001BA single-chip decoder (Fig. 2).
The low encode-decode tandem delay as well as the ability to
encode and decode 1080p and 720p video at low bit rates suit the
WW10K chip set for high-definition video-conferencing and
telepresence applications.
The WW10K runs at 110 MHz in single-chip implementations
of the encoder and decoder. The WW10000BA encoder compresses
1080p or 720p HD video at half the bit rate of MPEG-2 HD
encoders, with 15% better peak signal-to-noise
ratio (PSNR). The WW10001BA decompresses the encoder’s bit
stream into quality 1080i/p or 720p HD video.
The chip set has an encode-decode tandem delay of less than 35
ms, or about one frame at 30 frames/s, delivering performance very
close to the H.264 Joint Model. It can handle up to four video
inputs simultaneously at different bit rates and resolutions, up to
1920 by 1088. This makes it possible to design systems that dedicate
one camera per participant or group of participants and one
display per participant or group of participants, delivering more
immersive and lifelike video communications experiences.
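Two of the numbers above can be worked through directly. The sketch below, with hypothetical helper names, converts the 35-ms tandem delay into frame periods at 30 frames/s and shows one likely reason the maximum resolution is quoted as 1920 by 1088 rather than 1080. H.264 codes pictures in 16-by-16 macroblocks, so a 1080-line picture is padded up to the next multiple of 16. The article itself does not give this explanation, so treat it as an assumption about the WW10K.

```python
import math


def frame_period_ms(fps):
    """Duration of one frame in milliseconds."""
    return 1000.0 / fps


def delay_in_frames(delay_ms, fps):
    """Express a delay as a number of frame periods."""
    return delay_ms / frame_period_ms(fps)


def coded_height(display_lines, macroblock=16):
    """Round a line count up to a whole number of macroblock rows."""
    return math.ceil(display_lines / macroblock) * macroblock


if __name__ == "__main__":
    print(round(frame_period_ms(30), 1))      # one frame period at 30 frames/s
    print(round(delay_in_frames(35, 30), 2))  # 35 ms expressed in frames
    print(coded_height(1080))                 # padded height for 1080 lines
    print(coded_height(720))                  # 720 is already a multiple of 16
```

A frame period at 30 frames/s is about 33.3 ms, so a delay of just under 35 ms is only slightly more than one frame.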