Beyond native support for 8-bit data,
which is the word size common to many
pixel-processing algorithms, the Blackfin
architecture includes instructions
specifically defined to enhance performance
in video-processing applications.
For instance, the “SUM ABSOLUTE
DIFFERENCE” instruction supports the
motion-estimation algorithms used in
video-compression standards such as
MPEG-2 and MPEG-4.
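The computation such an instruction accelerates can be shown as a portable C reference model. This is a minimal sketch, not Blackfin assembly or a compiler intrinsic. The function names, the exhaustive full-search strategy, and the block size are illustrative assumptions. The hardware instruction processes several 8-bit pixel differences per cycle, whereas this loop handles one at a time.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences (SAD) between an n x n block of the current
 * frame and a candidate block of the reference frame. Both pointers address
 * the block's top-left pixel; 'stride' is the row pitch in bytes. */
static unsigned sad_block(const uint8_t *cur, const uint8_t *ref,
                          int stride, int n)
{
    unsigned sum = 0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            sum += (unsigned)abs(cur[y * stride + x] - ref[y * stride + x]);
    return sum;
}

/* Hypothetical full-search motion estimation: try every displacement
 * (dx, dy) in [-range, range] that keeps the candidate inside the reference
 * frame, and return the smallest SAD along with its displacement. */
static unsigned full_search(const uint8_t *cur, const uint8_t *ref,
                            int width, int height, int bx, int by, int n,
                            int range, int *best_dx, int *best_dy)
{
    unsigned best = UINT_MAX;
    assert(bx >= 0 && by >= 0 && bx + n <= width && by + n <= height);
    *best_dx = 0;
    *best_dy = 0;
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + n > width || ry + n > height)
                continue; /* candidate would fall outside the frame */
            unsigned s = sad_block(cur + by * width + bx,
                                   ref + ry * width + rx, width, n);
            if (s < best) {
                best = s;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
    return best;
}
```

The SAD inner loop runs once for every candidate displacement, so even a small search window multiplies the per-block cost many times. For that reason, a dedicated instruction pays off in motion estimation.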
The architecture uses variable-length
instruction encoding. Frequently used
control-type instructions are encoded as
compact 16-bit words, while more
mathematically intensive signal-processing
instructions are encoded as 32-bit values.
The processor intermixes 16-bit control
instructions with 32-bit signal-processing
instructions in 64-bit groups to
maximize memory packing. Because
instructions have no alignment constraints,
the core fully packs the width of the bus
when caching and fetching instructions.
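The code-density benefit can be estimated with simple arithmetic. The hypothetical model below counts how many 64-bit fetch words a mixed stream of 16- and 32-bit instructions occupies when the instructions are laid end to end. It then compares that figure with a fixed 32-bit encoding. The model ignores the actual rules by which the hardware forms parallel-issue groups, so treat it as an illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: len_bits[i] is the length of instruction i (16 or 32).
 * With no alignment constraints, instructions are packed back to back and
 * fetched in whole 64-bit words, so the count is the total length rounded up. */
static size_t fetch_words_packed(const int *len_bits, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(len_bits[i] == 16 || len_bits[i] == 32);
        total += (size_t)len_bits[i];
    }
    return (total + 63) / 64;
}

/* Comparison case: a fixed-width encoding that spends 32 bits on every
 * instruction, even on a short control instruction. */
static size_t fetch_words_fixed32(size_t n)
{
    return (n * 32 + 63) / 64;
}
```

For example, a run of five 16-bit and two 32-bit instructions occupies 144 bits. That fits in three 64-bit words, whereas a fixed 32-bit encoding of the same seven instructions would need four.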
All Blackfin processors, such as the
ADSP-BF523, contain independent DMA
controllers that support automated data
transfers with minimal overhead from the
processor core (Fig. 3). DMA transfers can
occur between the internal memories and
any of the many DMA-capable peripherals.
Transfers can also occur between the
peripherals and external devices connected
to the external memory interfaces, including
the SDRAM controller and the asynchronous
memory controller.
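Descriptor-driven DMA can be modeled in software to show how the core offloads transfers. In this minimal sketch, the descriptor layout and function name are hypothetical and do not reflect the ADSP-BF523's actual DMA register or descriptor format. The core builds a chain of descriptors once, and the controller then walks the chain without further core involvement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor layout, for illustration only. */
struct dma_desc {
    const void *src;             /* source address */
    void *dst;                   /* destination address */
    size_t len;                  /* number of bytes to move */
    const struct dma_desc *next; /* next descriptor; NULL ends the chain */
};

/* Software model of a DMA engine walking a descriptor chain. In hardware,
 * this loop runs inside the controller while the core keeps executing;
 * here it copies each block and returns the total byte count. */
static size_t dma_run_chain(const struct dma_desc *d)
{
    size_t moved = 0;
    for (; d != NULL; d = d->next) {
        assert(d->src != NULL && d->dst != NULL);
        memcpy(d->dst, d->src, d->len);
        moved += d->len;
    }
    return moved;
}
```

A peripheral-to-memory transfer split across two buffers, for instance, becomes two linked descriptors. The core only needs to hand the head of the chain to the controller.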
The memory architecture includes both L1
and L2 memory blocks. L1 memory is connected
directly to the processor core, runs
at the full core clock speed, and offers maximum
system performance for time-critical
algorithm segments. L1 memory can also
be configured as SRAM, cache, or a combination
of both.
By supporting both SRAM and cache
programming models, the architecture lets
system designers place critical real-time
signal-processing data sets that require
high bandwidth and low latency in SRAM,
while keeping “soft” real-time control and
operating-system (OS) tasks in cache memory.
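The partitioning policy described above can be sketched as a simple placement pass. In this sketch, the data-set descriptions, the SRAM budget, and the first-fit order are all hypothetical. In a real Blackfin project, placement is expressed through the toolchain's linker and section mechanisms rather than with a runtime function like this one.

```c
#include <assert.h>
#include <stddef.h>

enum placement { PLACE_L1_SRAM, PLACE_CACHED };

/* Hypothetical description of one data set. */
struct data_set {
    const char *name;
    size_t bytes;
    int hard_real_time; /* 1 = needs deterministic, low-latency access */
};

/* Assign hard-real-time data sets to L1 SRAM in the order given until the
 * assumed SRAM budget runs out. Everything else, including a hard-real-time
 * set that does not fit, goes to cacheable memory. Returns the SRAM bytes used. */
static size_t place_data(const struct data_set *sets, size_t n,
                         size_t sram_bytes, enum placement *out)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        if (sets[i].hard_real_time && used + sets[i].bytes <= sram_bytes) {
            out[i] = PLACE_L1_SRAM;
            used += sets[i].bytes;
        } else {
            out[i] = PLACE_CACHED;
        }
    }
    assert(used <= sram_bytes);
    return used;
}
```

In practice, a hard-real-time data set that falls back to cache should be reported as a design problem rather than accepted silently. The sketch skips that check for brevity.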
L2 memory is a larger bulk-storage block
that offers slightly reduced performance
but is still faster than off-chip
memory.
Every Blackfin processor employs
multiple power-saving techniques based
on a gated-clock core design that selectively
powers down functional units on
an instruction-by-instruction basis. These
processors also support multiple power-down
modes for periods when little or no
CPU activity is required.
In this self-contained dynamic power-management
scheme, the operating frequency
and voltage can be adjusted independently
to meet the performance requirements of
the algorithm currently being executed.
Most Blackfin processors offer on-chip
core voltage-regulation circuitry and
support operation down to 0.8 V,
which makes them particularly well suited
for portable applications that require
extended battery life.
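Scaling voltage along with frequency saves power because the dynamic (switching) power of CMOS logic scales roughly as P = C·V²·f. The sketch below computes the relative dynamic power of two operating points under the assumption that the switched capacitance C is unchanged. The operating points in the usage note are hypothetical, not data-sheet values, and leakage power is ignored.

```c
#include <assert.h>

/* Relative dynamic power between two operating points, using P = C * V^2 * f
 * with the switched capacitance C assumed identical in both cases. */
static double dynamic_power_ratio(double v_new, double f_new,
                                  double v_ref, double f_ref)
{
    assert(v_ref > 0.0 && f_ref > 0.0);
    return (v_new * v_new * f_new) / (v_ref * v_ref * f_ref);
}
```

Consider dropping from a hypothetical 1.2 V at 600 MHz to 0.8 V at 250 MHz. The ratio is (0.8/1.2)² × (250/600) ≈ 0.185, so dynamic power falls by more than 80% while frequency falls by less than 60%. The quadratic voltage term is why the scheme adjusts voltage and not just clock speed.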
Blackfin processors come with a variety
of microcontroller-style peripherals,
including a 10/100 Ethernet MAC, UARTs,
SPI, a CAN controller, timers with
pulse-width-modulation (PWM) support,
watchdog timers, a real-time clock, and a
glueless synchronous and asynchronous
memory controller.
MULTICORE DSPs
A good example of a multicore DSP is Freescale’s
MSC8144, which is based
on the company’s StarCore technology,
specifically the third-generation SC3400
DSP core.
The chip incorporates four DSP subsystems.
Within each subsystem is an SC3400
DSP core, 16-kbyte L1 instruction cache,
32-kbyte L1 data cache, memory management
unit (MMU), extended programmable
interrupt controller (EPIC), and
two general-purpose 32-bit timers. Each
subsystem also provides debug and profiling
support and low-power Wait and Stop
modes. Each DSP core runs at up to
1 GHz, so the four cores together offer
processing capacity roughly equivalent to
that of a 4-GHz single-core DSP.
The MSC8144 also contains the company’s
QUICC Engine technology subsystem,
which includes dual RISC processors,
48-kbyte multi-master RAM, and
48-kbyte instruction RAM. This subsystem
supports three communication controllers:
one asynchronous transfer mode (ATM)
interface and two Gigabit Ethernet
interfaces. It can also offload scheduling
tasks from the DSP cores.
The ATM controller supports UTOPIA
level II 8/16 bits at 25/50 MHz in
UTOPIA/POS mode with adaptation layer
support for AAL0, AAL2, and AAL5.
The two Ethernet controllers support
10/100/1000-Mbit/s operation via MII,
RMII, SMII, RGMII, or SGMII. The SGMII
protocol uses a four-pin serializer/deserializer
(SERDES) interface and runs at a
1000-Mbit/s data rate only.
Like the DSP chips mentioned earlier,
this one surrounds the DSP and QUICC
subsystems with memory, interfaces, and
I/Os. As for memory, the chip contains
a 128-kbyte shared L2 instruction cache,
512 kbytes of M2 memory for critical data
and temporary data buffering, a 96-kbyte
boot ROM, and a whopping 10 Mbytes of
128-bit-wide M3 memory.
DDR and DMA controllers also reside
on the chip. The DDR controller runs at
up to 200 MHz (a 400-MHz data rate) with
a 16- or 32-bit data bus. It supports up
to 1 Gbyte of DDR1 and DDR2 memory in one
or two banks. The DMA controller has 16
bidirectional channels with up to 1024 buffer
descriptors and programmable priority,
buffer, and multiplexing configuration.
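The DDR figures translate into a theoretical peak bandwidth. Multiply the clock frequency by two transfers per clock and by the bus width in bytes. The helper below is a minimal sketch of that arithmetic, and its name is illustrative. Sustained throughput in a real system is lower because of refresh, page misses, and bus turnaround.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical peak DDR bandwidth in bytes per second:
 * clock frequency x 2 transfers per clock x bus width in bytes. */
static uint64_t ddr_peak_bytes_per_s(uint64_t clock_hz, unsigned bus_bits)
{
    assert(bus_bits % 8 == 0);
    return clock_hz * 2u * (bus_bits / 8u);
}
```

At the MSC8144's maximum 200-MHz DDR clock, this yields 1.6 Gbytes/s with the 32-bit bus and 0.8 Gbytes/s with the 16-bit bus.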
A chip-level arbitration and switching
system (CLASS) provides non-blocking,
full-fabric arbitration between the
processing elements (and other initiators)
and targets such as the M2 memory, the DDR
SDRAM controller, and the device configuration
control and status registers.
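A non-blocking fabric can be illustrated with one arbiter per target. Requests aimed at different targets are then granted in the same cycle, and only initiators contending for the same target have to wait. The source does not state the CLASS arbitration policy, so round-robin is an assumption here. The initiator and target counts are also hypothetical.

```c
#include <assert.h>

#define N_INIT 4 /* hypothetical number of initiators (e.g., DSP cores) */
#define N_TGT  3 /* hypothetical number of targets (e.g., M2, DDR, registers) */

/* One cycle of per-target round-robin arbitration (assumed policy).
 * req[i]  : target index initiator i requests this cycle, or -1 for none.
 * last[t] : most recent winner at target t (updated on each grant).
 * grant[i]: set to 1 if initiator i is served this cycle. */
static void arbitrate(const int req[N_INIT], int last[N_TGT], int grant[N_INIT])
{
    for (int i = 0; i < N_INIT; i++) {
        assert(req[i] >= -1 && req[i] < N_TGT);
        grant[i] = 0;
    }
    for (int t = 0; t < N_TGT; t++) {
        /* Search starting just after the previous winner for fairness. */
        for (int k = 1; k <= N_INIT; k++) {
            int i = (last[t] + k) % N_INIT;
            if (req[i] == t) {
                grant[i] = 1;
                last[t] = i;
                break;
            }
        }
    }
}
```

Suppose two cores request the same memory while two others request different targets. Three grants are issued in that cycle, and the losing core wins the next round.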
The MSC8144 supports next-generation
and legacy interfaces, such as dual
Gigabit Ethernet, Serial RapidIO interconnect,
UTOPIA, PCI, and time-division
multiplexing (TDM).
The Serial RapidIO 1x/4x endpoint
complies with Revision 1.2 of the RapidIO
Trade Association specification. It supports
read, write, messages, doorbells, and maintenance
accesses in inbound mode and
messages and doorbells in outbound mode.
The PCI interface complies with PCI
specification revision 2.2 at 33 or 66 MHz
with access to all PCI address spaces.
Up to eight independent on-chip TDM
modules offer features such as programmable
word size (2, 4, 8, or 16 bits),
hardware-based A-law/µ-law conversion,
an aggregate data rate of up to 128 Mbit/s
for all channels, a glueless interface to
E1 or T1 framers, and the ability to
interface with H-MVIP/H.110 devices, TSI,
and codecs such as AC’97.
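The hardware µ-law conversion mentioned above performs G.711 companding. A software equivalent using the widely used reference formulation is sketched below, with a bias of 0x84 and a clip level of 32635. The function names are illustrative, and the sketch does not model how the MSC8144's TDM hardware implements the conversion.

```c
#include <assert.h>
#include <stdint.h>

#define ULAW_BIAS 0x84  /* bias added before segment search (G.711 reference) */
#define ULAW_CLIP 32635 /* largest magnitude that avoids overflow after biasing */

/* Encode a 16-bit linear PCM sample as an 8-bit G.711 mu-law code. */
static uint8_t linear_to_ulaw(int16_t sample)
{
    int sign = (sample < 0) ? 0x80 : 0x00;
    int s = sample;
    if (s < 0)
        s = -s; /* work with the magnitude */
    if (s > ULAW_CLIP)
        s = ULAW_CLIP;
    s += ULAW_BIAS;

    /* Exponent (segment) = position of the highest set bit, from 7 down to 0. */
    int exponent = 7;
    for (int mask = 0x4000; (s & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;
    assert(exponent >= 0 && exponent <= 7);

    int mantissa = (s >> (exponent + 3)) & 0x0F;
    return (uint8_t)~(sign | (exponent << 4) | mantissa); /* G.711 inverts all bits */
}

/* Decode an 8-bit G.711 mu-law code back to 16-bit linear PCM. */
static int16_t ulaw_to_linear(uint8_t code)
{
    code = (uint8_t)~code;
    int sign = code & 0x80;
    int exponent = (code >> 4) & 0x07;
    int mantissa = code & 0x0F;
    int s = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS;
    return (int16_t)(sign ? -s : s);
}
```

The round trip is lossy by design. For example, a sample of 1000 encodes to 0xCE and decodes to 988, because each segment has only 16 quantization steps. This trade-off is why companding fits 14-bit dynamic range into an 8-bit TDM word.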
With its multicore architecture and
next-generation and legacy interfaces, the
MSC8144 DSP is well suited for high-capacity
infrastructure applications.
These include triple-play (voice, video,
and data) services, carrier-class and enterprise
Voice over Internet Protocol (VoIP)
media-gateway equipment, video-conferencing
equipment, and WCDMA and
WiMAX base stations.