Digital signal processors (DSPs) earn their living by doing
certain analog jobs better than analog circuitry. Even where
cost or complexity rules out analog circuits altogether,
DSPs remain a viable choice and often handle
the task with ease.
That’s because DSPs are very good and very fast at arithmetic
operations such as addition and multiplication. Clever
mathematicians and engineers exploit this fact by creating
algorithms to tackle complex signal-processing tasks using
mainly those two mathematical operators.
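
To make the point concrete, here is a minimal sketch of a finite impulse response (FIR) filter, a common signal-processing routine built from nothing but multiplies and adds. The function name and the sample values are illustrative assumptions, not taken from any vendor library.

```python
def fir_filter(samples, coeffs):
    """Filter `samples` with the FIR taps in `coeffs`, one output per input."""
    output = []
    for n in range(len(samples)):
        acc = 0  # running sum, playing the role of a hardware accumulator
        for k, c in enumerate(coeffs):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += c * samples[n - k]  # one multiply-accumulate (MAC)
        output.append(acc)
    return output


# Four equal taps turn the filter into a 4-sample moving sum.
result = fir_filter([1, 2, 3, 4, 5], [1, 1, 1, 1])
print(result)
```

Each output sample costs one MAC per tap, which is why the number of MACs a DSP can complete per cycle matters so much.
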
Today’s DSP chips are much more than just a pretty processing
engine. Also integrated on these chips are memory
subsystems, high-speed interfaces, I/Os, and more. These elements
are included with the idea of increasing overall performance,
lowering power consumption, and targeting particular
processing tasks.
To better understand the various DSP chip options available
and how different parts of the device fit together as a whole, it’s
helpful to examine several representative DSPs on the market
today. We’ll take a look at examples of single-core, single-core
plus microcontroller, and multicore DSP chips.
SINGLE-CORE DSP CHIPS
It’s natural to think that DSP chips have a single DSP core.
Take, for instance, Texas Instruments’ TMS320C6452 (Fig. 1).
A member of the TMS320C64x+ family of high-performance
fixed-point DSPs, the chip targets process-intensive multichannel
telecom infrastructure and medical imaging systems. The
DSP core is just a part of the chip’s
design, though. The rest of the chip
comprises memory, I/Os, and other
functional blocks.
The C6452 DSP integrates on-chip
memory organized as a two-level
memory system. The level 1
(L1) program and data memories are
32 kbytes each. This memory can be
configured as mapped RAM, cache,
or some combination of the two.
When configured as cache, the L1
program (L1P) is a direct-mapped
cache whereas L1 data (L1D) is a
two-way set-associative cache. The
level 2 (L2) memory is shared
between program and data space. L2
memory can also be configured as
mapped RAM, cache, or some combination
of the two. Designers can
use the on-chip memory to add differentiating
features to their projects.
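
The practical difference between a direct-mapped and a two-way set-associative cache shows up in how an address is mapped to a cache location. The sketch below is a simplified model. The 32-byte and 64-byte line sizes are assumptions made for illustration, and the whole 32 kbytes is assumed to be configured as cache.

```python
def cache_set_index(address, cache_bytes, line_bytes, ways):
    """Return the set that `address` maps to in a simple cache model."""
    num_sets = cache_bytes // (line_bytes * ways)
    return (address // line_bytes) % num_sets


L1_BYTES = 32 * 1024  # 32 kbytes, as stated for L1P and L1D

# Direct-mapped (1 way), assumed 32-byte lines: addresses 32 kbytes apart
# land in the same set, and with one way they evict each other.
l1p_a = cache_set_index(0x0000, L1_BYTES, line_bytes=32, ways=1)
l1p_b = cache_set_index(0x8000, L1_BYTES, line_bytes=32, ways=1)

# Two-way set-associative, assumed 64-byte lines: addresses 16 kbytes apart
# share a set, but the set holds two lines, so both can stay resident.
l1d_a = cache_set_index(0x0000, L1_BYTES, line_bytes=64, ways=2)
l1d_b = cache_set_index(0x4000, L1_BYTES, line_bytes=64, ways=2)

print(l1p_a == l1p_b, l1d_a == l1d_b)
```

In this model a direct-mapped cache thrashes when two hot addresses share an index, while the two-way organization tolerates one such collision per set.
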
The C6452 also includes two Serial
Gigabit Media Independent Interface
(SGMII) Ethernet media access
control (MAC) ports and one gigabit
switch. The switch improves the efficiency
of multichip designs by automatically
monitoring the data stream
to ensure that only the appropriate [...]
TI added a decision gate to the switch
that can, for example, be used to distinguish
between voice and data traffic. If the
DSP is dedicated entirely to voice processing,
it can block data traffic from entering,
which makes much more effective use
of its processing bandwidth. In addition,
the device comes with two telecom serial
interface ports (TSIPs), providing a seamless
connection to common telecom serial
data streams.
Other I/Os on the C6452 include a
66-MHz PCI interface or Universal Host
Port Interface (UHPI); a double-data-rate
(DDR2) interface to external memory;
VLYNQ, a proprietary serial communications
interface developed by TI; a 16-bit
external memory interface (EMIFA); a
multichannel general-purpose audio serial
port (McASP); and other familiar interfaces.
Judging from this DSP’s I/Os, there’s
no doubt its home will be in telecom applications.
For other applications, a different
set of I/Os would be in order.
At the heart of the C6452 and several
other DSPs from Texas Instruments lies
the C64x+ mega module, which consists of
several components—the C64x+ processor,
L1 program and data memory controllers,
L2 memory controller, internal DMA
(IDMA), interrupt controller, power-down
controller, and external memory controller
(Fig. 2). The mega module also supports
memory protection for L1P, L1D, and L2
memories. It provides bandwidth management
for resources local to the mega
module as well.
The C64x+ processor on the module is
a very fast DSP that can operate at speeds
up to 1.2 GHz. It employs eight functional
units, two register files, and two data paths.
Two of these eight functional units are
multipliers or M units. Each M unit performs
four 16- by 16-bit multiply-accumulates
(MACs) every clock cycle.
Thus, eight 16- by 16-bit MACs can be
executed every cycle on the C64x+ core.
At a 1.2-GHz clock rate, that works out to
9600 million 16-bit MACs per second
(9600 MMACs). Moreover,
each multiplier on the C64x+ core can
compute one 32- by 32-bit MAC or four
8- by 8-bit MACs every clock cycle. By the
way, the C6452 doesn’t operate at the fastest
speed, topping out at 900 MHz.
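
The peak figures follow from simple arithmetic: clock rate times the number of M units times the MACs each unit completes per cycle. The short calculation below reproduces them. The function name is a hypothetical helper, and the results are theoretical peaks rather than measured throughput.

```python
def peak_mmacs(clock_mhz, m_units=2, macs_per_unit=4):
    """Peak millions of MACs per second for a given clock in MHz."""
    return clock_mhz * m_units * macs_per_unit


# 16- by 16-bit MACs: four per M unit per cycle
print(peak_mmacs(1200))                   # C64x+ core at its 1.2-GHz maximum
print(peak_mmacs(900))                    # C6452 at its 900-MHz maximum

# 32- by 32-bit MACs: one per M unit per cycle
print(peak_mmacs(1200, macs_per_unit=1))
```

At the C6452's 900-MHz ceiling, the same arithmetic gives 7200 16-bit MMACs.
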
A new feature of the C64x+ processor has
the endearing name of the SPLOOP. This
small instruction buffer aids in the creation
of software pipelining loops where multiple
iterations of a loop are executed in parallel.
The SPLOOP buffer reduces the code size
associated with software pipelining.
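
Software pipelining itself can be illustrated without any hardware detail. The sketch below models a dot-product loop split into three stages (load, multiply, accumulate), with each pass of the loop working on three different iterations at once. The three-stage split and the function name are assumptions chosen for illustration; the sketch does not model the SPLOOP buffer or the real C64x+ schedule.

```python
def dot_product_pipelined(a, b):
    """Dot product computed with a three-stage software pipeline."""
    n = len(a)
    loaded = None   # operands fetched by the load stage
    product = None  # result produced by the multiply stage
    acc = 0
    # Two extra cycles drain the pipeline after the last load.
    for cycle in range(n + 2):
        # Stage 3: accumulate the product from iteration cycle - 2.
        if product is not None:
            acc += product
        # Stage 2: multiply the operands loaded for iteration cycle - 1.
        product = loaded[0] * loaded[1] if loaded is not None else None
        # Stage 1: load the operands for iteration cycle.
        loaded = (a[cycle], b[cycle]) if cycle < n else None
    return acc


print(dot_product_pipelined([1, 2, 3], [4, 5, 6]))
```

Written out by hand in assembly, the fill and drain phases of such a loop (the prolog and epilog) need their own code, and that extra code is what a loop buffer can avoid.
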
DSP + MICROCONTROLLER CHIPS
Another class of DSPs employs an additional
microcontroller core on chip. Sometimes
this is a separate core, such as an
ARM processor. In other cases, the processor
core contains both DSP and MCU
functionality. This is the case with the well-known
Blackfin DSP architecture from
Analog Devices.
The Blackfin is based on a 10-stage
RISC MCU/DSP pipeline with a mixed
16/32-bit instruction set architecture, which
includes dual 16-bit MAC DSP instructions
and a 32-bit RISC-like instruction
set. This combination provides signal-processing
functionality with the ease-of-use
attributes associated with general-purpose
microcontrollers. The Blackfin processor
architecture is fully SIMD-compliant
(single-instruction, multiple-data) and
includes instructions for accelerated video
and image processing.
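
The idea behind a dual 16-bit MAC can be sketched as follows: two signed 16-bit values are packed into each 32-bit word, and one operation updates two accumulators. All function names here are hypothetical, and the sketch illustrates the general SIMD concept, not the semantics of any actual Blackfin instruction.

```python
def pack16(hi, lo):
    """Pack two signed 16-bit values into one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack16(word):
    """Split a 32-bit word into its signed 16-bit high and low halves."""
    def signed(value):
        return value - 0x10000 if value & 0x8000 else value
    return signed((word >> 16) & 0xFFFF), signed(word & 0xFFFF)


def dual_mac(acc_hi, acc_lo, x_word, y_word):
    """One conceptual step: two 16- by 16-bit MACs on packed operands."""
    x_hi, x_lo = unpack16(x_word)
    y_hi, y_lo = unpack16(y_word)
    return acc_hi + x_hi * y_hi, acc_lo + x_lo * y_lo


# High halves: 3 * 4. Low halves: -2 * 5.
print(dual_mac(0, 0, pack16(3, -2), pack16(4, 5)))
```

Packing operands this way lets a single fetch feed both multipliers, which is where much of the speedup in 16-bit signal-processing loops comes from.
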
This combination of processing attributes
differentiates Blackfin processors
from their brethren. They’re designed to
perform equally well in both signal-processing
and control-processing applications,
in many cases eliminating the requirement
for separate heterogeneous processors in a
design. Blackfin processors offer up to 756
MHz in single-core products.