Looking back over the past 10 years or so,
semiconductor process technology more or
less kept pace with the demand for functionality
in large-scale processor-based
ICs. When the next-generation set-top box
IC needed more horsepower, a move
from, say, a 180-nm process to 130 nm
would provide the necessary boost by adding
gates and the ability to run faster clocks. But that next-generation
chip would still carry a single processor.
Things have changed dramatically in the last few years. Simply
put, silicon scaling no longer meets functionality requirements.
Thus, designers have turned to multiprocessor architectures,
which significantly up the ante in terms of processing power.
The number of processors per chip is taking off, already exemplified
several years ago by Cisco’s 192-processor engine for its
CRS-1 network router (Fig. 1).
With the rise in processing power and complexity comes a
host of issues that point largely toward the software side of the
system equation. Writing software for a single-processor system
is a relatively simple task, as a purely sequential approach
will do the trick. But there’s little point in multiple processing
engines if you’re not planning to have them execute instructions
in parallel. How parallelism is imposed is the crux of
the matter. Missteps can result in dire consequences, creating
debug nightmares.
Fortunately for those looking to move to multiprocessor
architectures in their system-on-a-chip (SoC) designs, tools
and methodologies are beginning to appear. Designers can take
steps to ensure that their parallelized application code won’t
cause memory-access deadlocks, race conditions, or other faults
that crash one or more processors or even their entire systems.
HOW MULTICORE LOOKS TODAY
Looking at a generic example of a multicore SoC can illustrate
both the complexity of the devices and the programming challenges
(Fig. 2). In a hypothetical transition from a 130-nm
SoC with a single processor to a multicore implementation
at 65 nm, designers would have roughly four times as many
transistors to work with.
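As a rough check on that figure, assume transistor density scales with the inverse square of the feature size. This is an idealization, since real processes deviate from it, and the symbols N_130 and N_65 (transistor counts for a fixed die area at each node) are introduced here only for illustration:

```latex
\[
\frac{N_{65}}{N_{130}} \approx \left(\frac{130\ \text{nm}}{65\ \text{nm}}\right)^{2} = 2^{2} = 4
\]
```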
Multicore architectures ramp up complexity in ways beyond
simply having multiple processors. The availability of more gates
brings added memory, which is required to handle the increasingly
large amounts of data, high-resolution video streams, and
other content. The increased bandwidth means more I/Os to deal
with all the data. More complex control processing is required by
a myriad of network stacks and more elaborate user interfaces.
“Designs are using more CPUs,” says Chris Rowen, CEO of
Tensilica. “But that has only limited potential because of the
way control paths are written.”
When considering multicore SoCs, an important distinction must
be made between control-plane and data-plane processing.
“In the data plane, there’s strong interest
in integrating more functionality,” says Rowen.
“Chips no longer process only audio, video, or
wireless baseband, but rather they process all of
them. Meanwhile, there’s growing complexity in
each of these various functions. This puts a lot of
pressure on a more programmable solution.”
Efforts to make the most of multiple processors
often run aground on the shoals of memory access
conflicts. “The old paradigm for multicore designs
using shared memory was if things were happening
in parallel, you’d want them to touch the
memory at different address spaces,” says Limor
Fix, general chair of the 45th Design Automation
Conference and associate director of Intel
Research Pittsburgh.
“The idea is for parallel threads not to interfere
with each other, and to minimize the number
of clocks required for the shared memory,”
says Fix. “If each of the parallel computations is touching a
different area in memory, there’s less collision and less locking
of the shared memory.”
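The partitioning idea Fix describes can be sketched in a few lines. This is a minimal illustration rather than anything taken from a real SoC code base: the function name partitioned_sum and its num_workers parameter are hypothetical. Each worker reads only its own slice of the data and writes only its own result slot, so no lock is needed until the results are combined. (In CPython the global interpreter lock keeps these threads from running truly in parallel, so the sketch shows the access pattern, not a speedup.)

```python
import threading

def partitioned_sum(data, num_workers=4):
    """Sum `data` with worker threads that each write to a private slot."""
    partials = [0] * num_workers          # one result slot per worker
    chunk = (len(data) + num_workers - 1) // num_workers  # ceiling division

    def worker(idx):
        start = idx * chunk
        end = min(start + chunk, len(data))
        total = 0
        for value in data[start:end]:     # reads only this worker's slice
            total += value
        partials[idx] = total             # writes only this worker's slot

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)                  # combine after all workers finish
```

Because no two workers ever touch the same slot, there is nothing to collide on and nothing to lock while they run.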
The problem lies in the fact that visibility into the design is
extremely limited. “Typically, when working with RTL simulation
models of processors, software debug relies on the general-purpose
registers of the processor,” says Jim Kenney, product marketing manager
at Mentor Graphics. “These registers are usually exposed at the
top level for tracing in the waveform window of a logic simulator.”
Making matters worse is the fact that there may be only one
debug port for several processors. With all processors executing
instructions concurrently, it’s very difficult to control the speed of
any given processor.
Debugging is made even harder due to the absence of determinism.
“With multiple processors, you don’t control what’s running,”
says Michel Genard, vice president of marketing at Virtutech.
Rerunning code often is of no value because the results can be different
each time, making bugs hard to pin down. Then there are
“Heisenbugs,” bugs that change or disappear because inserting
a probe alters the system’s behavior.
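To see why reruns can disagree, consider a toy model of two processors that each increment a shared counter in two steps, a load followed by a store. The function run_schedule below is a hypothetical sketch: it replays one chosen interleaving so the effect can be reproduced on demand, whereas on real hardware the interleaving is decided by timing the programmer doesn't control.

```python
def run_schedule(schedule):
    """Replay one interleaving of two workers incrementing a shared counter.

    Each worker does a non-atomic increment in two steps: load, then store.
    `schedule` is a sequence of worker ids, one entry per step, with each
    id appearing exactly twice.
    """
    shared = 0
    local = {}              # value each worker has loaded but not yet stored
    for wid in schedule:
        if wid not in local:
            local[wid] = shared          # step 1: load the shared value
        else:
            shared = local.pop(wid) + 1  # step 2: store the incremented value
    return shared

# Same program, two different interleavings.
sequential = run_schedule([0, 0, 1, 1])   # worker 0 finishes before worker 1
interleaved = run_schedule([0, 1, 0, 1])  # both load before either stores
```

The sequential ordering yields 2, while the interleaved ordering yields 1 because one update is lost. The code is the same in both cases, only the timing differs.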
GOING VIRTUAL
Fortunately, there are ways around these issues, most of which
come in the form of “virtualization” or “virtual platform” technology.
Many benefits can be derived from virtual platforms (see “Multicore Design Benefits from Virtual Prototyping,” www.electronicdesign.com, ED Online 18637).
Once a virtual platform is assembled from hardware models,
many of the issues concerning software debugging are addressed.
The designer gains a great deal of control over the system, hence a
return to a more deterministic scenario. The system configuration
is easily varied in terms of the number and speed of cores as well as
the software loads on each.
Virtual hardware offers a good amount of visibility in terms of
memory, processor registers, and device states. In addition, when
you synchronize the processors, you can synchronize everything at
once. The virtual platform also affords much more control over
system execution.
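The determinism a virtual platform restores can be illustrated with a toy simulator. This is a sketch under loud assumptions: the names core and simulate, the trace format, and the (speed, steps) configuration are all hypothetical and don't correspond to any vendor's tool. The point is only that when one simulator steps every core model in a fixed order, the same configuration always produces the same trace, and the number and speed of cores are just parameters.

```python
def core(name, steps):
    """Hypothetical core model: yields one trace event per simulated instruction."""
    for pc in range(steps):
        yield f"{name}:pc={pc}"

def simulate(config):
    """Step every core model in a fixed order and return the full trace.

    `config` maps a core name to (speed, steps): `speed` is the number of
    instructions executed per simulated time slice (assumed >= 1) and
    `steps` is the program length.
    """
    cores = {name: core(name, steps) for name, (speed, steps) in config.items()}
    trace = []
    active = sorted(cores)            # fixed, repeatable stepping order
    while active:
        for name in list(active):     # iterate over a copy; we may remove
            speed = config[name][0]
            for _ in range(speed):
                event = next(cores[name], None)
                if event is None:     # this core has finished its program
                    active.remove(name)
                    break
                trace.append(event)
    return trace

# Two cores, the second running twice as fast as the first.
cfg = {"cpu0": (1, 2), "cpu1": (2, 3)}
first_run = simulate(cfg)
second_run = simulate(cfg)            # identical to first_run
```

Changing the entries in cfg varies the core count, relative speed, and software load without touching the models themselves.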