When it comes to multiprocessing, what’s good for the hardware
goose is not necessarily good for the software gander.
The ideal hardware architecture for a multicore design is a
heterogeneous (asymmetric) design built on a single instruction-set
architecture (ISA). It combines high- and low-complexity
cores to achieve lower power and higher throughput,
somewhat mitigating the limits of Amdahl’s Law.¹
Now imagine that Amdahl’s Law (used to find a system’s
maximum expected improvement when only part of the system is improved) were of no concern
and that die sizes were unlimited. From a programming perspective, the ideal multicore
would then be homogeneous (symmetric), so software wouldn’t come to depend on a specific ISA.
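Amdahl’s Law is usually written as S = 1 / ((1 - p) + p/n), where p is the fraction of the work that can be parallelized and n is the number of cores. The Python sketch below is a minimal illustration, assuming the parallel portion scales perfectly across identical cores; amdahl_speedup is a hypothetical helper, not part of any library.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Overall speedup when parallel_fraction of the work runs on n_cores.

    Assumption: the parallel part scales perfectly across identical cores,
    and the serial part (1 - parallel_fraction) gets no benefit at all.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)


if __name__ == "__main__":
    # With 90% of the work parallelized, the serial 10% caps the speedup below 10x.
    for cores in (2, 4, 8, 16, 1024):
        print(f"{cores:5d} cores -> {amdahl_speedup(0.9, cores):.2f}x")
```

With p = 0.9, adding cores pushes the speedup toward, but never past, 1/(1 - p) = 10x. A heterogeneous design attacks that ceiling by running the serial portion on a faster, more complex core.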
Developed by IBM, Sony, and Toshiba, the Cell microprocessor has a heterogeneous architecture,
though not a single-ISA one. Programming the device can be rather arduous, however,
and tends to leave you with heavily architecture-dependent code. According to Dave Haas, principal
architect at Raza Microelectronics, you should avoid pigeonholing yourself
into a given vendor or architecture whenever you can, which makes homogeneous architectures
the safer bet when you have a choice.
Whatever the ideal approach, today’s embedded and general-purpose system
designers have a limited number of options. In the
embedded space, several of the multicore choices are heterogeneous. In
the general-purpose world, you may only be able to get
a homogeneous multicore.
DECISIONS • When it comes to multiprocessing,
several tradeoffs determine how much performance you can squeeze
out of your transistors (see the
table). One example is the thread-versus-core
tradeoff. According to
Kevin Kissell, principal architect at MIPS,
you must start by
analyzing your system to determine
which applications can be
decomposed into a number of
constituent tasks or threads.
“Parallelization of monolithic applications is often possible,
but seldom easy, and it’s generally easier for a big scientific code
than a small embedded real-time application,” says Kissell.
To save area, consider a more thread-heavy architecture.
The idea is to maximize performance per watt and
choose an architecture that will saturate the available memory and power
envelope.
“To the extent that a single-threaded core cannot keep its
pipeline fully utilized because of delays from memory and slow
functional units, multithreading can extract throughput with a
relatively modest increase in area, and in many cases the payback
is superlinear,” he says.
For instance, you might achieve 30% more throughput for
15% more area in the CPU and cache subsystem. “This can be
converted into a power optimization if that recovery of lost
bandwidth allows the multithreaded core to run at a lower frequency
than an equivalent single-threaded core, and still meet
performance targets,” says Kissell.
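Kissell’s example can be checked with simple arithmetic. The sketch below uses the 30% throughput and 15% area figures from the text. The frequency estimate adds a simplifying assumption that the article does not state: throughput scales linearly with clock frequency. Both function names are hypothetical and chosen only for illustration.

```python
def throughput_per_area(throughput_gain: float, area_gain: float) -> float:
    """Throughput per unit area of the multithreaded core, relative to the
    single-threaded baseline (1.0 means no improvement)."""
    return (1.0 + throughput_gain) / (1.0 + area_gain)


def required_frequency_ratio(throughput_gain: float) -> float:
    """Clock frequency, relative to the single-threaded core, at which the
    multithreaded core still matches the baseline throughput.

    Simplifying assumption: throughput is proportional to clock frequency.
    """
    return 1.0 / (1.0 + throughput_gain)


if __name__ == "__main__":
    gain, area = 0.30, 0.15  # 30% more throughput for 15% more area
    print(f"throughput per area: {throughput_per_area(gain, area):.3f}x baseline")
    print(f"clock needed for equal throughput: {required_frequency_ratio(gain):.2f}x baseline")
```

Throughput grows twice as fast as area here, which is the superlinear payback Kissell mentions. Under the linear-scaling assumption, the multithreaded core could run at roughly 77% of the single-threaded core’s clock and still deliver the same throughput; that headroom is what can be converted into a power optimization.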
If your application doesn’t require significant amounts of
shared data or instructions, a distributed memory scheme is
probably the best candidate. “Each processing element’s memory
can be sized to its dedicated tasks,” Kissell says, “and one
can use different processor frequencies, different processor
models, and even different processor architectures for the different
processing elements to achieve the best area/power/performance
values.”
But if there’s an abundance of code and/or data sharing, a
symmetric shared-memory configuration may be your best bet. According to
Kissell, this approach “adds complexity and loses a bit of peak
performance relative to a distributed memory model, because
there will be some contention for the shared memory array, and
because a cache-coherency protocol must be used among the
cores to ensure that they all see the same values at each memory
location, despite the presence of caches.”
According to Chuck Moore, senior fellow at Advanced
Micro Devices (AMD), however, end users may have misaligned expectations
about multicore technology.
“Multicore is very good for throughput and responsiveness,
but given that most applications are still serial, these actually
won’t speed up on multicore,” says Moore. “Over time, there
will be an increasing number of parallel applications available,
but this is going to take more time than people seem to realize.”
DIFFERENT VIEWS • When it comes to multiprocessing, all
“coaches” believe their team has the best strategy for winning
(see “Multicore My Way” at www.electronicdesign.com, ED
Online 14631). Take AMD and Intel, which have gone public
about their opposite approaches to next-generation cores. Intel
believes homogeneous cores are the way to go, while AMD
believes the future lies in heterogeneous cores.
“Multicore solutions of tomorrow will be heterogeneous,”
says AMD’s Moore. “They will initially involve the use of
architecturally compatible cores with varying capabilities, but
will grow to include more special-purpose and power-efficient
hardware that is accessed through well-defined APIs (application
programming interfaces).”
Intel and Vivace Semiconductor also have radically different
views of the embedded space. “Intel’s Embedded and Communications
Group estimates the percentage of multicore designs
that will utilize asymmetric multiprocessing (AMP) in the next
three to four years of all Embedded and Communications
Group-deployed multicore platforms to be about 10%,” says
Edwin Verplanke, platform solution architect with Intel’s
Embedded and Communications Group.