[Embedded in Electronic Design]
Multicore, Multithreaded Goes Embedded
William Wong
ED Online ID #19022
June 12, 2008
Copyright © 2006 Penton Media, Inc., All rights reserved. Printing of this document is for personal use only.
MIPS Technologies combines multithreaded and multicore
support in its latest embedded SMP platform. As with most
multithreaded designs, the MIPS32 1004K’s multithreaded support
provides an incremental performance boost that is smaller than
the gain from adding another full core. Still, multithreading can
put to work a core’s idle time that would otherwise waste power,
a critical consideration in most embedded designs.
Each core can include one or two MIPS32-compliant Virtual
Processing Elements (VPEs). The multithreaded support can
deliver an additional 30% to 50% of a core’s base performance. This
provides a convenient upgrade path for designers who start with
a single-core solution, move up to a single-core, multithreaded
solution, and progress all the way up to a four-core, multithreaded
platform (see the figure).
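
As a rough illustration of that upgrade path, the sketch below turns the
30% to 50% multithreading figure into relative throughput for each step.
It assumes perfect linear scaling across cores, which real workloads
rarely achieve; the function name and the configuration list are
hypothetical and do not come from MIPS.

```python
def relative_throughput(cores, multithreaded, mt_gain):
    """Throughput relative to one single-threaded core.

    Assumption: cores scale linearly, and multithreading adds
    mt_gain (0.30 to 0.50 in the article) to each core's base rate.
    """
    per_core = 1.0 + (mt_gain if multithreaded else 0.0)
    return cores * per_core


# Hypothetical upgrade steps mirroring the ones named in the text.
configs = [
    ("1 core, single-threaded", 1, False),
    ("1 core, multithreaded", 1, True),
    ("4 cores, multithreaded", 4, True),
]

for name, cores, mt in configs:
    low = relative_throughput(cores, mt, 0.30)
    high = relative_throughput(cores, mt, 0.50)
    print(f"{name}: {low:.1f}x to {high:.1f}x")
```

Under these idealized assumptions, the four-core, multithreaded
configuration lands between roughly 5x and 6x the throughput of a single
single-threaded core.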
Designers often are able to create a system that consumes less
power by running multiple cores at a slower speed. MIPS multithreading
support gives designers more flexibility. The architecture
itself gets part of its performance boost from the multithreaded
nine-stage pipeline. Designers can also mix and match floating-point
support.
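
The power argument for several slower cores can be sketched with the
standard dynamic-power relation, P proportional to C x V^2 x f. The
example below compares one core at 800 MHz with two cores at 400 MHz.
The supply voltages (1.2 V and 1.0 V) are purely hypothetical values
chosen for illustration, and the model ignores leakage and assumes the
workload splits evenly across both cores.

```python
def dynamic_power(cores, voltage, freq_mhz, capacitance=1.0):
    """Relative dynamic switching power: cores * C * V^2 * f.

    capacitance is a unitless placeholder; only ratios are meaningful.
    """
    return cores * capacitance * voltage ** 2 * freq_mhz


# Hypothetical operating points: voltages are assumptions, not MIPS data.
one_fast = dynamic_power(cores=1, voltage=1.2, freq_mhz=800)
two_slow = dynamic_power(cores=2, voltage=1.0, freq_mhz=400)

ratio = two_slow / one_fast
print(f"Two slow cores use {ratio:.0%} of the single fast core's dynamic power")
```

Both configurations supply the same aggregate clock cycles per second.
The saving comes entirely from the assumed lower voltage at the lower
frequency, so it disappears if the voltage cannot be reduced.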
CACHE SIMPLIFIES SMP
Each core contains a dual-port cache tag memory. One port serves
the VPEs (only one VPE can access the cache at a time) and the other
serves the system’s coherence manager, so both can access the tags
simultaneously. This lets the coherence manager operate in the background.
In addition, MIPS provides a number of configuration options
for the cache subsystem, such as the inclusion and size of translation
look-aside buffers (TLBs). Tuning the system can be critical
because cache miss percentages can have a major impact on system
performance. For example, the performance difference between a 0.8% and 4% cache miss ratio can be a
factor of 3. Of course, the application has
a major effect on this result. But determining
what tradeoffs to apply is just one of a
designer’s jobs.
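
A simple cycles-per-instruction model shows how a factor of 3 can arise
from those two miss ratios. The base CPI, memory accesses per
instruction, and miss penalty below are hypothetical values picked so
that the arithmetic is easy to follow; they are not figures for the
1004K, and a real application would need measured numbers.

```python
def effective_cpi(miss_ratio, base_cpi=1.0, accesses_per_instr=1.25,
                  miss_penalty=100):
    """Effective cycles per instruction, including cache-miss stalls.

    All default parameters are illustrative assumptions:
      base_cpi           - CPI with a perfect cache
      accesses_per_instr - memory accesses per instruction
      miss_penalty       - stall cycles per miss
    """
    return base_cpi + miss_ratio * accesses_per_instr * miss_penalty


low_miss = effective_cpi(0.008)   # 0.8% miss ratio
high_miss = effective_cpi(0.04)   # 4% miss ratio
slowdown = high_miss / low_miss

print(f"CPI at 0.8% misses: {low_miss:.1f}")
print(f"CPI at 4% misses:   {high_miss:.1f}")
print(f"Slowdown factor:    {slowdown:.1f}")
```

With these assumptions the CPI rises from 2.0 to 6.0. A smaller miss
penalty or fewer memory accesses per instruction shrinks the gap, which
is why the result depends so heavily on the application.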
The coherence manager handles the interaction
with the optional L2 cache accessed
via the 256-bit memory bus. MIPS also lets
designers move I/O coherence management
into hardware. On other architectures this
is often done in software, which reduces
the performance available to the
application code. The cache system also
supports L1 cache-to-cache transfers.
The global interrupt controller supports
system and interprocessor interrupts. System
interrupts can be routed to a specific
core. The MIPS32 1004K will be available
in the second quarter. It has a maximum
speed of 800 MHz. A typical two-core/
four-VPE system with 32-kbyte L1 caches
occupies about 3.8 mm².
MIPS TECHNOLOGIES • www.mips.com