RISE OF MULTIPROCESSING/MULTITHREADING SHARPENS FOCUS ON INTERRUPTS
by William E. Lamie, Express Logic Inc.
Potentially substantial performance gains from multithreading and multiprocessing architectures have captured the attention of designers of consumer devices and other electronic products. Multithreading uses cycles during which the processor would otherwise sit idle to execute instructions from other threads. Multiprocessing, on the other hand, adds independent processing elements that execute threads or applications concurrently. Embedded applications running on multiprocessor and multithreading architectures, just like conventional applications, require interrupt service routines (ISRs) to handle interrupts generated by external events.
One key challenge for designers implementing these new technologies is avoiding the situation where one thread is interrupted while modifying a critical data structure, allowing a different thread to make conflicting changes to the same structure. Conventional applications overcome this problem by briefly locking out interrupts while an ISR or system service modifies the critical data structure.
In a multithreaded or multiprocessing application, this approach isn't sufficient: an interrupt lockout neither prevents a switch to a different thread context (TC) nor blocks access by a different processing element. A more comprehensive approach is required, such as disabling multithreading or halting other processing elements while the data structure is being modified.
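As a rough sketch of the difference, consider the C fragment below. It is illustrative only: irq_disable() and irq_restore() are hypothetical stand-ins for platform-specific interrupt masking, and the spinlock is a bare C11 atomic_flag rather than any particular RTOS primitive. On a uniprocessor, the interrupt lockout alone suffices; on a multithreaded or multiprocessor system, the spinlock is also needed to hold off other processing elements.

#include <stdatomic.h>

static atomic_flag guard = ATOMIC_FLAG_INIT; /* test-and-set spinlock */
static volatile long shared_count;           /* the critical data     */

/* Hypothetical stand-ins for platform-specific interrupt masking. */
static unsigned irq_disable(void)       { return 0; }
static void     irq_restore(unsigned k) { (void)k; }

void increment_shared(void)
{
    unsigned key = irq_disable();   /* enough by itself on a uniprocessor */

    /* On multithreaded or SMP hardware, also exclude other processing
       elements, which the interrupt lockout alone cannot do. */
    while (atomic_flag_test_and_set_explicit(&guard, memory_order_acquire))
        ;                           /* spin until the lock is free */

    shared_count++;                 /* critical section */

    atomic_flag_clear_explicit(&guard, memory_order_release);
    irq_restore(key);               /* unmask interrupts last */
}

Note the ordering: interrupts are masked before the lock is taken, so the lock holder can't be preempted while other cores spin waiting for it.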
IMPROVING PERFORMANCE
Manufacturers of consumer devices and other embedded computing products are eagerly adding new features, such as Wi-Fi, VoIP, Bluetooth, and video. Historically, increased feature sets have been accommodated by ramping up the processor's clock speed. In the embedded space, this approach rapidly loses viability because most devices are already running up against power-consumption and real-estate constraints that limit additional processor speed increases. Cycle-speed increases drive exponentially greater power consumption, making high cycle speeds unmanageable for more and more embedded applications.
In addition, processors are already so much faster than memory that more than half the cycles in many applications are spent waiting while the cache line is refilled. Each time there's a cache miss or another condition that requires off-chip memory access, the processor needs to load a cache line from memory, write those words into the cache, update the translation lookaside buffer (TLB), write the old cache line back to memory, and resume the thread. MIPS Technologies stated that a high-end synthesizable core taking a cache miss every 25 instructions (a plausible value for multimedia code) could be stalled more than 50% of the time if it must wait 50 cycles for a cache fill.
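For reference, the arithmetic behind that claim, assuming roughly one instruction per cycle between misses: 50 stall cycles for every 25 executed instructions gives 50/(25 + 50), or about 67% of cycles spent stalled, which is indeed more than half.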
MULTITHREADING APPROACH
Multithreading solves this problem by using the cycles that the processor would otherwise waste while waiting for memory access. It can then handle multiple concurrent threads of program execution. When one thread stalls waiting for memory, another thread immediately presents itself to the processor. This helps keep computing resources fully occupied.
Notably, conventional processors can't use this approach because switching from one thread context to another takes a large number of cycles. For this approach to work, multiple application threads must be immediately available and "ready-to-run" on a cycle-by-cycle basis. MIPS accommodates this requirement by incorporating multiple TCs, each of which can retain the context of a distinct application thread (Fig. 1).
In a multithreaded environment such as the MIPS 34K processor, performance can be substantially improved—when one thread waits for a memory access, another thread can use that processor cycle that would otherwise be wasted.
Figure 1 shows how multithreading can speed up an application. With just Thread0 running, only five of 13 processor cycles are used for instruction execution; the rest are spent waiting for the word to be loaded into cache from memory. In this case, with conventional processing, efficiency is only 38%. Adding Thread1 makes it possible to use five additional processor cycles that were previously wasted. With 10 of 13 processor cycles now used, efficiency improves to 77%, a 100% speedup over the base case. Adding Thread2 fully loads the processor: instructions execute on all 13 of 13 cycles for 100% efficiency. All told, that is 2.6 times the base-case performance, a 160% speedup.
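To spell out the arithmetic: efficiency climbs from 5/13 (38%) to 10/13 (77%) to 13/13 (100%). The ratios against the base case are 10/5 = 2.0 and 13/5 = 2.6, which is where the 100% and 160% speedup figures come from.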
MULTIPROCESSING APPROACH
Multiprocessing, on the other hand, combines multiple processing units (each capable of running a separate concurrent thread) into a single system. Often, they're combined on a single die, as is the case in ARM's MPCore multiprocessor.
In the MPCore’s symmetric multiprocessing (SMP) configuration, the individual processor cores are connected using a high-speed bus. They share memory and peripherals using a common bus interface. Generally, the SMP system runs a single instance of the real-time operating system (RTOS) that manages all “n” of the processor cores. The RTOS ensures that the n highest-priority threads can run at any time.
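A minimal sketch of that scheduling rule appears below. The data structures and names are hypothetical, not the MPCore's or any specific RTOS's internals; each dispatch pass simply assigns the n highest-priority ready threads to the n cores.

#define NUM_CORES   4
#define MAX_THREADS 16

typedef struct {
    int priority;   /* lower number = higher priority */
    int ready;      /* nonzero if runnable */
} thread_t;

static thread_t  threads[MAX_THREADS];
static thread_t *running[NUM_CORES];  /* what each core executes */

/* One dispatch pass: give each core the highest-priority ready
   thread not already claimed by another core. */
void dispatch(void)
{
    int claimed[MAX_THREADS] = { 0 };

    for (int core = 0; core < NUM_CORES; core++) {
        int best = -1;
        for (int t = 0; t < MAX_THREADS; t++) {
            if (threads[t].ready && !claimed[t] &&
                (best < 0 || threads[t].priority < threads[best].priority))
                best = t;
        }
        running[core] = (best >= 0) ? &threads[best] : 0;
        if (best >= 0)
            claimed[best] = 1;
    }
}

A real scheduler would use a priority-sorted ready list rather than this repeated scan, but the invariant is the same: the n highest-priority ready threads occupy the n cores.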
The primary software challenge in a multiprocessor system is partitioning the design into tasks that can run in parallel. The primary hardware challenge is finding the right infrastructure to ensure high-bandwidth communications among processors, memory, and peripherals.
An SMP system can be scaled by adding cores and peripheral devices to execute more tasks in parallel. In an ideal world, moving from one processor to n processors would increase performance by a factor of n. Generally speaking, such an approach makes multiprocessing quite scalable and often simplifies the design.
Intel states that it is more power-efficient to have multiple small cores each run individual threads than to have a single large processor run multiple threads. A multicore design also enables cores to share or duplicate processor resources, such as cache. The resulting efficiencies permit multicore designs to boost simultaneous performance without an increase in power.
IMPORTANCE OF INTERRUPTS
Interrupts are critical in a conventional embedded application because they provide the primary, and in many cases the only, means for switching from one thread to another. Interrupts fulfill exactly the same role in multithreading and multiprocessing applications as they do in a conventional application. However, there's an important difference to note: in a multithreaded or multiprocessing application, changes from one thread to another occur not only through interrupts, but also as a result of the system's ability to run multiple, independent thread contexts concurrently using spare CPU cycles or additional processors.
It's essential to avoid the situation where one thread is modifying a critical data structure while a different thread makes conflicting changes to the same structure.
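A concrete instance of the hazard, using a hypothetical singly linked list in C: the push operation below is two pointer writes, and a preemption (or a second core) between them can lose a node.

typedef struct node { struct node *next; int value; } node_t;

/* Two pointer writes with a window between them. */
void list_push(node_t **head, node_t *n)
{
    n->next = *head;   /* (1) preempted here?                         */
    *head   = n;       /* (2) another thread's push between (1) and
                          (2) is overwritten, and its node is lost.   */
}

Guarding both writes with the interrupt-lockout-plus-spinlock sequence sketched earlier closes this window.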