Parallel programming is hard. But debugging it is even harder.
Unfortunately, taking advantage of multicore solutions like
Intel’s 80-core TeraScale prototype will require some type of
parallel-programming technique (Fig. 1).
The first challenge is to find parallelism
that can be exploited. The next is using a
tool to exploit the parallelism. Another goal
is bug-free code. Parallel programming
opens the door to a range of more complex
bugs, though, and timing becomes even more
critical. Finally, there’s the issue of targeting
the host platform with these tools.
At this point, generic solutions don’t exist
because of the range of multicore hardware. Tools primarily
target only one class of hardware or even one vendor’s hardware.
Programmers typically push these jobs off to the operating
system or runtime. Eventually, though, parallel-programming
constructs will make it into mainstream programming languages.
Either way, developers will need multicore solutions
to take advantage of performance improvements, since single-core
scaling is no longer an option in pushing the limits.
LET THE OPERATING SYSTEM DO IT
Pushing the job of managing coarse-grain parallelism onto the
operating system is a common approach and easy to do. It works
well if there’s a large number of programs, or if those programs
already take advantage of multiple cores. This requires no modification
of the applications, but it’s of less value if there aren’t
enough programs to exploit the hardware.
Server environments typically can have program loads
that use the target hardware. Likewise, embedded application
designers can latch onto virtual-machine (VM) products
like Trango’s Hypervisor, Green Hills Software’s Integrity,
VMware’s namesake, and KVM or Xen on Linux to manage
multicore solutions. These tools allow for better management
and debugging of programs and systems in addition to providing
features like load leveling.
VM architectures potentially open up other avenues for
programmers. Thin operating systems or programs running
alone in a VM may be given access to features previously
restricted to the operating system, such as virtual memory
management and peripheral access.
Virtual memory management could enable programmers to manage
memory and interprocess and intra-application communication
more effectively. For multicore utilization, communication is
key to good use of the system. The big question is whether
programming languages or runtimes will take this approach.
LET THE RUNTIME DO IT
After VMs, runtimes are the most common
method for exploiting multicore environments.
Platforms like Intel’s Threading
Building Blocks (TBB) require developers
to explicitly use exposed function calls to utilize the runtime.
This approach forces developers to determine the type and
utilization of parallelism in an application and meld it with the
runtime. In turn, the runtime will also need to manage parallelism.
The functional interface can help narrow the scope for
finding parallelism, though it puts the onus on the programmer
to use the right function.
Usually, the interface to the runtime is implemented strictly
through function or class definitions, though customizing a compiler
offers advantages as well. TBB employs a typical interface,
much like the following definition for the parallel_do function:
template<typename InputIterator, typename Body>
void parallel_do( InputIterator first, InputIterator last,
Body body );
In general, parallel processing deals with data or control parallelism.
The above definition takes advantage of TBB’s C++ support
and C++ templates. Specifically, TBB addresses data parallelism
over large data sets, such as matrices or streams of data.
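To make the shape of such a call concrete, the following is a minimal sketch of a function with a parallel_do-style signature, built only on standard C++ threads. It is not TBB's implementation. The names parallel_do_sketch and square_all, the num_threads parameter, and the fixed chunking scheme are all assumptions made for illustration, and the sketch requires random-access iterators to keep the code short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical stand-in for a parallel_do-style call. NOT TBB's implementation.
// Splits [first, last) into contiguous chunks and applies body to every item,
// one chunk per thread. Assumes random-access iterators.
template <typename InputIterator, typename Body>
void parallel_do_sketch(InputIterator first, InputIterator last, Body body,
                        unsigned num_threads = 4) {
    assert(num_threads > 0);
    const std::size_t total =
        static_cast<std::size_t>(std::distance(first, last));
    // Round up so that every item lands in some chunk.
    const std::size_t chunk = (total + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < total; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, total);
        // Each worker gets its own copy of the iterator and the body.
        workers.emplace_back([=]() mutable {
            for (std::size_t i = begin; i < end; ++i) body(first[i]);
        });
    }
    for (auto& worker : workers) worker.join();
}

// Illustrative use: square every element in place. Each element is touched
// by exactly one thread, so no locking is needed.
std::vector<int> square_all(std::vector<int> values) {
    parallel_do_sketch(values.begin(), values.end(),
                       [](int& v) { v *= v; });
    return values;
}
```

The programmer still has to decide that the loop body is safe to run concurrently. The function only supplies the mechanics of splitting the work, which is the division of labor described above.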
Microsoft’s Concurrency and Coordination Runtime
(CCR) (see “Software Frameworks Tackle Load Distribution”
at www.electronicdesign.com, ED Online 18813), which was
released with Microsoft’s Robotics Studio (see “MS Robotics
Studio,” ED Online 16631), also uses a functional interface and
addresses control parallelism. In this case, CCR helps optimize
asynchronous communication between threads that may be distributed
among multicore platforms or even across networks.
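The idea of asynchronous communication between threads can be sketched with a simple message port. CCR is a .NET library, and the code below does not use or reproduce its API. The Port class and the sum_via_port function are hypothetical names, and the sketch only illustrates one thread posting messages that another thread receives later.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical message port for passing values between threads.
// This is an analogy only. It is NOT the CCR API.
template <typename T>
class Port {
public:
    // Enqueue a message and wake one waiting receiver.
    void post(T message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(message));
        }
        ready_.notify_one();
    }

    // Signal that no more messages will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Block until a message is available. Returns false once the port
    // has been closed and all queued messages have been consumed.
    bool receive(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

// Illustrative use: a producer thread posts 1..count while the calling
// thread receives the messages and adds them up.
int sum_via_port(int count) {
    assert(count >= 0);
    Port<int> port;
    std::thread producer([&port, count] {
        for (int i = 1; i <= count; ++i) port.post(i);
        port.close();
    });

    int sum = 0;
    int message = 0;
    while (port.receive(message)) sum += message;
    producer.join();
    return sum;
}
```

The sender never waits for the receiver to process a message, so the two threads are coupled only through the port. A framework aimed at control parallelism builds scheduling and coordination on top of this kind of decoupling.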
As with any runtime, programmers must adopt a particular mindset
and account for the underlying architecture. They already do this all the
time, since applications are rarely completely standalone or written
solely by a single programmer. Consequently, there’s at least
some level of black-box isolation within an application. On the
other hand, complex frameworks like TBB or CCR require a good
understanding of the underlying architecture.