Parallel Programming Is Here To Stay

Highlights

  • SMP Multicore solutions
  • OpenMP portable, scalable framework
  • Non-uniform memory access (NUMA)

One size does not fit all, and it never will. Parallel programming looks to level the playing field by leveraging multicore hardware.

By William Wong

February 26, 2009


Programming applications was easier in the days when one chip with one core was the norm. Single-chip solutions remain the target of many systems, especially for mobile applications. But these days, they’re likely to include more than one processing core, and programming these platforms can be a challenge.

High-end server platforms like Intel’s six-core Xeon 7460 use lots of transistors for very large, complex architectures. Systems with even more cores on a single chip are readily available as well. Chips like the 40-core Intellasys SEAforth 40C18, the 64-core Tilera TilePro64, and the 336-core Ambric AM2045 are just the beginning (see “Are You Migrating To Massive Multicore Yet?”).

Many PCs already include high-count multicore chips in the form of graphics processing units (GPUs). They’re now being made accessible for general computing and formalized with platforms like Nvidia’s 240-core Tesla C1060 (see “SIMT Architecture Delivers Double-Precision TeraFLOPS”).

Multicore solutions are on the rise because it’s becoming harder to scale single-core processors while staying within the heat and power envelope that makes systems practical. Multicore is no longer just one scaling option among several. It is a necessity for meeting growing performance demands.

Clock speed and core count don’t tell the whole story, though. Core interconnects constitute the real programming challenge. Many multicore chips don’t employ the shared-memory approach found in symmetric multiprocessing (SMP) platforms like the Xeon, where multithreaded applications can typically run without regard to the number of underlying cores.

Non-uniform memory access (NUMA) architectures maintain the SMP approach. However, scaling to large numbers of cores can be difficult. For instance, the TilePro64 manages with 64 cores on-chip (Fig. 1).

Still, this is one reason why other approaches, such as mesh networks, are employed when cores start numbering into the hundreds or thousands. This allows designers to throw lots of hardware at a problem, though it requires a different approach to programming.

DISTRIBUTED COMPUTING FRAMEWORKS
The OpenMP portable, scalable framework supports multiplatform, shared-memory parallel programming and targets SMP systems. It also supports C/C++ and Fortran and runs on popular platforms such as Linux and Windows. OpenMP is a thread-oriented approach that maps well to existing hardware architectures. Its core elements include thread management, synchronization, and parallel control structures.
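As a rough illustration of the parallel control structures mentioned above, the following minimal sketch uses an OpenMP `parallel for` with a `reduction` clause. The function name `sum_of_squares` is hypothetical and chosen only for this example. It assumes a compiler with OpenMP enabled (for instance `-fopenmp` on GCC); without that flag the pragma is ignored and the loop simply runs serially.

```cpp
#include <vector>

// Hypothetical example function: sums the squares of all elements.
// With OpenMP enabled, the loop iterations are divided among a team of
// threads and each thread's partial sum is combined by the reduction.
// Without OpenMP, the pragma is ignored and the loop runs on one thread.
double sum_of_squares(const std::vector<double>& values) {
    double total = 0.0;
    // A signed loop index keeps the loop valid for older OpenMP versions.
    const long n = static_cast<long>(values.size());

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

The application code does not mention the number of cores anywhere. The runtime decides how many threads to use, which is what lets the same source scale across different SMP systems.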

The Message Passing Interface (MPI) standard, defined by the MPI Forum, can operate on SMP hardware and also span various networks. Argonne National Laboratory maintains the widely used MPICH implementation. Several operating systems are based on message-passing communication.

Open MPI is an open-source implementation of the MPI-2 standard. It can operate over a range of communication systems such as TCP/IP, Myrinet, and most communication fabrics found on multicore processors. Open MPI also can be mixed with OpenMP.
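The message-passing model itself can be sketched without any MPI library. The example below is not MPI code: `Channel` and `sum_via_messages` are hypothetical names, and standard C++ threads stand in for separate processes. It only shows the idea that the two sides share no application data and cooperate purely by sending and receiving messages.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical in-process mailbox (an assumption for illustration only,
// not part of MPI). Senders enqueue values; receivers block until one arrives.
class Channel {
public:
    void send(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            messages_.push(value);
        }
        ready_.notify_one();
    }

    int receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !messages_.empty(); });
        const int value = messages_.front();
        messages_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> messages_;
};

// Sends the integers 1..count to a worker, which replies with their sum.
int sum_via_messages(int count) {
    assert(count >= 0);
    Channel to_worker;
    Channel to_main;

    std::thread worker([&to_worker, &to_main, count] {
        int sum = 0;
        for (int i = 0; i < count; ++i) {
            sum += to_worker.receive();
        }
        to_main.send(sum);  // the result travels back as a message
    });

    for (int i = 1; i <= count; ++i) {
        to_worker.send(i);
    }
    const int result = to_main.receive();
    worker.join();
    return result;
}
```

Because the worker only touches data it has received, the same structure could in principle be carried over a network link instead of shared memory, which is the property that lets message passing span both SMP hardware and clusters.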

Intel’s Threading Building Blocks (TBB) is another SMP-oriented framework compatible with OpenMP (see “Multiple Threads Make Chunk Change”). TBB is available as an open-source project as well. As its name says, TBB is thread-oriented, but it tends to utilize one thread per core. Each worker thread gets its work from a job queue. The application feeds the job queues.

TBB is a C++ template library rather than a language extension. Template constructs such as parallel_for designate work that can be performed in parallel, and range types such as blocked_range describe the data that the parallel code will be working with, which is typically an array. The data and the processing jobs can be spread across the collection of cores via the worker threads. The queues may fill, but the idea is to keep the cores working instead of idle.
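The worker-thread and job-queue arrangement described above can be sketched in plain standard C++. This is not the TBB API and makes no claim about how TBB is implemented. The names `run_jobs` and `scale_in_parallel` are hypothetical, and a single mutex-protected queue is an assumption made to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: starts roughly one worker thread per core and lets
// each worker pull jobs from the shared queue until the queue is empty.
void run_jobs(std::queue<std::function<void()>> jobs) {
    std::mutex queue_mutex;

    auto worker = [&jobs, &queue_mutex] {
        for (;;) {
            std::function<void()> job;
            {
                std::lock_guard<std::mutex> lock(queue_mutex);
                if (jobs.empty()) return;  // no work left for this worker
                job = std::move(jobs.front());
                jobs.pop();
            }
            job();  // run outside the lock so other workers stay busy
        }
    };

    unsigned workers = std::thread::hardware_concurrency();
    if (workers == 0) workers = 1;  // the core count may be unknown

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}

// The application side: split an array into chunks and feed one job per chunk.
void scale_in_parallel(std::vector<int>& data, int factor, std::size_t chunk) {
    assert(chunk > 0);
    std::queue<std::function<void()>> jobs;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        jobs.push([&data, factor, begin, end] {
            for (std::size_t i = begin; i < end; ++i) data[i] *= factor;
        });
    }
    run_jobs(std::move(jobs));
}
```

Each job touches a disjoint slice of the array, so the workers need no synchronization beyond the queue itself. Choosing a chunk size well below the array length gives the queue enough jobs to keep every core occupied even when some chunks finish early.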

Built around TBB, Intel’s Parallel Studio includes Parallel Advisor (design), Composer (coding), Inspector (debug), and Amplifier (tuning). Parallel Advisor is a static analysis tool designed to identify sections of code in which TBB support will make a difference. It can also identify conflicts and suggest resolutions of these issues. This tool is especially useful for designers who are new to TBB.

Parallel Composer now brings TBB integration to platforms like Microsoft’s Visual Studio. It handles new lambda function support and is compatible with OpenMP 3.0. Parallel debugging support is also part of the package. Its “parallel Lint” capability helps identify coding errors.

Parallel Inspector is a proactive bug finder designed to augment the typical program debugger. It identifies the root cause of defects such as data race conditions and deadlock. The tool can also be used to monitor system behavior and integrity. The system is based on Intel’s Thread Checker tool.
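To make the term concrete, a data race arises when several threads update the same location without synchronization. The sketch below shows one common way to remove such a race using `std::atomic` from the C++ standard library. It does not use or depend on any Intel tool, and the function name `count_with_atomic` is hypothetical.

```cpp
#include <cassert>
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical example: several threads increment one shared counter.
// If `counter` were a plain `long`, the concurrent read-modify-write
// would be a data race and updates could be lost. Making it atomic
// turns each increment into a single indivisible operation.
long count_with_atomic(int threads, int increments_per_thread) {
    assert(threads >= 0 && increments_per_thread >= 0);
    std::atomic<long> counter{0};

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();  // join makes all updates visible here

    return counter.load();
}
```

Races of this kind often pass casual testing because lost updates depend on thread timing, which is why dedicated analysis tools are useful alongside a conventional debugger.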

comments
  • March 25, 2009 02:24 PM

    by Malcolm Spence

    You mention OpenDDS being from Open Computing. OCI is actually Object Computing Inc. located in St. Louis.

    OpenDDS is available in C++ and with Java bindings. It is also available wrapped with JMS.
