Programming To Survive Multicore


James Reinders

February 28, 2008


While many of us are just beginning to write parallel programs to use multicore, we can already see a crisis looming for those who use raw threads (p-threads or Windows threads) to implement parallelism. More abstract parallel-programming methods have tremendous advantages in allowing parallel programs to survive future processor designs.

And there’s a lot of future to survive. We soon will face a transition from multicore processors (eight cores or fewer) to tera-scale processors (more than eight cores). With dual-core processors nearing three years old, and eight-core processors coming later this year, this is a timely topic to consider.

Intel’s open-source Threading Building Blocks (TBB) project handles decomposition of a workload and load-balancing, so it isn’t hard-coded into our programs. Ultimately, TBB provides the appropriate level of abstraction that will help us survive the future with parallel programs written today.

Tera-scale computing
Dual-core processors from Intel arrived in 2005, with quad-core processors following in 2006 and second-generation (Hi-K in 45 nm) quad-cores in 2007. Octo-core processors from Intel are expected in late 2008. All of these devices are similar to program, and despite some differences in cache configurations, each processor has a certain number of cores all connected to the same memory.

As we move beyond eight cores, multicore gives way to tera-scale computing. Maybe it’s not exactly eight cores, but even if we categorize 16-core devices as multicore technology, we certainly should not expect 100 cores all connected to memory symmetrically and with the same memory access time. 

I say “certainly” because that’s the way it is with multiprocessor computer systems. There’s nothing to suggest that a multicore processor will overcome all of the connection challenges of prior multiprocessor systems.

Processors are going to change as they get more cores. We cannot expect to write parallel programs today that will run well in the future if we program them at a low level without abstraction. The good news is that we have choices.

Picking the right abstraction
We need to look for three things in a programming language for parallelism: scaling (add more cores, expect more performance), ease of debugging (avoid pesky deadlock and race conditions), and ease of coding and later maintenance.

Each is much easier to do using an abstraction for parallelism instead of raw threads (Windows threads or POSIX threads). Raw threads are the assembly language of parallel programming.
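
To make the contrast concrete, here is a minimal sketch in standard C++, with std::thread standing in for p-threads or Windows threads. Both function names are hypothetical and neither is part of TBB. The first function makes the caller decide how many threads to use and how to slice the data. The second hides that decision behind a call that states only what to compute, which is the kind of abstraction argued for above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not a TBB API): sums `data` on `workers` raw threads.
// The caller has to pick the thread count, and the slicing is spelled out
// by hand. That is the low-level detail that ties code to one processor.
long long sum_with_raw_threads(const std::vector<int>& data, std::size_t workers) {
    assert(workers > 0 && "need at least one worker");
    std::vector<long long> partial(workers, 0);  // one result slot per thread
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + workers - 1) / workers;  // ceiling division

    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back([&data, &partial, w, begin, end] {
            // Each thread writes only its own slot, so no locking is needed.
            partial[w] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (auto& t : threads) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}

// Hypothetical wrapper: the caller no longer mentions threads at all.
// The worker count is taken from the machine at run time.
long long sum_abstracted(const std::vector<int>& data) {
    const std::size_t hw = std::thread::hardware_concurrency();  // may report 0
    return sum_with_raw_threads(data, hw == 0 ? 1 : hw);
}
```

A call such as sum_with_raw_threads(data, 4) bakes today's core count into the source, while sum_abstracted(data) leaves the decomposition to the library layer. This sketch splits the work evenly and does no load balancing, so it only hints at what a real scheduler provides.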

So if avoiding Windows threads and p-threads is the answer, what does that mean? For Fortran and some C programmers, it means OpenMP more than any other solution. For C++ and C programmers, it means TBB.

TBB outfits C++ for parallelism
As the name suggests, Threading Building Blocks consists of multiple pieces that together extend C++ for parallelism. TBB supports scalable parallel programming using standard C++ code. It doesn’t require special languages or compilers. The ability to use TBB on virtually any processor or any operating system with any C++ compiler makes it very appealing.

Most parallel-programming packages haven’t been nearly as complete, offering only a few algorithms and missing the many other components that are required to fully address the needs of a real program. TBB includes seven very significant blocks to build upon:

  • Basic algorithms (parallel_for, parallel_reduce, parallel_scan)
  • Advanced algorithms (parallel_while, pipeline, parallel_sort)
  • Concurrent containers (to replace/augment STL, which is not thread-safe)
  • Scalable memory allocators (much better than new/delete, malloc/free)
  • Portable mutual exclusion mechanisms (mutexes and atomic operations)
  • Portable, fine-grained, global timers
  • Access to the TBB task-stealing scheduler application programming interfaces (APIs) directly to write your own templates for algorithms if you wish
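
As an illustration of the mutual exclusion block in the list above, the following sketch contrasts a mutex with an atomic operation for the same shared counter. It uses the standard C++ std::mutex and std::atomic rather than TBB's own classes, and every function name here is hypothetical, so treat it as a picture of the idea and not of the TBB interface.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: runs a copy of `work` on `threads` threads and
// waits for all of them to finish.
template <typename Work>
void run_on_threads(unsigned threads, Work work) {
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) pool.emplace_back(work);
    for (auto& th : pool) th.join();
}

// Mutex version: every increment takes and releases a lock.
long count_with_mutex(unsigned threads, long per_thread) {
    long counter = 0;
    std::mutex guard;
    run_on_threads(threads, [&] {
        for (long i = 0; i < per_thread; ++i) {
            std::lock_guard<std::mutex> lock(guard);  // released at end of scope
            ++counter;
        }
    });
    return counter;
}

// Atomic version: the increment itself is indivisible, so no lock is needed.
long count_with_atomic(unsigned threads, long per_thread) {
    std::atomic<long> counter{0};
    run_on_threads(threads, [&] {
        for (long i = 0; i < per_thread; ++i) {
            // Relaxed ordering is enough here: only the final total is read,
            // and join() orders that read after all increments.
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    });
    return counter.load();
}
```

Both functions return threads * per_thread. The mutex suits larger critical sections, while the atomic suits a single-word update like this one.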

Most of us will use all but the last block regularly in writing a parallel program with TBB. While most programmers start with just the basic algorithms, they quickly appreciate how TBB offers the other blocks. But why do all other parallel packages lack TBB’s completeness? There are at least four reasons.

First, not all packages try to be portable across operating systems and processors, which would eliminate the need for portable mutual exclusion. Second, not all advanced systems are willing to expose their proprietary schedulers. Third, scalable memory allocators are hard to write and have been sold for significant gains as standalone products before TBB. And fourth, getting key data structures (containers) worked out has been left as a non-trivial task for the programmer instead of the package writer.
