Get Ready For Some Hard Work With Multicore Programming

Date Posted: July 08, 2011 01:26 PM
Author: William Wong

Clusters with thousands of cores are standard fare these days. Chips with hundreds of cores can be found in the average desktop and laptop PC in the form of GPUs that are now being used for computational chores in addition to their graphical work.

Programming the current crop of server, desktop, and laptop systems is relatively straightforward. However, things get more challenging as the number of cores increases by one or two orders of magnitude. For example, a server may consist of a few multicore chips that form a symmetric multiprocessing (SMP) system with dozens of cores. With virtual machine (VM) support, such a system can run hundreds of VMs.

In the past, a high-speed bus or switch system would link multiple cores within a chip. But the latest architectures tend to take a non-uniform memory access (NUMA) approach. Chips typically have a significant amount of on-chip memory, often in the form of L1, L2, and possibly L3 caches. Intel, AMD, and other companies have incorporated multiple memory controllers alongside the cores as well.

Cores have fast access to their own L1 caches, and latency grows as data sits farther from the core. Off-chip access requests are routed through the chip interconnect. Intel and AMD use multiple point-to-point connections between chips and forward requests through adjacent chips when those chips do not hold the desired information.
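To make the routing cost concrete, here is a rough Python sketch of a multi-chip system in which a memory request may have to cross one or more point-to-point links. The four-chip layout, the `LINKS` table, and both latency constants are hypothetical values chosen for illustration; they do not describe any particular Intel or AMD part.

```python
from collections import deque

# Hypothetical four-chip layout: each chip links directly to two neighbours.
LINKS = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2],
}

# Illustrative latencies in nanoseconds, not measurements of real hardware.
LOCAL_NS = 80
PER_HOP_NS = 50


def hops(links, src, dst):
    """Return the fewest chip-to-chip links a request crosses from src to dst."""
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        chip, dist = frontier.popleft()
        if chip == dst:
            return dist
        for nxt in links[chip]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    raise ValueError(f"chip {dst} is unreachable from chip {src}")


def access_latency_ns(links, core_chip, memory_chip):
    """Estimate latency for a core on core_chip reading memory on memory_chip."""
    return LOCAL_NS + PER_HOP_NS * hops(links, core_chip, memory_chip)


if __name__ == "__main__":
    for target in range(4):
        print(f"chip 0 -> memory on chip {target}: "
              f"{access_latency_ns(LINKS, 0, target)} ns")
```

Under these assumed numbers, memory two hops away costs more than twice as much as local memory, which is why data placement matters on NUMA systems.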

Unfortunately, moving to a very large number of cores requires different approaches, such as a mesh or a more general network/cluster technique. Network interfaces like Serial RapidIO (SRIO), InfiniBand, and Ethernet have been used to implement clusters. Message packets handle communication. Latency tends to be high compared to on-chip communication.
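The packet-based style of communication can be sketched in a few lines of Python. This is only a stand-in: threads and in-process queues play the role of cluster nodes and network links, and the packet fields (`src`, `dst`, `payload`) and the `cluster_sum` helper are made up for illustration. The point is that the nodes share no data and cooperate only by exchanging messages.

```python
import queue
import threading


def worker(node_id, inbox, outbox):
    """Receive one work packet, sum its payload, and send a reply packet."""
    packet = inbox.get()
    outbox.put({
        "src": node_id,
        "dst": packet["src"],
        "payload": sum(packet["payload"]),
    })


def cluster_sum(values, nodes=4):
    """Split values across nodes and combine the partial sums they send back."""
    replies = queue.Queue()
    inboxes = [queue.Queue() for _ in range(nodes)]
    threads = [
        threading.Thread(target=worker, args=(i, inboxes[i], replies))
        for i in range(nodes)
    ]
    for t in threads:
        t.start()
    for i, inbox in enumerate(inboxes):
        # Node i receives every nodes-th element, starting at index i.
        inbox.put({"src": "host", "dst": i, "payload": values[i::nodes]})
    # Replies may arrive in any order; addition does not care.
    total = sum(replies.get()["payload"] for _ in range(nodes))
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(cluster_sum(list(range(1, 101))))
```

On a real cluster each send and receive would cross a network link, so an algorithm that exchanges many small packets pays the link latency many times over.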

SRIO has the lowest overhead and smallest packets, and it provides end-to-end handshaking. InfiniBand tends to handle larger messages and has been the choice for many supercomputer clusters. Ethernet has significant overhead and latency issues, but it is ubiquitous and the backbone for the Internet. All three address off-chip communication, but on-chip the designs are more varied.
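A quick calculation shows why per-packet overhead matters most for small messages. The sketch below uses standard Ethernet framing costs (preamble, header, frame check sequence, interframe gap, and the 46-byte minimum payload). It counts framing only and ignores any protocol stack above it, and the `wire_efficiency` helper is a hypothetical name. Figures for SRIO and InfiniBand are deliberately left out; they would need to be taken from the respective specifications.

```python
# Standard Ethernet framing costs, in bytes per frame on the wire.
ETH_PREAMBLE_SFD = 8
ETH_HEADER = 14
ETH_FCS = 4
ETH_INTERFRAME_GAP = 12
ETH_OVERHEAD = ETH_PREAMBLE_SFD + ETH_HEADER + ETH_FCS + ETH_INTERFRAME_GAP
ETH_MIN_PAYLOAD = 46  # shorter payloads are padded up to this size


def wire_efficiency(payload_bytes, overhead_bytes, min_payload=0):
    """Return the fraction of bytes on the wire that carry useful payload."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    carried = max(payload_bytes, min_payload)  # account for padding
    return payload_bytes / (carried + overhead_bytes)


if __name__ == "__main__":
    for size in (8, 64, 256, 1500):
        eff = wire_efficiency(size, ETH_OVERHEAD, ETH_MIN_PAYLOAD)
        print(f"{size:5d}-byte message: {eff:.1%} of wire bytes are payload")
```

A full-size frame spends almost all of its wire time on payload, while an 8-byte message spends most of it on padding and framing. That is the regime in which an interconnect designed around small packets has the advantage.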

C | C++ | Haskell | Java | MPI | multicore | multiprogramming | OpenMP | parallel programming | Thread Building Blocks