Electronic Design

  


[Technology Report]
Parallel Programming Is Here To Stay
One size does not fit all, and it never will. Parallel programming looks to level the playing field by leveraging multicore hardware.

William Wong  |   ED Online ID #20655  |   February 26, 2009


Open Computing Language (OpenCL) is a standard for parallel programming that supports, but is not restricted to, GPUs. It also supports DSPs and IBM’s Cell processor (see “CELL Processor Gets Ready To Entertain The Masses”), which is found in Sony’s PlayStation 3.

OpenCL can handle a heterogeneous environment. Therefore, a mix of x86 chips, GPUs, and DSPs could merrily crunch on loads of data. It has garnered wide support, so this scenario is actually feasible. It can even fit on mobile platforms.

Also, OpenCL has a platform model with a controlling host and multiple compute units. The compute units execute kernels, which are small chunks of code. This model is seen elsewhere with Nvidia’s SIMT architecture as well as Intel’s TBB.
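The host-and-kernel split can be sketched in a few lines of Python. This is not the OpenCL API. The names enqueue_kernel, scale_kernel, and compute_units are hypothetical, and a thread pool stands in for the compute units only to show the structure: the host sets up the data, and the same small kernel runs once per index.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_kernel(global_id, src, dst, factor):
    """Kernel body (hypothetical): one invocation handles one element."""
    dst[global_id] = src[global_id] * factor


def enqueue_kernel(kernel, global_size, *args, compute_units=4):
    """Host side (hypothetical helper): run the kernel once per index.

    The thread pool only mimics a set of compute units; it is not
    how an OpenCL runtime schedules work.
    """
    with ThreadPoolExecutor(max_workers=compute_units) as pool:
        futures = [pool.submit(kernel, gid, *args) for gid in range(global_size)]
        for future in futures:
            future.result()  # re-raise any exception from a kernel instance


# The host owns the buffers and launches the kernel over the index space.
src = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dst = [0.0] * len(src)
enqueue_kernel(scale_kernel, len(src), src, dst, 2.0)
```

Each kernel instance writes only its own output slot, which is what lets the host hand the instances out to compute units in any order.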

Further, OpenCL uses a relaxed memory consistency model. It doesn’t guarantee that shared variables appear consistent across the work-items in a work-group, unlike an SMP system, where a variable has one location that’s equally accessible by any core. This is because many of the target platforms feature distributed memory, with a core often having its own local memory.
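In practice, a relaxed model means that one work-item should not read another’s result until both have reached a synchronization point. The Python sketch below imitates that discipline with a barrier. It is an analogy only: Python threads share one consistent memory, so the barrier here enforces ordering rather than visibility, and the names work_item, scratch, and GROUP_SIZE are hypothetical.

```python
import threading

GROUP_SIZE = 4  # hypothetical work-group size


def work_item(local_id, data, scratch, out, barrier):
    # Phase 1: each work-item writes only its own slot of the shared scratch area.
    scratch[local_id] = data[local_id] * data[local_id]

    # Synchronization point: no work-item continues until all have written.
    barrier.wait()

    # Phase 2: only now is it safe to read a slot written by a neighbor.
    neighbor = (local_id + 1) % GROUP_SIZE
    out[local_id] = scratch[local_id] + scratch[neighbor]


data = [1, 2, 3, 4]
scratch = [None] * GROUP_SIZE
out = [None] * GROUP_SIZE
barrier = threading.Barrier(GROUP_SIZE)

threads = [
    threading.Thread(target=work_item, args=(i, data, scratch, out, barrier))
    for i in range(GROUP_SIZE)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the barrier, a work-item could reach phase 2 before its neighbor has finished phase 1 and would find an empty slot.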

OpenCL puts some limitations on the programming model. For example, pointers to functions aren’t allowed. Pointers to data are allowed within a kernel, but a kernel argument may not be a pointer to a pointer. The restrictions make it possible to transparently map the application to the wide range of architectures supported by OpenCL.

PARALLEL ARCHITECTURES
Frameworks like OpenCL are likely to be adopted to support new hardware architectures, but vendor-provided programming tools will often be the first step. Also, some architectures work best when the developer can exploit their features through programming tools designed specifically for them.

One such example is Forth programming support for the 40-core Intellasys SEAforth 40C18. Each core has only 512 words of RAM and ROM. Each 18-bit word contains four instructions. Unlike some other multicore solutions, the SEAforth cores aren’t designed to run one large program. Instead, they run very small, cooperative programs. In fact, three cores can be used to handle the dynamic RAM interface.

The XMOS XS1-G4 has hardware scheduling of up to eight tasks per core with four cores per chip. The hardware scheduling makes it easy to write drivers for soft peripherals or handle the hard interfaces such as 32 XLink channels. These are used for communication between cores and chips.

Channel communication is so ingrained in the system that the XC compiler, which handles an extended version of C, brings channels into the base language. Communication is explicit, but XMOS makes it a basic part of its approach to parallel programming.
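The channel style is easy to picture even outside XC. The sketch below is Python, not XC syntax, and makes no claim to match XC channel semantics. It assumes a bounded queue as a stand-in for a channel, and the names producer, consumer, and DONE are hypothetical. The two tasks share no variables; everything one knows about the other arrives over the channel.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker


def producer(chan, samples):
    """Send each sample down the channel, then signal completion."""
    for sample in samples:
        chan.put(sample)  # blocks while the channel is full
    chan.put(DONE)


def consumer(chan, results):
    """Receive samples and record a running total after each one."""
    total = 0
    while True:
        item = chan.get()  # blocks until something arrives
        if item is DONE:
            break
        total += item
        results.append(total)


chan = queue.Queue(maxsize=1)  # one-slot buffer standing in for a channel
results = []

tasks = [
    threading.Thread(target=producer, args=(chan, [3, 1, 4, 1, 5])),
    threading.Thread(target=consumer, args=(chan, results)),
]
for t in tasks:
    t.start()
for t in tasks:
    t.join()
```

Because the only coupling is the channel, either task could in principle sit on a different core or chip without changing its code.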

PARALLEL LANGUAGES
Parallel programming on SMP architectures deals with virtual memory, pointers, and multithreading facilities that have been commonly used for decades using languages like C, C++, and Java. Network cluster programming using TCP/IP and sockets has also been prevalent.

These programming techniques can be used in many-core environments. However, explicit control and communication can make programming tasks in these environments difficult as the number of cores increases. One area in which many cores make fast work is array computation.

Programming languages like the Mathworks’ Matlab offer matrix manipulation support. Many matrix computations map very well to a range of hardware architectures, though some architectures handle some operations better than others. For example, SMP architectures in which cores have simultaneous access to all memory can easily handle random access operations, versus architectures with just local memory.

These architectures have a high latency for accessing information that isn’t local, making operations like matrix inversion a challenge. This is one reason why GPUs and clusters of cores can handle some algorithms exceptionally well while others will work very poorly.
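A rough way to see the difference is to count how many reads fall outside a worker’s own memory. The Python sketch below assumes a square matrix whose rows are split evenly across workers, with each worker producing the output rows it owns. It compares an element-wise operation with a transpose, which is used here as a simpler stand-in for the data movement that operations like inversion require. The helper names owner and remote_reads are hypothetical.

```python
def owner(row, n_rows, n_workers):
    """Worker that holds a given row (assumes n_rows divides evenly)."""
    rows_per_worker = n_rows // n_workers
    return row // rows_per_worker


def remote_reads(n, n_workers, source_index):
    """Count output elements whose input lives in another worker's memory.

    source_index(i, j) returns the (row, col) of the input element
    needed to produce output element (i, j).
    """
    count = 0
    for i in range(n):
        for j in range(n):
            src_row, _ = source_index(i, j)
            if owner(src_row, n, n_workers) != owner(i, n, n_workers):
                count += 1
    return count


# Element-wise operation: output (i, j) needs only input (i, j).
elementwise = remote_reads(8, 4, lambda i, j: (i, j))

# Transpose: output (i, j) needs input (j, i), usually held elsewhere.
transpose = remote_reads(8, 4, lambda i, j: (j, i))

print(elementwise, transpose)
```

For this 8-by-8 case on four workers, the element-wise pass needs no remote reads, while the transpose needs 48 of its 64 inputs from other workers.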

Matlab’s array-processing support is something any runtime can provide. So while this approach is applicable to any programming language, it only addresses some parallel-programming chores. For other chores, there’s the Parallel Computing Toolbox.

The Parallel Computing Toolbox adds features such as parallel for loops, distributed arrays, and message-passing functions. Message-passing functions address MPI-style programming, but the other features highlight the deficiencies of conventional programming languages. Adding these types of parallel computing services illustrates how programming languages are changing.
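The idea behind a parallel for loop is that the programmer states only that the iterations are independent and leaves the scheduling to the runtime. The sketch below shows the shape of that in Python rather than in Matlab. The parallel_for helper and the trial loop body are hypothetical, and standard Python threads will not actually speed up this CPU-bound body, so treat it as an illustration of the structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def trial(n):
    """Independent loop body: sum of the squares below n."""
    return sum(k * k for k in range(n))


def parallel_for(body, iterations, workers=4):
    """Hypothetical helper: apply body to every iteration, results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, iterations))


# Serial loop and its parallel counterpart produce the same list.
serial = [trial(n) for n in range(10)]
parallel = parallel_for(trial, range(10))
```

The loop body is unchanged between the two versions; only the loop construct differs, which is the property that makes this style attractive as core counts grow.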




Reader Comments

You mention OpenDDS being from Open Computing. OCI is actually Object Computing Inc. located in St. Louis.

OpenDDS is available in C++ and with Java bindings. It is also available wrapped with JMS.

Malcolm Spence - March 25, 2009
