Electronic Design

  


[Technology Report]
Parallel Programming Is Here To Stay
One size does not fit all, and it never will. Parallel programming looks to level the playing field by leveraging multicore hardware.

William Wong  |   ED Online ID #20655  |   February 26, 2009


Parallel Amplifier utilizes Intel’s Thread Profiler and VTune Performance Analyzers to provide runtime analysis that can help identify bottlenecks. These tools are designed to simplify the use of the profiler and VTune by regular programmers.

PUBLISH OR PERISH?
OpenMPI and OpenMP distribute data for processing by using arrays or communication links, but that isn’t the only mechanism employed in parallel-programming environments, especially those that are more dynamic and apt to change over time. Fixed buffers, links, and sockets don’t always suit environments where the content is known ahead of time but the suppliers/publishers and consumers/subscribers are not. A number of implementations exist to support this type of environment.

One is the Object Management Group’s (OMG) Data Distribution Service (DDS). Several commercial versions of DDS are available, such as Real-Time Innovations’ RTI DDS, PrismTech’s OpenSplice DDS, and Twin Oaks Computing’s CoreDX. Object Computing Inc.’s (OCI) OpenDDS is an open-source option, and OCI provides training and support for it.

DDS uses a publish/subscribe model familiar to many programmers, but it tends to be built on a much larger environment than a single system (Fig. 2). It has been used for applications ranging from air traffic control to industrial automation.

DDS provides a loosely coupled parallel computing environment. Individual publishers and subscribers are programmed in a conventional sequential programming fashion. Publishers identify the material they provide to the underlying DDS framework, which distributes the data as necessary to subscribers that request it. In a simplified form, this is how the DDS system operates.
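The publish/subscribe flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern inside a single process, not the DDS API: the `TopicBus` class, its methods, and the topic names are all hypothetical, and a real DDS implementation adds discovery, typing, and network transport on top.

```python
from collections import defaultdict


class TopicBus:
    """Toy stand-in for the middleware layer (hypothetical, not the DDS API)."""

    def __init__(self):
        # Maps a topic name to the callbacks of its current subscribers.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register interest in a topic; the publisher never learns who asked."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, sample):
        """Hand a sample to every current subscriber; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(sample)
        return len(callbacks)


bus = TopicBus()
received = []

# The subscriber names the data it wants, not the publisher that supplies it.
bus.subscribe("aircraft/position", received.append)

# The publisher names the data it supplies, not the subscribers that use it.
delivered = bus.publish("aircraft/position", {"id": "N123", "alt_ft": 32000})
ignored = bus.publish("aircraft/fuel", {"id": "N123", "kg": 5400})
```

Both sides are written as ordinary sequential code. The only thing they share is the topic name, which is what keeps the coupling loose.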

Things get a little more complex when examining the details, though, because options such as quality of service and connection reliability can affect application design. One thing DDS systems do better than most parallel-programming environments is handle transient connections, because they support best-effort delivery. In many applications, it’s sufficient to retain the latest piece of information. Still, DDS systems must deal with many of the scaling and complexity issues of any parallel-programming system.
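Retaining only the latest piece of information is easy to picture with a small sketch. The `LatestValueTopic` class below is hypothetical and runs in one process; it loosely mirrors the idea of a keep-last history setting and does not represent how any particular DDS product implements its quality-of-service policies.

```python
from collections import deque


class LatestValueTopic:
    """Hypothetical topic that retains only the most recent samples."""

    def __init__(self, depth=1):
        # A bounded deque silently drops the oldest sample when it is full.
        self._history = deque(maxlen=depth)
        self._subscribers = []

    def publish(self, sample):
        self._history.append(sample)
        for callback in self._subscribers:
            callback(sample)

    def subscribe(self, callback):
        # A late or reconnecting subscriber first receives whatever is retained.
        for sample in self._history:
            callback(sample)
        self._subscribers.append(callback)


temperature = LatestValueTopic(depth=1)

# Three samples are published while nobody is listening.
for reading in (21.5, 21.6, 21.7):
    temperature.publish(reading)

late = []
temperature.subscribe(late.append)  # joins late, sees only the newest sample
after_join = list(late)

temperature.publish(22.0)           # now connected, receives updates live
```

The subscriber that connects late misses the first two readings, which is acceptable when only the current value matters. An application that needs every sample would require a deeper history and reliable delivery, and that is where design choices start to affect cost.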

Microsoft’s Concurrency and Coordination Runtime (CCR) and Decentralized Software Services (DSS) fit somewhere between the tightly coupled and loosely coupled approaches. CCR provides scheduling and synchronization within a subsystem. These tools were initially released with the Microsoft Robotics Studio, but have quickly moved to other .NET environments unrelated to robotics (see “Software Frameworks Tackle Load Distribution”).

CCR provides asynchronous and concurrent task management with an eye to coordination and failure handling. It uses its own message passing system. Ports and port sets are the endpoints for messages.
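The idea of ports as message endpoints, with a port set separating results from failures, can be sketched as follows. CCR itself is a .NET library, so this Python version is only an analogy: the `Port` and `PortSet` classes, the `STOP` sentinel, and the worker function are hypothetical and built on the standard `queue` and `threading` modules.

```python
import queue
import threading

STOP = object()  # sentinel that ends a receiver loop (an assumption of this sketch)


class Port:
    """Hypothetical message endpoint: posting never waits for the receiver."""

    def __init__(self):
        self._messages = queue.Queue()

    def post(self, message):
        self._messages.put(message)

    def drain(self, handler):
        """Run handler on each message in arrival order until STOP is seen."""
        while True:
            message = self._messages.get()
            if message is STOP:
                return
            handler(message)


class PortSet:
    """Groups one port for results and one for failures."""

    def __init__(self):
        self.success = Port()
        self.failure = Port()


def reciprocal_worker(requests, replies):
    """Handle requests concurrently, routing errors to the failure port."""
    def handle(value):
        try:
            replies.success.post((value, 1 / value))
        except ZeroDivisionError as exc:
            replies.failure.post((value, str(exc)))

    requests.drain(handle)
    replies.success.post(STOP)
    replies.failure.post(STOP)


requests = Port()
replies = PortSet()
worker = threading.Thread(target=reciprocal_worker, args=(requests, replies))
worker.start()

for value in (4, 0, 2):
    requests.post(value)
requests.post(STOP)
worker.join()

successes, failures = [], []
replies.success.drain(successes.append)
replies.failure.drain(failures.append)
```

The sender never calls the worker directly and never sees an exception. A failure arrives as just another message on its own port, so the coordinating code decides how to respond to it.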

Like OpenMPI, CCR is designed for more tightly integrated connections. DSS, which sits on top of CCR, provides a lightweight, state-oriented service model that uses representational state transfer (REST), an approach also used across a range of Internet communication. In fact, XML-based communication runs nicely over TCP/IP links, though this isn’t a requirement. The DSS Protocol (DSSP) uses the XML-based Simple Object Access Protocol (SOAP).

DSS has some publish/subscribe semantics. As a result, it can advertise the availability of a service or piece of information. It also can have any number of controllers utilizing input from realtime sensors.

TURNING GRAPHICS ON ITS SIDE
These parallel-programming platforms target general-purpose processing architectures. However, the multicore GPUs found in most high-performance 3D video adapters from companies such as AMD (ATI) and Nvidia are also readily available.

The ATI Stream Processing and Nvidia GeForce and Tesla platforms allow the respective GPUs to find applications beyond just video rendering. Many of these applications are graphics-related. However, several others simply use the hundreds of cores in these GPUs for other computational purposes.

GPU architectures tend to be unique since they were designed for video rendering of 3D games, but they’re general enough to handle other chores. For example, Nvidia’s single-instruction multiple-thread (SIMT) architecture uses thread-processing arrays (TPAs) of eight cores. The TPAs are grouped three at a time into thread-processing clusters (TPCs), giving each cluster 24 cores.

Nvidia developed a framework dubbed the Compute Unified Device Architecture (CUDA) to handle its SIMT-based GPUs (Fig. 3). CUDA support can be found in the company’s latest device drivers, so any PC equipped with one of its GPUs is a potential supercomputer—well, at least a little supercomputer. CUDA programs are written in C. Other programming languages like Fortran and C++ are also being added to the list.
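The programming model behind this is that the developer writes the work for one thread and the hardware runs that code across many threads at once. The Python sketch below emulates that style serially on the CPU. It is not CUDA code and gains no speed: the `launch` helper and the parameter names are hypothetical, chosen to resemble the block and thread indices a GPU kernel would use.

```python
def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    """Work done by ONE thread: compute a single element of a*x + y."""
    i = block_idx * block_dim + thread_idx  # this thread's global index
    if i < len(x):                          # guard: the last block may overhang
        out[i] = a * x[i] + y[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Serial stand-in for a kernel launch; a GPU would run these in parallel."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


n = 10
x = [float(i) for i in range(n)]
y = [1.0] * n
out = [0.0] * n

block_dim = 4                                # threads per block
grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements

launch(saxpy_kernel, grid_dim, block_dim, 2.0, x, y, out)
```

Ten elements with four threads per block need three blocks, so two of the 12 emulated threads fall past the end of the data and do nothing. That bounds check is the kind of detail the programmer still owns, even when the framework hides how threads map onto cores.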

CUDA hides much of the underlying complexity of the SIMT architecture. In fact, it’s been generalized so that it can address almost any memory-based multicore platform. CUDA now supports the Khronos Group’s OpenCL. The Khronos Group is a member-funded consortium that supports open standards such as OpenCL and OpenGL. OpenGL is a 2D and 3D graphics application programming interface (API).




Reader Comments

You mention OpenDDS being from Open Computing. OCI is actually Object Computing Inc. located in St. Louis.

OpenDDS is available in C++ and with Java bindings. It is also available wrapped with JMS.

Malcolm Spence -March 25, 2009
