Software Frameworks Tackle Load Distribution

Multiple-core designs can go in several directions when it comes to distributing the load

Date Posted: January 31, 2008 12:00 AM
Author: William Wong

The system supports sparse matrices, including the ability to transpose result data. Likewise, it lets developers write applications for the SPE that do not have to deal with the memory-management complexities of the Cell architecture. Furthermore, Cell processor systems deal with stream video processing, another area where NVidia’s approach is used, although the chip architecture is very different.

NVidia’s CUDA likewise aims to cut the time a developer spends creating applications that take advantage of a GPU. CUDA includes a native C compiler along with FFT and BLAS libraries, a profiler, and a gdb debugger, as well as a host runtime driver. The adapters typically plug into an x16 PCI Express slot.

Sample applications address operations such as image convolution, binomial option pricing, and Sobel edge detection. CUDA provides host support for Fortran and has a MathWorks Matlab plug-in. It works with the Tesla adapters in addition to a number of NVidia graphics adapters.
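To make one of these sample workloads concrete, here is a minimal sketch of Sobel edge detection in plain Python. It is not taken from the CUDA samples; the function names `sobel_pixel` and `sobel` are hypothetical, and the gradient magnitude is approximated as |gx| + |gy|. The point is that each output pixel depends only on a small input neighborhood, so every pixel could be handed to its own GPU thread.

```python
# Illustrative CPU-side sketch only; names are hypothetical, not CUDA APIs.
# Sobel kernels for the horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_pixel(img, y, x):
    """Approximate gradient magnitude (|gx| + |gy|) at one interior pixel."""
    gx = gy = 0
    for dy in range(3):
        for dx in range(3):
            p = img[y + dy - 1][x + dx - 1]
            gx += GX[dy][dx] * p
            gy += GY[dy][dx] * p
    return abs(gx) + abs(gy)

def sobel(img):
    """Apply sobel_pixel to every interior pixel; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    # Each (y, x) iteration is independent, which is what makes the
    # operation a good fit for one-thread-per-pixel execution.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sobel_pixel(img, y, x)
    return out

# A 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 9, 9] for _ in range(4)]
edges = sobel(image)
```

On this input the interior pixels on either side of the edge get a strong response, while a flat image produces all zeros.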

CUDA has a memory-management issue similar to that of the Cell processors, but the interaction is more flexible. The complexity comes from the type of processors, the data flow, and their management. The chips were designed for graphics processing, so they are well suited to streaming algorithms of that type. But the architecture can handle more general algorithms as well.

Because of the limitations, though, CUDA addresses thread and memory management. For example, a fast shared memory region typically can be used for texture lookups in graphics applications. But it also can be used for general communication among threads. The architecture uses a SIMD-style model, so groups of 32 threads, called warps, run simultaneously, executing the same code but on different data. Warps are in turn grouped into larger thread blocks, and the threads of a block share that fast memory region.
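The lockstep model and the use of shared memory for communication among threads can be sketched on the CPU. The following is an emulation in plain Python, not CUDA code: `WARP_SIZE`, `block_sum`, and the `shared` list are hypothetical stand-ins for a 32-thread group (a warp in CUDA's terminology) and its shared memory region. Each pass of the loop represents one instruction that every active thread executes on its own data element.

```python
# CPU emulation for illustration; not the CUDA API.
WARP_SIZE = 32  # number of threads assumed to execute in lockstep

def block_sum(data):
    """Sum WARP_SIZE values with a tree reduction through 'shared memory'."""
    assert len(data) == WARP_SIZE
    shared = list(data)  # stand-in for the fast shared memory region
    stride = WARP_SIZE // 2
    while stride > 0:
        # Every "thread" tid < stride runs the same add on different data.
        # All reads happen before any write, as if the step were simultaneous.
        partial = [shared[tid] + shared[tid + stride] for tid in range(stride)]
        shared[:stride] = partial
        stride //= 2
    return shared[0]  # thread 0 ends up holding the total

total = block_sum(list(range(32)))
```

The reduction finishes in five steps (16, 8, 4, 2, 1 active threads) rather than 31 sequential additions, which is the kind of pattern shared memory makes practical.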

The CUDA framework’s host processor is physically separated from the computational elements, unlike the Cell, which incorporates the PPE on the same chip as the SPEs. This means that additional worker cores can be added in the CUDA approach by adding more adapter boards.

It also means the framework must account for this capability.
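One way to picture what the host side has to do is a work partitioner that splits a data set across however many adapters are present. The sketch below is plain Python with simulated devices; `partition` and `run_on_devices` are hypothetical names, and a real framework would also have to copy data across the bus to each board, which is omitted here.

```python
# Host-side sketch with simulated devices; names are hypothetical.
def partition(n_items, n_devices):
    """Split range(n_items) into n_devices contiguous (start, stop) chunks."""
    base, extra = divmod(n_items, n_devices)
    chunks, start = [], 0
    for dev in range(n_devices):
        # The first `extra` devices take one additional item each.
        size = base + (1 if dev < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks

def run_on_devices(data, n_devices, kernel):
    """Hand each chunk to a (simulated) device, then merge results in order."""
    results = []
    for start, stop in partition(len(data), n_devices):
        results.extend(kernel(x) for x in data[start:stop])
    return results

chunks = partition(10, 3)
squares = run_on_devices(list(range(10)), 3, lambda x: x * x)
```

Adding a board changes only `n_devices`; the kernel itself is untouched, which is the property the framework is meant to preserve.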

AMD also has its ATI line of graphics cards and a FireStream GPU adapter on par with NVidia’s Tesla, so it is not surprising to find a similar framework to support a similar streaming type of architecture. Likewise, its video gaming roots are apparent in some target applications such as physics simulation support for games.

AMD calls its framework the Stream Computing Software Stack. Its native development tool is based on the Brook C/C++ compiler. The Brook compiler incorporates data parallel computing ideas and targets a range of platforms. It also adds new data types such as streams plus scatter/gather operations. The AMD incarnation addresses the BrookGPU subset that accounts for the advantages and limitations of a GPU like the FireStream.
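The stream and scatter/gather ideas can be illustrated without Brook syntax. The following plain Python sketch uses hypothetical helper names (`stream_map`, `gather`, `scatter`) and ordinary lists as stand-ins for streams; it shows the semantics only and is not how the Brook compiler expresses them.

```python
# Semantics-only sketch; these helpers are hypothetical, not Brook constructs.
def stream_map(kernel, *streams):
    """Apply a kernel element-wise across one or more equal-length streams."""
    return [kernel(*elems) for elems in zip(*streams)]

def gather(src, indices):
    """Indexed read: element k of the result is src[indices[k]]."""
    return [src[i] for i in indices]

def scatter(values, indices, size, fill=0):
    """Indexed write: values[k] lands at position indices[k] of the result."""
    out = [fill] * size
    for v, i in zip(values, indices):
        out[i] = v
    return out

source = [10, 20, 30, 40]
idx = [3, 0, 2]
picked = gather(source, idx)                    # read in an arbitrary order
doubled = stream_map(lambda x: 2 * x, picked)   # element-wise kernel
placed = scatter(doubled, idx, len(source))     # write back to the same slots
```

The kernel passed to `stream_map` sees only one element at a time, which is what lets a stream compiler spread the work over many processing elements.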

Frameworks such as those mentioned in this article will be critical in taking advantage of multicore platforms, whether they’re heterogeneous or homogeneous, because they remove much of the management complexity of the underlying system from the programmer’s perspective.

NEED MORE INFORMATION
AMD
Brook Language
IBM
MathWorks
Mercury Computer Systems
Message Passing Interface Forum
Microsoft
NVidia
Object Management Group
PrismTech
Real-Time Innovations

 
