The system supports sparse matrices,
including the ability to transpose result
data. Likewise, it lets developers write
applications for the SPE that do not have
to deal with the memory-management
complexities of the Cell architecture.
Furthermore, Cell processor systems handle
streaming video processing, another area
where NVidia’s approach is used, although
the chip architecture is very different.
NVidia’s CUDA likewise aims to reduce the
time a developer spends creating
applications that take advantage of a GPU.
CUDA includes a native C compiler along
with FFT and BLAS libraries, a profiler,
and a gdb debugger, as well as a host runtime
driver. NVidia’s adapters typically plug
into an x16 PCI Express slot.
Sample applications address operations
such as image convolution, binomial option
pricing, and Sobel edge detection. CUDA
provides host support for Fortran and has a
MathWorks MATLAB plug-in. It works with
the Tesla adapters in addition to a number
of NVidia graphics adapters.
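
To make one of those sample operations concrete, here is a minimal sketch of Sobel edge detection. It is plain Python rather than CUDA and is not taken from NVidia's samples; the function names and the tiny test image are assumptions made for illustration. The point is that each output pixel depends only on a small neighborhood of inputs, which is why this kind of work maps well onto many parallel threads.

```python
# Sobel kernels for horizontal (GX) and vertical (GY) intensity gradients.
GX = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]
GY = [[-1, -2, -1],
      [0, 0, 0],
      [1, 2, 1]]


def sobel_pixel(image, row, col):
    """Approximate gradient magnitude at one interior pixel.

    Hypothetical helper: on a GPU, one thread could plausibly compute
    one output pixel this way.
    """
    gx = gy = 0
    for dr in range(3):
        for dc in range(3):
            value = image[row + dr - 1][col + dc - 1]
            gx += GX[dr][dc] * value
            gy += GY[dr][dc] * value
    # |gx| + |gy| is a common cheap stand-in for sqrt(gx^2 + gy^2).
    return abs(gx) + abs(gy)


def sobel(image):
    """Apply sobel_pixel to every interior pixel; border pixels stay 0."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            out[row][col] = sobel_pixel(image, row, col)
    return out


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 9, 9] for _ in range(4)]
edges = sobel(image)
```

The two interior rows of `edges` respond strongly at the columns next to the edge, while a uniform image would produce zeros everywhere.
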
CUDA faces a memory-management issue
similar to that of the Cell processors, but
the interaction is more flexible. The
complexity lies in the types of processors,
the data flow, and their management. The chips
were designed for graphics processing, so
they are amenable to streaming algorithms
of this type. But the architecture can handle
more general algorithms as well.
Because of these limitations, though,
CUDA addresses thread and memory
management. For example, a fast shared
memory region typically can be used for
texture lookups in graphics applications.
But it also can be used for general
communication among threads. The
architecture uses a SIMD model, so groups
of 32 threads, called warps, run
simultaneously, executing the same code
but on different data. Warps in turn are
grouped into thread blocks, whose threads
share that fast memory region.
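
The execution model just described can be sketched in a few lines. This is a plain-Python emulation, not CUDA code: `run_group`, the two phase functions, and the list standing in for shared memory are all hypothetical names chosen for illustration. Every lane in a 32-thread group runs the same function on its own data element, and lanes exchange values through the shared region between phases.

```python
GROUP_SIZE = 32  # threads that execute together, per the text above


def run_group(phases, data):
    """Emulate one group: every lane runs the same phase on its own element.

    Running the phases one after another stands in for a barrier: all
    lanes finish a phase before any lane starts the next one.
    """
    shared = [0] * GROUP_SIZE  # stand-in for the fast shared memory region
    result = [0] * GROUP_SIZE
    for phase in phases:
        # Real hardware would run these lanes simultaneously.
        for lane in range(GROUP_SIZE):
            phase(lane, data, shared, result)
    return result


def load_phase(lane, data, shared, result):
    # Each lane squares its own input and publishes it to shared memory.
    shared[lane] = data[lane] * data[lane]


def neighbor_phase(lane, data, shared, result):
    # Each lane adds the value published by the lane to its right (wrapping).
    result[lane] = shared[lane] + shared[(lane + 1) % GROUP_SIZE]


data = list(range(GROUP_SIZE))
out = run_group([load_phase, neighbor_phase], data)
```

The second phase only works because every lane has finished writing before any lane reads, which is the kind of thread and memory management the framework has to expose to the programmer.
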
The CUDA framework’s host processor
is physically separated from the computational
elements, unlike the Cell, which
incorporates the PPE on the same chip
as the SPEs. This means that additional
worker cores can be added in the CUDA
approach by adding more adapter boards.
It also means the framework must account
for a varying number of adapters.
AMD also has its ATI line of graphics
cards and a FireStream GPU adapter on
par with NVidia’s Tesla, so it is not surprising
to find a similar framework to support
a similar streaming type of architecture.
Likewise, its video gaming roots are
apparent in some target applications such
as physics simulation support for games.
AMD calls its framework the Stream
Computing Software Stack. Its native
development tool is based on the Brook
C/C++ compiler. The Brook compiler
incorporates data parallel computing ideas
and targets a range of platforms. It also
adds new data types such as streams plus
scatter/gather operations. The AMD
incarnation addresses the BrookGPU subset
that accounts for the advantages and
limitations of a GPU like the FireStream.
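
For readers unfamiliar with the terms, the sketch below shows what gather and scatter mean when applied to a stream of values. It is plain Python, not Brook syntax, and the function names are assumptions made for this example. A gather reads from computed positions, a scatter writes to computed positions, and the kernel in between touches each element independently.

```python
def gather(source, indices):
    """Gather: out[i] = source[indices[i]] (indexed reads)."""
    return [source[i] for i in indices]


def scatter(values, indices, size, fill=0):
    """Scatter: out[indices[i]] = values[i] (indexed writes)."""
    out = [fill] * size
    for value, i in zip(values, indices):
        out[i] = value
    return out


def scale(stream, factor):
    """Elementwise kernel: each stream element is processed independently."""
    return [factor * x for x in stream]


stream = [10, 20, 30, 40]
picked = gather(stream, [3, 0, 2])       # [40, 10, 30]
doubled = scale(picked, 2)               # [80, 20, 60]
placed = scatter(doubled, [1, 3, 0], 4)  # [60, 80, 0, 20]
```

Position 2 of `placed` keeps the fill value because no index targets it.
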
Frameworks such as those mentioned in
this article will be critical in taking advantage
of multicore platforms, whether
they’re heterogeneous or homogeneous,
because they remove much of the management
complexity of the underlying system
from the programmer’s perspective.

NEED MORE INFORMATION
AMD
Brook Language
IBM
MathWorks
Mercury Computer Systems
Message Passing Interface Forum
Microsoft
NVidia
Object Management Group
PrismTech
Real-Time Innovations