[Lab Bench]
What Will You Do With 1 TFLOP Of Double-Precision Power?
William Wong
ED Online ID #19324
July 24, 2008
Copyright © 2006 Penton Media, Inc., All rights reserved. Printing of this document is for personal use only.
Don’t look now, but you may have a supercomputer
on your desk. It’s hiding in your
video card. While it won’t make
your word processor faster, it
may improve the transcoding speed when you’re
moving movies to your mobile Internet device.
Intel and AMD have been pushing multicore
in the 64-bit x86 realm, but they offer only four-core chips at this
point. Intel’s 80-core Polaris is designed to push the envelope,
but AMD and NVidia have other ideas, at least when it
comes to stream computing.
Multicore has flourished in graphics processing units
(GPUs). Until a few years ago, GPUs were essentially black-box
systems designed to improve gaming and deliver fast updates
for CAD packages and medical applications. The closest a programmer
got to the GPU was the video device driver.
That was then. Now, NVidia and AMD/ATI not only have
opened up their precious GPUs, they also have delivered an
impressive collection of software and application programming
interfaces (APIs). We’re now into third-generation boards targeted
specifically at areas including stream computing.
NVidia’s C1060, designed for parallel computing, lacks a
video output (Fig. 1). Still, the board often will be used for
video preprocessing chores such as image analysis and ray tracing
with another video card providing rendering services.
The C1060 uses the same single-instruction,
multiple-thread (SIMT) architecture
as NVidia’s GeForce video
adapters, just with many more cores:
240 of them, backed by 4 Gbytes of
memory. And, the C1060 does
double-precision math, while the
GeForce products are
single-precision, for now.
AMD’s FireStream 9250 is
based on the company’s double-precision
RV770 chip, which also is found in
AMD’s Radeon HD 4850 (Fig. 2). It has 160
cores that normally are used as shaders when tasked
with graphics chores.
SOFTWARE SUPPORT
These latest boards target high-performance computing applications,
though the software used to create applications is
equally applicable to GPUs in video boards. While the video
boards may have to perform double duty by running a parallel
application and displaying a windowed desktop, the amount of
performance available is often sufficient to handle both.
The first step was to provide runtime libraries that delivered
array manipulation services. Yet the real power came when programmers
were able to write applications that ran on the GPU.
NVidia’s Compute Unified Device Architecture (CUDA) and
AMD’s FireStream software development kit (SDK) can do
this, and they’re available as free downloads. A forthcoming
version of CUDA will even generate code that runs on non-
GPU platforms such as multicore
x86 processors.
The C code used with these
GPU tools is augmented to explicitly
annotate the parallel aspects of
the programs. Developers will need to
try out this approach, and not all
applications can benefit from the
tools and GPUs.
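
To give a feel for what that annotation looks like, here is a minimal CUDA sketch of a double-precision "a times x plus y" loop. It is an illustration only, not code from NVidia or AMD: the kernel name daxpy, the array size, and the 256-thread block size are all hypothetical choices, and it assumes a GPU with double-precision support (such as the C1060) plus the nvcc compiler. The two additions to ordinary C are the __global__ qualifier, which marks a function that runs on the GPU, and the <<<blocks, threads>>> launch syntax, which says how many parallel threads to start.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

// __global__ marks a function that runs on the GPU and is launched from the host.
// The name daxpy is a hypothetical choice for this sketch.
__global__ void daxpy(int n, double a, const double *x, double *y)
{
    // Each thread computes its own global index and handles one element.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // the last block may have spare threads
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int n = 1 << 20;          // assumed problem size
    const size_t bytes = n * sizeof(double);

    // Host-side input arrays.
    double *hx = (double *)malloc(bytes);
    double *hy = (double *)malloc(bytes);
    if (!hx || !hy) return 1;
    for (int i = 0; i < n; ++i) { hx[i] = 1.0; hy[i] = 2.0; }

    // Device-side copies; the GPU works out of its own on-board memory.
    double *dx = NULL, *dy = NULL;
    if (cudaMalloc((void **)&dx, bytes) != cudaSuccess ||
        cudaMalloc((void **)&dy, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    daxpy<<<blocks, threads>>>(n, 3.0, dx, dy);

    // Check the launch, then copy the result back (the copy waits for the kernel).
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Every element should now hold 3.0 * 1.0 + 2.0.
    double max_err = 0.0;
    for (int i = 0; i < n; ++i)
        max_err = fmax(max_err, fabs(hy[i] - 5.0));
    printf("max error: %g\n", max_err);

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```

Note how the loop over the array disappears: the kernel body describes the work for a single element, and the launch supplies the parallelism. The explicit copies to and from the board are part of the cost, which is one reason not every application comes out ahead.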