Ray tracing is the reason animated films
look so good these days and why
games still have a way to go when it
comes to realism, even with powerful
graphics processing units (GPUs). The problem
for gamers is that GPUs are rasterization engines,
and most games are tuned for that. Cinema-quality
animated movies tend to be created using large “render
farms,” which are networked clusters of computers.
Caustic Graphics is looking to literally change the landscape
by replacing the render farms with workstations running
its CausticTwo boards. The faster CausticTwo boards,
which are replacing the CausticOne boards, work in conjunction
with a GPU to optimize the rendering of ray-traced
images (Fig. 1). Essentially, they convert a 3D environment
created with a 3D content creation tool like Blender into a 2D
rendition. They are designed to accelerate the processing
of effects such as soft shadows, glossy reflections, and, of
course, caustics (reflected and refracted light).
Typically, high-resolution rendering has taken hours or days
depending upon the size of the video files and the level of
quality. Caustic Graphics looks to turn this into a real-time
process, significantly changing how artists approach the
problem. This is akin to the changes in 3D CAD when workstation
GPU boards made real-time 3D manipulation a reality.
Accelerated ray-traced rendering will help in this environment
as well since realistic presentations of objects or architectural
designs are common requirements.
The ray-trace optimization initially targeted higher-performance
floating point since that was a major issue in ray-trace
computations. Multicore solutions like GPUs can bring massive
floating-point processing to bear, but there is a problem.
Most GPUs and even CPUs like to work using single-instruction
multiple-data (SIMD) operations. GPU systems like Nvidia’s
Tesla are built on blocks of eight and scale up from there
(see “SIMT Architecture Delivers Double-Precision Teraflops”
at www.electronicdesign.com, ED Online 19280). The Nvidia
GPU has 240 cores.
This basic SIMD approach works well for rasterizing 3D
models where the kinds of computations are going to be
the same, and this holds true for the initial step from a light
source in a ray-tracing algorithm. The problem occurs when
the light hits an object and is dispersed in different directions.
This changes the calculations since each vector
is processed differently. Caustic Graphics finds
similar calculations and groups them together so GPU
engines can operate on them in parallel. This grouping allows tens of
thousands of rays to be processed efficiently in parallel.
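
The effect of grouping similar calculations can be illustrated with a toy cost model. This is only a sketch, not Caustic Graphics' algorithm: it assumes a SIMD block of eight lanes that needs one pass per distinct kind of calculation it contains. The names `SIMD_WIDTH`, `passes_ungrouped`, `passes_grouped`, and the ray "kinds" are all hypothetical.

```python
from collections import Counter
from math import ceil

SIMD_WIDTH = 8  # assumption: eight lanes per block, echoing the "blocks of eight"


def passes_ungrouped(ray_kinds, width=SIMD_WIDTH):
    """Count passes when rays are issued in arrival order.

    Toy model: a block runs one instruction stream at a time, so it needs
    one pass for each distinct kind of calculation among its rays.
    """
    total = 0
    for start in range(0, len(ray_kinds), width):
        block = ray_kinds[start:start + width]
        total += len(set(block))
    return total


def passes_grouped(ray_kinds, width=SIMD_WIDTH):
    """Count passes when rays are first bucketed by kind of calculation."""
    counts = Counter(ray_kinds)
    return sum(ceil(count / width) for count in counts.values())


# Hypothetical workload: 32 secondary rays whose work alternates by kind.
rays = ["reflect", "refract", "shadow", "diffuse"] * 8
print(passes_ungrouped(rays))  # 16
print(passes_grouped(rays))    # 4
```

In this model, the interleaved rays keep only a quarter of the lanes busy per pass, while the grouped rays fill every lane.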
THE DATABASE ENGINE
Caustic Graphics has developed extensions to OpenGL
ES 2.0 to address its approach to ray tracing. A typical configuration
incorporates a CPU, a GPU, and a CausticTwo board
(Fig. 2). An application interfaces with a
Caustic Graphics OpenGL implementation called
CausticGL that in turn manages the CausticTwo
board and the GPU through the GPU’s standard
OpenGL driver. The coordination occurs in the CPU.
The CausticGL driver can fall back to a software implementation
if the CausticTwo hardware is not available, but it
will be slower.
The CausticTwo board handles the ray intersection tests
and schedules the rays that have locality of reference in 3D
space to enable the efficient shading of a ray’s color information
by the GPU. The GPU is shading while the CausticTwo
performs ray intersection tests, other database queries, and
scheduling.
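
The two jobs described here, intersection testing and locality-based scheduling, can be sketched in software. The example below is an illustration under stated assumptions, not the CausticTwo's actual method: it uses a sphere as the only primitive and a uniform grid as the measure of locality, and the names `intersect_sphere` and `schedule_by_cell` are hypothetical.

```python
import math
from collections import defaultdict


def intersect_sphere(origin, direction, center, radius):
    """Return the nearest positive hit distance, or None on a miss.

    Assumes `direction` is unit length, so the quadratic's leading
    coefficient is 1.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-9:
            return t
    return None


def schedule_by_cell(hit_points, cell_size=1.0):
    """Bucket hit-point indices by the grid cell each point falls in.

    Points in the same cell are close together in 3D space, so they are
    likely to need similar shading work and can be handed off as a batch.
    """
    cells = defaultdict(list)
    for index, point in enumerate(hit_points):
        key = tuple(math.floor(coord / cell_size) for coord in point)
        cells[key].append(index)
    return dict(cells)


hit = intersect_sphere((0.0, 0.0, -5.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), 1.0)
print(hit)  # 4.0
print(schedule_by_cell([(0.2, 0.1, 0.9), (0.8, 0.4, 0.3), (2.5, 0.0, 0.0)]))
```

Real scenes use triangle meshes and acceleration structures rather than a single sphere and a flat grid. The point is only that hits are sorted into spatially coherent batches before shading.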
The GPU is used for the final rendering and can handle
multiple frames at once. This is typically done for real-time rendering
to provide double buffering. For offline rendering, the system can
instead work on a single frame at a time, potentially devoting more
computational resources to that frame to speed up the process
or allow a higher-quality rendition.
The GPU includes environment-related information to
assist in the rendering. Likewise, some of this information is
replicated in the database processing unit (DBPU). This lets
the driver query the database and determine what calculations
can be grouped together and performed on the GPU.
In turn, this permits the GPU to perform ray-tracing support
operations with rasterization efficiency that would otherwise
be impossible.
Essentially, Caustic Graphics has created a sophisticated
content addressable memory. It can handle multiple queries
in parallel and support multiple databases. In this case, it
would have one database per frame. It also can handle
ray-tracing-style queries such as ray queries and photon
kNN (k nearest neighbors).
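
For readers unfamiliar with the photon kNN query, the following brute-force sketch shows what such a query returns. It is a software illustration only and says nothing about how the hardware evaluates the query. The function name `k_nearest_photons` and the sample data are hypothetical.

```python
import heapq


def k_nearest_photons(photons, query, k):
    """Return the k photon positions closest to `query`, nearest first.

    Brute force: every photon's squared distance is examined. A photon map
    would normally use a spatial index instead.
    """
    def squared_distance(photon):
        return sum((a - b) ** 2 for a, b in zip(photon, query))

    return heapq.nsmallest(k, photons, key=squared_distance)


photons = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0), (0.0, 2.0, 0.0)]
print(k_nearest_photons(photons, (0.0, 0.0, 0.0), 2))
# [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```

The query reads the photon set without modifying it, which fits the read-only usage pattern the article describes.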
The system has been optimized for its target application, where
the data defining a frame is fixed and many queries are run
against it. The database does not change after it is set up,
and it is discarded when the process is done. The drivers
and hardware can handle more than one database being active
at a time, so double buffering is part of the equation,
making real-time rendering possible.
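
The build-once, query-many, discard lifecycle with two databases in flight can be sketched as follows. This is a minimal illustration, not the CausticGL driver's design. The classes `FrameDatabase` and `DoubleBuffer` and their methods are hypothetical.

```python
class FrameDatabase:
    """Read-only snapshot of one frame's data (hypothetical structure)."""

    def __init__(self, frame_id, primitives):
        self.frame_id = frame_id
        self._primitives = tuple(primitives)  # frozen once set up

    def query(self, predicate):
        """Return the primitives that satisfy `predicate`; never mutates."""
        return [p for p in self._primitives if predicate(p)]


class DoubleBuffer:
    """Hold one active database for queries and one pending for setup."""

    def __init__(self):
        self.active = None
        self.pending = None

    def load_next(self, database):
        # The next frame is set up while the active frame is still queried.
        self.pending = database

    def swap(self):
        # Promote the pending database and hand back the retired one,
        # which the caller is expected to discard.
        retired = self.active
        self.active, self.pending = self.pending, None
        return retired


buffers = DoubleBuffer()
buffers.load_next(FrameDatabase(0, ["sphere", "plane"]))
buffers.swap()
buffers.load_next(FrameDatabase(1, ["sphere", "cube"]))
print(buffers.active.query(lambda name: name == "plane"))  # ['plane']
retired = buffers.swap()
print(retired.frame_id, buffers.active.frame_id)  # 0 1
```

Because a database never changes after setup, queries against the active frame need no locking against writers, which is one plausible reason this usage pattern is cheap to parallelize.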
The CausticTwo board has
an x16 PCI Express interface.
Multiple boards in a system
are supported. Different
boards could handle the
queries on the same database
if the database is
replicated on each board.
Another approach would have
different boards handle different
frames. The bottom line
is to keep all processing units
active. Idle units do not accelerate
anything.
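
The two multi-board approaches can be compared with a short sketch. It assumes a simple round-robin policy, which the article does not specify. The functions `split_queries` and `assign_frames` are hypothetical.

```python
def split_queries(queries, board_count):
    """Approach 1: the database is replicated, so queries against the same
    frame are dealt round-robin across the boards."""
    return [queries[board::board_count] for board in range(board_count)]


def assign_frames(frame_ids, board_count):
    """Approach 2: each board owns whole frames and answers every query
    for the frames it holds."""
    assignment = {board: [] for board in range(board_count)}
    for position, frame in enumerate(frame_ids):
        assignment[position % board_count].append(frame)
    return assignment


print(split_queries(list(range(7)), 3))   # [[0, 3, 6], [1, 4], [2, 5]]
print(assign_frames([10, 11, 12, 13], 2))  # {0: [10, 12], 1: [11, 13]}
```

Either way, the work is spread so that no board sits idle while another has a backlog.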
MORE THAN A RAY OF TRUTH
Caustic Graphics is targeting
the high-end graphics-rendering
market, which will
keep the company’s hands
full. So, it’s unlikely that other
applications will be able to
take advantage of the database
engine in the near term.
Still, some academic research
will occur, and some determined
developers may be
able to take advantage of the
CausticTwo’s performance.
Applications that could take
advantage of the CausticTwo
include those where database
queries do not change the
database itself. This parallels
the opening of the GPU
in recent years where GPUs
changed from black boxes
behind a driver into more
generic supercomputer platforms.
In the meantime, a
horde of media moguls is lusting
after the CausticTwo.