AccelerEyes Talks About Matlab GPU Acceleration


Figure 1: The Jacket runtime enables prototyping in MATLAB and deployment with a standalone C/C++ library.

Matlab, from The Mathworks, is one of the most widely used analytical prototyping and development tools. Its support for matrix operations makes it a natural fit for GPU acceleration. The Mathworks recently announced that support (see Mathworks Matlab GPU Q and A), but AccelerEyes was the first to deliver it. AccelerEyes worked with The Mathworks to develop GPU support.

I recently spoke with John Melonakos, CEO and Co-Founder of AccelerEyes, about their work with Matlab and GPUs. Their Jacket and libjacket products have been in use by a wide range of developers for many years.

Wong: The Mathworks has added GPU support recently but you have been doing it for a while. Can you tell me about how that started?

Melonakos: We started AccelerEyes in 2007 because we saw big potential for GPUs to accelerate core scientific, engineering, and financial analytics problems. This was before CUDA. We started with traditional OpenGL GPGPU shader programming. That was tough. In 2008, we moved to CUDA to benefit from key libraries like FFT and BLAS. That same year, we began a partnership with MathWorks to accelerate MATLAB as a member of their 3rd Party Connections Program.

Unfortunately, last year that partnership ended when MathWorks decided to go it alone by building their own CUDA routines into their legacy Java system. We are much fonder of the approach that we originally invented with Jacket based upon a fast C-runtime. The Jacket runtime optimally converts high-level code into fast GPU kernels, yielding much better performance. This relies upon our just-in-time compilation and lazy evaluation, which are core Jacket innovations and the reason why Jacket is so fast and prevalent today.
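The lazy evaluation idea mentioned above can be illustrated in plain MATLAB. The following is only a conceptual sketch, not Jacket's runtime: it assumes that deferring work means recording an expression first and evaluating it in one pass later, and it uses an ordinary function handle and ARRAYFUN on the CPU to stand in for a generated GPU kernel. The variable names are hypothetical.

```matlab
A = rand(4); B = rand(4); C = rand(4);

% Eager evaluation: every statement runs immediately and
% materializes a temporary matrix (t1, t2).
t1 = A .* B;
t2 = t1 + C;
y_eager = sqrt(t2);

% Deferred evaluation: record the whole expression as a function handle.
% Nothing is computed when the handle is created.
expr = @(a, b, c) sqrt(a .* b + c);

% The work happens only when the result is requested, in a single pass
% over the elements with no intermediate matrices.
% (Stand-in for a fused kernel; an assumption, not Jacket's mechanism.)
y_lazy = arrayfun(expr, A, B, C);

% Both forms produce the same values.
assert(max(abs(y_eager(:) - y_lazy(:))) < 1e-12);
```

The point of the sketch is the shape of the computation: three separate passes with two temporaries versus one combined pass, which is the kind of saving a runtime can pursue when it sees a whole chain of operations before executing any of them.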

Wong: What GPUs can be utilized by the system?

Melonakos: Jacket supports any CUDA-capable GPU from NVIDIA. If you have the newest Fermi GPUs, Jacket will automatically leverage the best hardware features available on that architecture. If you have an older GPU, like a GeForce 8600, Jacket will still run, but you won’t get the best performance and you may not get double-precision computation.

We also have projects under way to support the emerging AMD and Intel many-core and hybrid chipsets.

Wong: Why is running MATLAB code on a GPU a good idea?

Melonakos: It is commonly known that MATLAB applications run slowly. FOR-loops are especially slow. Yet MATLAB continues to increase in popularity due to its ease of use and tremendous value in rapid prototyping. Plugging GPUs into MATLAB makes slow-running code fast, can be done at the low cost of a GPU, and, through Jacket’s transparent approach, preserves the rapid prototyping qualities that make MATLAB so convenient. MATLAB is based on a vector language; that is, operations are performed on vectors and matrices and are hence already data-parallel. This maps naturally onto the GPU’s data-parallel hardware.
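To make the data-parallel point concrete, here is a minimal sketch in plain MATLAB (no GPU or Jacket required) that writes the same computation as a FOR-loop and as a vector expression. The polynomial and variable names are arbitrary choices for illustration.

```matlab
n = 1e5;
x = linspace(0, 1, n);

% Loop form: one scalar operation per iteration.
y_loop = zeros(1, n);
for k = 1:n
    y_loop(k) = 3 * x(k)^2 + 2 * x(k) + 1;
end

% Vector form: the same polynomial applied to every element at once.
% Each output element depends only on the matching input element,
% which is what makes the expression data-parallel.
y_vec = 3 * x.^2 + 2 * x + 1;

% The two forms agree.
assert(max(abs(y_loop - y_vec)) < 1e-12);
```

Because no element of `y_vec` depends on any other, the vector form is the one that can be handed to many parallel hardware lanes without restructuring.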

Wong: How different is MATLAB GPU code from regular MATLAB CPU code?

Melonakos: The emphasis in Jacket is to require minimal code change. We do this through Jacket’s GPU data types (e.g., GDOUBLE, GSINGLE, GLOGICAL, etc.), which are GPU equivalents of CPU data types. All you have to do is cast your matrices to Jacket’s GPU data types. Then all other operations are overloaded using MATLAB’s standard object-oriented interface for overloading functions. So if you pass a GPU data type to the CONV2 function, Jacket will run that function on the GPU. If you want to bring the data back to the CPU, simply cast back with the CPU casting commands (e.g., DOUBLE, SINGLE, LOGICAL, etc.).

For example,
>> A = gdouble( A ); % cast to GPU device
>> B = my_function( A ); % no need to modify my_function.m
>> C = fft( B ); % B & C remain on GPU
>> D = double( C ); % cast back to host CPU
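The same cast, compute, and cast-back pattern can be checked against a CPU reference. The sketch below uses the GSINGLE cast and the CONV2 function named in the interview, but it is written under assumptions: it falls back to the ordinary SINGLE type when `gsingle` is not on the path so that it still runs in plain MATLAB, and it assumes Jacket's overloaded CONV2 accepts the standard `'same'` option. The helper name `to_device` is hypothetical.

```matlab
% Use Jacket's GPU cast when it is available; otherwise stay on the CPU
% so the sketch still runs without Jacket (portability assumption).
if exist('gsingle')
    to_device = @(x) gsingle(x);
else
    to_device = @(x) single(x);
end

img    = rand(64, 64);
kernel = ones(3, 3) / 9;                 % 3x3 box blur

% CPU reference result.
ref = conv2(single(img), single(kernel), 'same');

g_img = to_device(img);                  % cast to the device type
g_ker = to_device(kernel);
g_out = conv2(g_img, g_ker, 'same');     % dispatch depends on the argument type
out   = single(g_out);                   % cast back to the host

% The result should match the CPU reference to single precision.
assert(max(abs(out(:) - ref(:))) < 1e-4);
```

Comparing against a CPU result like this is a cheap way to confirm that moving a function to the device has not changed its output beyond floating-point tolerance.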

Wong: How big is the Jacket community?

Melonakos: There are thousands of Jacket programmers all across the world. The price point is low enough that it is picked up readily by developers hungry to get more performance and to get ahead of the curve with respect to GPU computing. Also, we have a $10,000 campus-wide license program for universities that is a hot seller for us. It’s a great deal for the university to get researchers jump-started with GPU computing, and it’s a great investment for us to have students and faculty widely using Jacket.

Wong: What are people doing with Jacket today?

Melonakos: MATLAB is pervasive, with user communities in healthcare, defense, energy, automotive, finance, and many other industries. Each of those industries has a segment with a need for speed. There are Jacket programmers finding success in all of those industries. On our website, you will find dozens of case studies and blog posts describing projects that range from radar clutter removal at System Planning Corporation to viral hepatitis C simulations at the Centers for Disease Control.

Wong: How does Jacket stack up against the Parallel Computing Toolbox support for GPUs?

Melonakos: The fundamental design differences mentioned earlier are important. The Jacket runtime system is a big deal for performance on real applications. It is easy to write a CUDA kernel to do a fast matrix multiply, as has been done in the Parallel Computing Toolbox. Jacket goes beyond that to solve the problem of making a series of MATLAB functions fast on the GPU, so that real applications run fast. From the start, we have been driven by the challenge to accelerate real MATLAB applications.

Jacket supports 1,000s of MATLAB function syntaxes. There is no other GPU library (the Parallel Computing Toolbox included) that comes close to the breadth of functionality available in Jacket. Jacket’s functions, coupled with more advanced features like GFOR for parallel GPU FOR loops, make Jacket useful in most MATLAB applications.
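A parallel FOR loop is only safe when its iterations are independent of one another. The sketch below shows that property in plain MATLAB, without Jacket: each iteration touches only its own page of a 3-D array, so running the loop in reverse order gives an identical result. The array sizes and names are illustrative, and the actual GFOR form is deliberately not shown because it requires the Jacket runtime.

```matlab
n = 8;
A = rand(4, 4, n);     % n independent 4x4 matrices
B = rand(4, 4);

% Ordinary FOR loop: iteration k reads and writes only page k.
C = zeros(4, 4, n);
for k = 1:n
    C(:, :, k) = A(:, :, k) * B;
end

% Because no iteration depends on another, the order does not matter.
% Running the loop backwards gives the same result, which is the
% property a parallel loop relies on.
C_rev = zeros(4, 4, n);
for k = n:-1:1
    C_rev(:, :, k) = A(:, :, k) * B;
end

assert(isequal(C, C_rev));

% Assumption: with Jacket, A and B would first be cast to a GPU type and
% the loop would be written with GFOR in place of FOR; consult the Jacket
% documentation for the exact syntax.
```

A loop whose iterations carry state from one pass to the next (a running sum, for example) would fail this reordering check and would not be a candidate for this kind of parallel execution.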

Wong: How does someone with slow MATLAB code start using Jacket?

Melonakos: Free, fully functional 15-day trials are available on our website. In addition, our application engineers provide WebEx demos and one-on-one support for new Jacket programmers. Also, a very active Jacket forum community has evolved online, and questions are answered within a few hours of posting.

Wong: How do Jacket programmers scale from 1 GPU to multiple GPUs?

Melonakos: There are several add-on features for Jacket that enable it to run across multiple GPUs in a single computer (Jacket MGL) or to run in a cluster, server, or cloud environment (Jacket HPC). These products work without any additional code change. Jacket automatically ensures that all the GPUs in the system are efficiently utilized to get the job done.

Wong: Where do you go from here? How does Jacket evolve?

Melonakos: Jacket will evolve in two ways. First, as we deliver support for GPUs from other vendors, Jacket programmers will automatically get to run their code on the new devices, without any further code rewrites. Our promise is: “Write your code once and let Jacket carry you through the coming hardware evolution.”

Second, we are in the process of making Jacket’s fast runtime and broad set of GPU functions available to programmers outside of MATLAB, through libJacket. In 2011, you will find that many other applications will become Jacket-enabled so that more scientists, engineers, and analysts can enjoy the same benefits the MATLAB community has enjoyed for 3 years.