The Khronos Group's OpenCL has become very popular on GPU platforms, and it translates well to CPUs too. About a year ago, Altera discussed its work on bringing OpenCL applications to FPGAs (see "How To Put OpenCL Into An FPGA"). That technology is now available in Altera's SDK for OpenCL. It can be used with Altera FPGAs such as the high-end Stratix V, which has more than 1 million logic elements and more than 50 Mbits of integrated memory. OpenCL applications can also take advantage of the variable-precision floating-point DSP blocks.

Essentially, the SDK converts the kernel code in an OpenCL application into an FPGA configuration. Additional support logic on the FPGA links the kernels to a host processor that supplies the data and launches the OpenCL kernels (Fig. 1). This support logic ties together the FPGA's hard PCI Express interface to the host processor, the FPGA's memory controller and the OpenCL kernels on the FPGA. It accepts commands and data over PCI Express, and it moves data to and from the off-chip memory.

Figure 1. The first release of Altera's SDK for OpenCL uses a PCI Express bridge to exchange data between the OpenCL kernel code implemented on the FPGA and a host processor.

Typically, the host moves data into the FPGA's off-chip memory and then has a kernel operate on it. The kernel processes the data and stores the results in off-chip memory, where the host can read them or subsequent operations can use them. One or more kernels can be implemented on the FPGA. The SDK also includes support for the host API.
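The host-side sequence described above (write input to off-chip memory, launch a kernel, then read the results back or reuse them) can be modeled in a few lines of plain C. This is a simulation sketch, not the OpenCL host API: device_mem, host_write, run_kernel, host_read and scale_kernel are hypothetical names, and the comments only note which real OpenCL calls (clEnqueueWriteBuffer, clEnqueueNDRangeKernel, clEnqueueReadBuffer) a host program would use for each step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the board's off-chip memory attached to the FPGA. */
#define DEVICE_MEM_FLOATS 1024
static float device_mem[DEVICE_MEM_FLOATS];

/* Step 1: host copies input into off-chip memory (OpenCL: clEnqueueWriteBuffer). */
static void host_write(size_t offset, const float *src, size_t n)
{
    assert(offset + n <= DEVICE_MEM_FLOATS);
    memcpy(&device_mem[offset], src, n * sizeof *src);
}

/* Hypothetical kernel body: one call per work-item, selected by its global ID. */
static void scale_kernel(size_t gid, const float *in, float *out, float k)
{
    out[gid] = k * in[gid];
}

/* Step 2: host launches the kernel over n work-items (OpenCL: clEnqueueNDRangeKernel).
   Here the work-items run one after another; on the FPGA they run in hardware in parallel. */
static void run_kernel(size_t in_off, size_t out_off, size_t n, float k)
{
    assert(in_off + n <= DEVICE_MEM_FLOATS && out_off + n <= DEVICE_MEM_FLOATS);
    for (size_t gid = 0; gid < n; gid++)
        scale_kernel(gid, &device_mem[in_off], &device_mem[out_off], k);
}

/* Step 3: host reads the results back (OpenCL: clEnqueueReadBuffer). */
static void host_read(size_t offset, float *dst, size_t n)
{
    assert(offset + n <= DEVICE_MEM_FLOATS);
    memcpy(dst, &device_mem[offset], n * sizeof *dst);
}
```

Calling run_kernel a second time on the output region, without a host_read in between, corresponds to the case where results stay in off-chip memory for a subsequent operation.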

The SDK's OpenCL compiler generates the FPGA configuration, which is then downloaded to the FPGA. The advantage is that the FPGA can execute the operations within a kernel in parallel and can also replicate the kernel hardware. Multicore CPUs and GPUs rely on the latter approach with their multiple cores, but an FPGA can combine both, giving it the potential to do more in parallel than these platforms. Of course, the speedup will vary depending on the algorithm.

The SDK currently works with Stratix V boards from BittWare and Nallatech. Altera is working with other third parties to get their boards to work with the SDK as well. For now, PCI Express and memory support are required, but the technology will eventually be applied to Altera's SoC FPGAs with dual-core Cortex-A9 processors.

Using OpenCL with on-chip hard processor cores offers performance advantages over the PCI Express link. The interface between the FPGA fabric and the on-chip cores runs at 125 Gbits/s, much faster than PCI Express. The Cortex-A9 may not be as powerful as some other hosts, but the faster pipe will still be a significant advantage.
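A back-of-the-envelope calculation shows what the faster pipe means for moving a buffer: idealized transfer time is the buffer size in bits divided by the link rate. The 125-Gbit/s figure comes from the text above; the PCI Express rate is computed from lane count, transfer rate and encoding efficiency, and the example values used below (a Gen2 x8 link at 5 GT/s per lane with 8b/10b encoding) are an assumption for illustration, not a description of any particular board. Both helper names are hypothetical.

```c
#include <assert.h>

/* Raw PCI Express payload rate per direction, in Gbit/s:
   lanes x transfer rate (GT/s) x line-encoding efficiency (0.8 for 8b/10b). */
static double pcie_payload_gbit(int lanes, double gt_per_s, double encoding_efficiency)
{
    assert(lanes > 0 && gt_per_s > 0.0);
    assert(encoding_efficiency > 0.0 && encoding_efficiency <= 1.0);
    return lanes * gt_per_s * encoding_efficiency;
}

/* Idealized time in microseconds to move `bytes` over a link of `gbit_per_s`.
   Ignores latency, packet headers and DMA setup, so real transfers take longer. */
static double transfer_us(double bytes, double gbit_per_s)
{
    assert(gbit_per_s > 0.0);
    return bytes * 8.0 / (gbit_per_s * 1e3);
}
```

Under these assumptions, a 1,000,000-byte buffer takes 64 microseconds over the 125-Gbit/s on-chip interface versus 250 microseconds over a 32-Gbit/s Gen2 x8 link, before any protocol overhead.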

Other possible future features include OpenCL streaming support and dynamic applications. A typical OpenCL system on CPUs and GPUs moves data into a work area and then processes it as a block to produce results. A streaming approach would instead handle smaller chunks, such as packets or a byte stream, and deliver results the same way. FPGAs are actually well suited to this approach.
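The difference between block and stream processing can be illustrated with any computation that carries state. This sketch uses a Fletcher-16 checksum, an arbitrary choice for illustration that has nothing to do with the SDK itself: the block version needs the whole buffer staged in memory, while the streaming version consumes chunks as they arrive, carries a small running state between them and produces the same result. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Running state carried from one chunk to the next in the streaming version. */
typedef struct { uint16_t sum1, sum2; } fletcher16_state;

/* Consume one chunk (a packet, a few bytes, or the whole buffer). */
static void fletcher16_update(fletcher16_state *s, const uint8_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        s->sum1 = (uint16_t)((s->sum1 + data[i]) % 255);
        s->sum2 = (uint16_t)((s->sum2 + s->sum1) % 255);
    }
}

/* Combine the two running sums into the 16-bit checksum. */
static uint16_t fletcher16_final(const fletcher16_state *s)
{
    return (uint16_t)((s->sum2 << 8) | s->sum1);
}

/* Block style: the entire buffer is already in memory; one pass, one result. */
static uint16_t fletcher16_block(const uint8_t *data, size_t n)
{
    fletcher16_state s = {0, 0};
    fletcher16_update(&s, data, n);
    return fletcher16_final(&s);
}
```

Feeding "abcde" as the two chunks "ab" and "cde" through fletcher16_update yields 0xC8F0, the same value that fletcher16_block returns for the whole buffer.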

At this point, the SDK generates data for the FPGA's boot memory. However, Altera's FPGAs do support dynamic reconfiguration. Combining the two would make it possible to change the OpenCL kernels on the fly.

For now, these possible features are a work in progress. They will eventually make their way into the SDK as they are refined. In the meantime, many applications can take advantage of the SDK as is.

Almost any OpenCL application is a candidate for implementation in an FPGA. Candidates range from imaging applications to Monte Carlo Black-Scholes simulations for financial organizations. Very large applications that must be partitioned across multiple FPGAs could be an issue, though.

The SDK also opens up FPGAs to many new developers, because this approach requires C expertise but not FPGA expertise. For now, from a high-level view there is little difference between using an x16 PCI Express GPU board with OpenCL and an x16 PCI Express FPGA board, except perhaps price and performance.