Grosshansdorf, Germany. Digitizer-based systems currently face a processing bottleneck: acquired data must be handled either by the host PC’s central processor, which typically has eight or sixteen cores, or by an FPGA that is complex to program. Spectrum Instrumentation has addressed this problem with its new SCAPP software option (Spectrum CUDA Access for Parallel Processing), which opens an easy-to-use yet powerful way to digitize, process, and analyze electronic signals.
SCAPP allows a CUDA-based GPU to be used directly between any Spectrum digitizer and the PC. The big advantage is that data is passed directly from the digitizer to the GPU, where high-speed parallel processing is possible using the GPU board’s many (up to 5,000) processing cores. That provides a significant performance gain compared with sending the data to a host CPU that may have only eight or sixteen cores. It becomes even more important when signals are digitized at high speeds such as 50 MS/s, 500 MS/s, or even 5 GS/s.
The Spectrum approach uses a standard off-the-shelf GPU based on Nvidia’s CUDA platform. The digitizer card passes its data directly to the GPU with no further CPU interaction, opening up the massively parallel core architecture of the CUDA card for signal processing. The architecture of a CUDA graphics card is a good fit because it is designed for parallel data processing, and most signal processing jobs are parallel in nature. For example, the processing tasks of data conversion, filtering, averaging, baseline suppression, FFT window functions, or even FFTs themselves can all be easily parallelized.
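To illustrate why tasks such as data conversion and windowing parallelize so readily, here is a minimal sketch in plain Python. It is not SCAPP or CUDA code: the function names, the 8-bit two's-complement sample format, and the 1 V full-scale range are assumptions chosen for illustration. The point is that each output sample depends only on its own index, so on a GPU every index could be handed to a separate thread.

```python
import math

def to_volts(raw, full_scale_v, bits):
    """Convert one signed ADC count to volts (assumes two's-complement codes)."""
    return raw * full_scale_v / (2 ** (bits - 1))

def hann(i, n):
    """Hann window coefficient for sample i of an n-sample block."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1))

def kernel(i, raw_block, full_scale_v=1.0, bits=8):
    """Work for one hypothetical GPU thread: reads and writes only sample i."""
    return to_volts(raw_block[i], full_scale_v, bits) * hann(i, len(raw_block))

# Hypothetical block of raw 8-bit samples.
raw_block = [0, 64, 127, 64, 0, -64, -128, -64]

# A GPU would run kernel() for all indices at once; here the same
# function is simply applied to every index one after another.
windowed = [kernel(i, raw_block) for i in range(len(raw_block))]
print(windowed)
```

Because no sample needs the result of any other sample, the order of evaluation does not matter, which is what makes this kind of work a natural match for thousands of GPU cores.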
Until now, there have basically been two approaches to processing data from high-speed digitizers. The first and most common method simply uses the CPU for calculations. This approach offers a straightforward way to create processing programs in a variety of programming languages at almost no extra cost. However, performance is often limited by the CPU’s resources, because the CPU must share its processing power with the rest of the PC system, the operating system, and the GUI components.
The second approach is to use FPGA technology, either with fixed processing packages from the vendor (like the Block Average package from Spectrum) or using an open FPGA with a firmware development kit (FDK). It’s a powerful solution, but it comes with much higher cost and complexity. Large FPGAs are expensive, and using them requires an FDK from the digitizer vendor along with other implementation tools from the FPGA vendor. In addition, implementing signal processing in an FPGA using VHDL requires specialist knowledge that not every engineer has, which quickly leads to very long development cycles. Even worse, it is easy to run into the limits of the FPGA that is soldered onto the card. For example, once the block RAM is fully used, the design cannot be extended any further.
The total cost of ownership (TCO) of the SCAPP approach is low compared with any FPGA-based solution: a matching CUDA graphics card costs from around €150 to €3,000, and the necessary SDKs are free. However, the largest cost saver is development time. Instead of spending weeks just to understand the FDK, the structure of the FPGA firmware, the FPGA design suite, and the simulation tools, the user can start immediately with some easy-to-understand C code and common design tools.
The SCAPP driver package consists of a driver extension for remote direct memory access (RDMA) that allows direct data transfer from the digitizer to the GPU. It includes a set of examples for interaction between the digitizer and the CUDA card, and another set of CUDA parallel processing examples with simple building blocks for basic functions like filtering, averaging, data de-multiplexing, data conversion, or FFT. All the software is based on C/C++ and can easily be implemented and improved with normal programming skills. Starting from tested and optimized parallel processing examples, users can get first results within minutes.
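As a rough idea of what two of these building blocks do, the following plain-Python sketch shows de-multiplexing and block averaging. It is not taken from the SCAPP examples, which are written in C/C++ and CUDA. The function names are hypothetical, and the interleaved sample layout (channel 0, channel 1, channel 0, ...) is an assumption made for illustration.

```python
def demultiplex(interleaved, num_channels):
    """Split interleaved samples [ch0, ch1, ch0, ch1, ...] into one list per channel."""
    return [interleaved[ch::num_channels] for ch in range(num_channels)]

def block_average(samples, block_len):
    """Average consecutive blocks of block_len samples, position by position."""
    num_blocks = len(samples) // block_len
    return [
        sum(samples[b * block_len + i] for b in range(num_blocks)) / num_blocks
        for i in range(block_len)
    ]

# Hypothetical two-channel stream: two blocks of three samples per channel.
interleaved = [1, 10, 2, 20, 3, 30, 5, 50, 6, 60, 7, 70]

ch0, ch1 = demultiplex(interleaved, 2)
avg0 = block_average(ch0, 3)
avg1 = block_average(ch1, 3)
print(avg0, avg1)  # [3.0, 4.0, 5.0] [30.0, 40.0, 50.0]
```

Both steps compute every output element independently of the others, which is why they map well onto a GPU.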
The interconnection between digitizer and GPU is based on PCI Express. Depending on the selected Spectrum digitizer card, a continuous throughput of more than 3.0 GB/s between the digitizer and GPU can be achieved. That is enough to support continuous acquisition from a 1-channel 8-bit digitizer sampling at 2.5 GS/s or a 2-channel 14-bit unit running at 500 MS/s. With one of Spectrum’s bandwidth-saving acquisition modes, such as Multiple Recording, even higher sampling rates are possible.
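The throughput figures can be checked with a short calculation. The sketch below assumes that each sample occupies a whole number of bytes, so a 14-bit sample takes two bytes, and it treats 3.0 GB/s as the available budget. The function name is hypothetical.

```python
import math

def stream_rate_gb_per_s(channels, bits, sample_rate_hz):
    """Continuous data rate in GB/s, assuming samples are stored in whole bytes."""
    bytes_per_sample = math.ceil(bits / 8)
    return channels * bytes_per_sample * sample_rate_hz / 1e9

LINK_GB_PER_S = 3.0  # continuous throughput quoted in the text

rate_8bit = stream_rate_gb_per_s(1, 8, 2.5e9)     # 1 channel, 8 bit, 2.5 GS/s
rate_14bit = stream_rate_gb_per_s(2, 14, 500e6)   # 2 channels, 14 bit, 500 MS/s
rate_5gs = stream_rate_gb_per_s(1, 8, 5e9)        # 1 channel, 8 bit, 5 GS/s

print(rate_8bit, rate_14bit, rate_5gs)  # 2.5 2.0 5.0
```

Under these assumptions the first two configurations need 2.5 GB/s and 2.0 GB/s and fit within the budget, while continuous 5 GS/s streaming at 8 bit would need 5.0 GB/s, which is where a mode that transfers only selected segments becomes relevant.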
CUDA cards are scalable, with between 256 and 5,000 processing cores (in comparison, a dual quad-core Xeon system with Hyper-Threading offers only 16 logical cores), several GB of memory, and up to 12.0 TFLOPS. A small card with 1k cores and 3.0 TFLOPS is already capable of continuous data conversion, multiplexing, windowing, FFT, and averaging on two channels at 500 MS/s with an FFT block size of 512k, and it can sustain this for hours. In contrast, an FFT package from other digitizer vendors will typically limit the FFT block size to a maximum of 4k or 8k, as this is the limitation of the FPGA.
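One way to see what the larger block size buys is to look at the spacing between FFT bins, which equals the sample rate divided by the block size. The sketch below assumes that "512k" and "4k" mean 512 × 1024 and 4 × 1024 points. The function names are hypothetical.

```python
def fft_bin_spacing_hz(sample_rate_hz, block_size):
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate_hz / block_size

def block_duration_s(sample_rate_hz, block_size):
    """Time span covered by one FFT block."""
    return block_size / sample_rate_hz

FS = 500e6  # 500 MS/s, as in the text

spacing_512k = fft_bin_spacing_hz(FS, 512 * 1024)
spacing_4k = fft_bin_spacing_hz(FS, 4 * 1024)
duration_512k = block_duration_s(FS, 512 * 1024)

print(round(spacing_512k, 2))   # 953.67
print(round(spacing_4k, 2))     # 122070.31
print(duration_512k)
```

At 500 MS/s a 512k block resolves frequencies about 954 Hz apart, whereas a 4k block resolves only about 122 kHz, a factor of 128 coarser.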
The SCAPP package is a driver extension for all Spectrum cards. It can be used with the ultrafast digitizers of the M4i platform (250 MS/s at 16 bit, 500 MS/s at 14 bit, or 5 GS/s at 8 bit) as well as the latest medium-performance M2p platform (20 to 80 MS/s, multichannel, 16 bit). The basic RDMA functionality is available under Linux.