FPGAs and memory have been paired for ages. BittWare’s 250-M2D FPGA Accelerator M.2 module (Fig. 1) follows that pattern by pairing a Xilinx Kintex UltraScale+ FPGA with up to 32 GB of DDR4 memory. The M.2 I2C interface programs the serial flash memory that, in turn, configures the Kintex KU3P FPGA. The 16-nm FPGA has up to 356K logic cells.
The PCI Express (PCIe) interface supports NVMe, which is the de facto high-speed storage interface. The module supports computational storage processor (CSP) configurations—CSPs are becoming more popular as computational services move from the host CPU to peripherals. This offloading provides more efficient operation in addition to freeing up the CPU for other chores. CSP tasks include services such as compression/decompression, storage deduplication, encryption/decryption, artificial intelligence/machine learning, and data analytics.
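To make the offload pattern concrete, here is a minimal host-side sketch. The `CspCompressor` class and `store_block` helper are hypothetical stand-ins invented for illustration; a real 250-M2D deployment would submit jobs to the device over NVMe rather than call a Python method.

```python
import zlib

class HostCompressor:
    """Baseline: compression done on the host CPU."""
    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)  # burns host CPU cycles

class CspCompressor:
    """Hypothetical stand-in for an FPGA compression engine on the module.

    In hardware, the block would be DMA'd to the device and compressed
    there; we model only the result, so the host CPU stays free.
    """
    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)

def store_block(data: bytes, engine) -> bytes:
    """Compress a block with whichever engine is available."""
    return engine.compress(data)

payload = b"sensor log " * 1000
blob = store_block(payload, CspCompressor())
assert zlib.decompress(blob) == payload  # round-trip is lossless
```

Either engine presents the same interface, which is the point of the CSP model: the application's storage path is unchanged whether the work runs on the host or on the accelerator.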
The FPGA can be programmed with custom applications or loaded with off-the-shelf firmware. For example, the platform supports Myrtle.ai’s artificial-intelligence inference engines, which target applications like speech synthesis and machine translation.
Eideticom’s NoLoad Computational Storage Processor is also available for the 250-M2D. The NoLoad CSP runs on BittWare’s NVMe U.2, EDSFF, and PCIe AIC platforms as well.
BittWare’s 250-M2D Open Compute M.2 accelerator fits into a conventional M.2 slot with a Gen 3 x4 PCIe interface (Fig. 2). A large heat sink is required to keep the FPGA cool, so the module will not fit in all M.2 platforms. It does conform to the Open Compute M.2 socket form factor. The module has a TDP of 14.85 W and a peak absolute power requirement of 24 W.
Bringing FPGA CSP support to this small form factor allows this functionality to be incorporated into edge devices, from drones to Industry 4.0 systems. The platform can be used wherever an FPGA with a sizable chunk of memory will be useful. FPGAs have the advantage of being able to simultaneously support all aspects of an application, from massaging input and output data to implementing any combination of computation and analysis in a single fabric.