Xilinx’s Alveo U280 pushed the high end of FPGA enterprise computing, delivering 24.5 INT8 TOPS with a million-LUT UltraScale+ FPGA. It takes up two slots and draws just under 225 W.
The U280 shares a number of features with the new Alveo U50 (Fig. 1). Both have in-package HBM2 memory with 460 GB/s of bandwidth. They also support PCI Express (PCIe) Gen 3 and Gen 4 interfaces along with CCIX, a new hardware interconnect standard based on PCIe.
1. Xilinx’s single-slot Alveo U50 brings the latest features like CCIX support and HBM2 memory while using less than 75 W.
The Alveo U50 has an UltraScale+ FPGA, too, but with only 872K LUTs. It also has a single 100-Gb/s QSFP28 interface. These reduced specifications, however, allow the system to be packaged in a half-height board that uses less than 75 W. The Alveo U50 targets the edge, where smaller board form factors and lower power are advantages.
Going smaller doesn’t mean giving up functionality or flexibility. The Alveo family shares a common deployment stack (Fig. 2). This includes applications that run in Docker containers managed by Kubernetes. Thus, applications can specify what FPGA capabilities are needed in addition to more conventional resources like processors, memory, and storage. The system can handle installation of the FPGA configuration prior to transferring data to and from the FPGA, or allow the FPGA to manage the network interfaces. With this approach, an accelerated service can easily run on a U50 or scale up to multiple U280 boards.
2. A Xilinx device driver allows Kubernetes to manage a system and provide Docker containers with FPGA services in a transparent fashion.
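To make the container workflow concrete, here is a minimal sketch of how an application might ask Kubernetes for an FPGA alongside CPU and memory. Kubernetes lets device plugins advertise extended resources that a container requests under its resource limits. The resource name `example.com/fpga-alveo-u50`, the helper `fpga_pod_manifest`, and the image name are hypothetical placeholders, not names taken from Xilinx's software.

```python
import json

# Hypothetical extended-resource name. The real name depends on the
# device plugin and on the FPGA configuration installed on the card.
FPGA_RESOURCE = "example.com/fpga-alveo-u50"


def fpga_pod_manifest(name, image, fpga_count=1, cpu="2", memory="4Gi"):
    """Build a Pod manifest that requests FPGAs next to CPU and memory."""
    if fpga_count < 1:
        raise ValueError("fpga_count must be at least 1")
    # Extended resources are requested in whole units under "limits".
    limits = {"cpu": cpu, "memory": memory, FPGA_RESOURCE: str(fpga_count)}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": name, "image": image, "resources": {"limits": limits}}
            ],
            "restartPolicy": "Never",
        },
    }


if __name__ == "__main__":
    # Hypothetical image name, used only for illustration.
    manifest = fpga_pod_manifest("speech-accel", "example/speech:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be passed to `kubectl apply -f -`. Scaling the same service to a larger board would, under these assumptions, only mean changing the resource name or the count.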
The Alveo U50 can tackle a wide range of compute and machine-learning (ML) applications. For example, it can handle speech translation better than a CPU or GPU while using less power. It’s also able to accelerate financial models and database analytics. The U50 can even accelerate storage management (Fig. 3) while using ML or other techniques to process the data as it flows between a network host and local storage devices. Xilinx demonstrated this capability at this week's Flash Memory Summit.
3. Using an Alveo U50 as a custom interface between storage and a network is just one way to take advantage of FPGA configurability.
The demonstration system provided an NVMe over Fabrics (NVMe-oF) system supporting 2.5 million IOPS. It added only one microsecond of latency to the JBOD SSD array while providing compression, encryption, and database scanning support.
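As a rough sanity check on that IOPS figure, the short calculation below converts it to payload bandwidth and compares it with the U50's single 100-Gb/s port. The 4-KiB block size is an assumption, since the I/O size used in the demonstration isn't stated, and the sketch ignores protocol overhead.

```python
def nvme_throughput_gbps(iops, block_bytes):
    """Return payload throughput in gigabits per second (decimal units)."""
    return iops * block_bytes * 8 / 1e9


IOPS = 2_500_000    # figure quoted for the demonstration
BLOCK_BYTES = 4096  # assumption: 4-KiB I/O size, not stated in the article
LINK_GBPS = 100     # the U50's single QSFP28 port

throughput = nvme_throughput_gbps(IOPS, BLOCK_BYTES)
print(f"{throughput:.2f} Gb/s payload, {throughput / LINK_GBPS:.0%} of the link")
# prints "81.92 Gb/s payload, 82% of the link"
```

Under that assumption, the quoted rate would fit within the single network port with some headroom. A larger block size would not.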
Many edge applications will require only a single Alveo U50. However, its compact size will also allow multiple boards to be used where its larger siblings would not fit.