This article is part of the Communication and System Design Series: Have SmartNIC - Will Compute
Every server connects to a network using a network interface card (NIC). Sometimes these are embedded wireless connections that typically support Internet of Things (IoT) devices like cameras and thermostats, but the vast majority of servers are wired to the network. They use wire for many reasons, but the two most prominent are performance and availability.
When talking about availability, a wired network only fails when the cable is damaged or removed; by contrast, wireless networks are subject to congestion and external interference. For network performance, we often talk about two metrics: bandwidth, the volume of data you can move through the network, and latency, the time you spend waiting to move one piece of data. The bandwidth and latency of wired networks are easily one or more orders of magnitude better than those of wireless networks.
For example, wired data-center networks today typically run at 25 Gb/s, while wireless runs at roughly 1/20 of that speed, 1.3 Gb/s (using the 5-GHz standard). These same wired data-center networks measure latencies in the 2- to 5-µs range, while wireless latencies are often 1 to 2 ms, which is hundreds of times longer.
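As a quick check of those ratios, here is a minimal sketch that uses only the figures quoted above. The variable names are illustrative, and the numbers are the article's round values rather than measurements.

```python
# Figures quoted in the text; nothing here is measured.
wired_bandwidth_gbps = 25.0
wireless_bandwidth_gbps = 1.3
wired_latency_us = (2.0, 5.0)            # 2 to 5 microseconds
wireless_latency_us = (1000.0, 2000.0)   # 1 to 2 ms, expressed in microseconds

# How many times more bandwidth the wired link offers.
bandwidth_ratio = wired_bandwidth_gbps / wireless_bandwidth_gbps

# Smallest gap: best wireless latency against worst wired latency.
latency_ratio_min = wireless_latency_us[0] / wired_latency_us[1]
# Largest gap: worst wireless latency against best wired latency.
latency_ratio_max = wireless_latency_us[1] / wired_latency_us[0]

print(f"bandwidth: wired is about {bandwidth_ratio:.1f}x wireless")
print(f"latency: wireless is {latency_ratio_min:.0f}x to {latency_ratio_max:.0f}x wired")
```

With these inputs, the bandwidth gap is a little over one order of magnitude and the latency gap is two to three orders of magnitude.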
Having dedicated wired NICs for servers in the data center is critical to the server’s overall performance, but can we do better? Yes, we can make NICs smarter by adding computational resources that process network traffic as it enters and exits the server, or that even offload the host CPU at the application level.
Computational Elements
SmartNICs are the fusion of wired networking and computational resources on the same card. These computational resources can be composed of one or more of the following categories: classical CPUs like Arm cores; purpose-built cores such as digital signal processors (DSPs), artificial intelligence (AI) accelerators, and network processing units (NPUs); or field-programmable gate arrays (FPGAs).
Which of these computational models is used to create a SmartNIC is determined by the target market for that SmartNIC. Is the SmartNIC focused solely on reducing the impact of network or storage traffic, or was it designed to offload the host CPU?
It’s not uncommon for more than one of the above computational elements to be included on a SmartNIC. For example, one dual- or quad-core Arm complex is often used for control-plane management tasks like loading software into other computational units and logging. So how can SmartNICs offload the host CPU from networking, storage, or even application-specific computational tasks?
DDoS Defense, Firewalling, Packet Wrapping
First, we have the obvious networking tasks like distributed denial of service (DDoS) mitigation, firewalling, and packet wrapping. While DDoS mitigation and firewalling appear similar, they are two distinctly different tasks. By design, DDoS attacks are primarily volumetric, meaning that their whole point is to deliver millions of packets per second to a network; we’ve even seen hundreds of millions.
When a network is overcome by congestion, components fail, and genuine customer traffic is delayed or even dropped. Conventional firewalls aren’t designed to handle these packet rates, but several basic tricks can be employed in a SmartNIC to count and then drop DDoS attack packets dynamically. Back in 2015, RioRey produced a Taxonomy of DDoS Attacks that calls out the 25 types of attack vectors. The most common of these could be loaded into a SmartNIC, creating a DDoS defense for a company’s edge internet servers.
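The "count and then drop" idea can be sketched as a per-source packet counter over a fixed time window. This is only an illustration under stated assumptions: the class name, the threshold, and the one-second window are hypothetical, and a real SmartNIC would implement the equivalent logic in its data path rather than in Python.

```python
from collections import defaultdict


class RateDropper:
    """Fixed-window, per-source packet counter (hypothetical names and limits)."""

    def __init__(self, max_packets_per_window: int, window_seconds: float = 1.0):
        self.max_packets = max_packets_per_window
        self.window = window_seconds
        self.window_start = 0.0
        self.counts = defaultdict(int)

    def allow(self, src_ip: str, now: float) -> bool:
        """Count one packet from src_ip; return False once it exceeds the limit."""
        # Start a fresh window, and forget old counts, once the current one expires.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.counts.clear()
        self.counts[src_ip] += 1
        return self.counts[src_ip] <= self.max_packets


dropper = RateDropper(max_packets_per_window=3)
# Five packets from one source inside the same window: the last two are dropped.
verdicts = [dropper.allow("198.51.100.7", now=0.1 * i) for i in range(5)]
print(verdicts)
```

A fixed window is the simplest choice; it trades accuracy at window boundaries for very little state per source, which matters when the table has to hold millions of flows.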
A SmartNIC could also include a basic Netfilter firewall, relieving the host CPU of filtering all inbound and outbound packets. Netfilter is the Linux kernel's packet-filtering framework, the engine behind iptables and its successor nftables, and it provides a very robust architecture for filtering network traffic. All edge-connected servers, even those residing in a network DMZ, should be running a firewall. Offloading this firewall to a SmartNIC could save the host CPU millions of instructions per second that could then be applied to the applications running on that server.
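To make the per-packet cost concrete, here is a minimal sketch of first-match-wins rule evaluation with a default policy, the general pattern a packet filter applies to every packet. It is not Netfilter's API; the `Rule` type, the `filter_packet` function, and the example rules are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    action: str                      # "accept" or "drop"
    src_net: str                     # source network in CIDR notation
    protocol: str                    # "tcp", "udp", or "any"
    dst_port: Optional[int] = None   # None matches any destination port

    def matches(self, src_ip: str, protocol: str, dst_port: int) -> bool:
        return (
            ip_address(src_ip) in ip_network(self.src_net)
            and self.protocol in ("any", protocol)
            and self.dst_port in (None, dst_port)
        )


def filter_packet(rules: List[Rule], src_ip: str, protocol: str,
                  dst_port: int, default: str = "drop") -> str:
    """Return the action of the first matching rule, else the default policy."""
    for rule in rules:
        if rule.matches(src_ip, protocol, dst_port):
            return rule.action
    return default


# Hypothetical policy: HTTPS from anywhere, SSH only from an internal range.
rules = [
    Rule("accept", "0.0.0.0/0", "tcp", 443),
    Rule("accept", "10.0.0.0/8", "tcp", 22),
]
print(filter_packet(rules, "203.0.113.9", "tcp", 443))
print(filter_packet(rules, "203.0.113.9", "tcp", 22))
```

Every packet walks this list until something matches, which is why moving the work off the host pays off as rule counts and packet rates grow.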
We also have packet wrapping, known as encapsulation. Whenever we utilize overlay networks for virtualized or containerized systems, we need to wrap the network packets so that they can be routed between these overlay networks. An example of overlay network processing is Open vSwitch (OvS), which can be very CPU intensive, so offloading this task to a SmartNIC frees up significant host CPU cycles.
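As an illustration of what wrapping involves, the sketch below adds and removes a VXLAN header. VXLAN is used here only as a familiar example of an overlay encapsulation; the article does not name a specific protocol. The 8-byte header layout follows RFC 7348, the function names are hypothetical, and the outer Ethernet, IP, and UDP headers that a real tunnel also needs are omitted.

```python
import struct
from typing import Tuple

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid
VXLAN_HEADER_LEN = 8


def vxlan_encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prefix an inner Ethernet frame with a VXLAN header for the given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(packet: bytes) -> Tuple[int, bytes]:
    """Strip the VXLAN header and return (vni, inner_frame)."""
    if len(packet) < VXLAN_HEADER_LEN:
        raise ValueError("packet shorter than a VXLAN header")
    flags_word, vni_word = struct.unpack("!II", packet[:VXLAN_HEADER_LEN])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return vni_word >> 8, packet[VXLAN_HEADER_LEN:]


wrapped = vxlan_encapsulate(b"inner frame bytes", vni=5001)
print(vxlan_decapsulate(wrapped))
```

The work per packet is small, but it is repeated for every packet entering or leaving every virtual machine or container, which is where the host CPU cost comes from.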
Finally, we could also offload primary network applications that might typically run on the server like DNS or in-memory databases. Processing DNS queries entirely within the SmartNIC is a typical SmartNIC application as the transactions are small, and the table lookups are quickly processed.
Another excellent example of a SmartNIC application is an in-memory database. Many customer applications these days rely on unstructured data stored using in-memory database applications. These data elements often leverage simple keys for which the values are also often just as small. For instance, what is the network address of a specific host if the name is X? The network address, even an IPv6 address, is only 16 bytes, and a host name label is limited to 63 bytes. Both easily fit in a single small network packet.
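The size argument can be checked with a short sketch: a key-value lookup that maps a host name to an IPv6 address and builds a reply payload. The table contents, the reply layout (a one-byte length, the name, then the packed address), and the function name are all hypothetical, chosen only to show how little data is involved.

```python
import ipaddress

# Hypothetical in-memory table: host name -> IPv6 address.
table = {
    "db01.example.com": ipaddress.IPv6Address("2001:db8::10"),
}

MAX_NAME_BYTES = 63       # name limit quoted in the text
IPV6_ADDRESS_BYTES = 16


def lookup_reply(name: str) -> bytes:
    """Build a reply payload: 1-byte name length, the name, the 16-byte address."""
    encoded = name.encode("ascii")
    if len(encoded) > MAX_NAME_BYTES:
        raise ValueError("name too long")
    address = table[name]
    return bytes([len(encoded)]) + encoded + address.packed


reply = lookup_reply("db01.example.com")
worst_case = 1 + MAX_NAME_BYTES + IPV6_ADDRESS_BYTES
print(len(reply), worst_case)
```

Even the worst-case payload in this layout is 80 bytes, a small fraction of a standard 1,500-byte Ethernet payload.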
Storage Control with a SmartNIC
A SmartNIC can also double as a storage controller. Some SmartNICs, like Xilinx’s Alveo U25 (see figure), carry gigabytes of their own on-chip and on-board memory (6 GB in the case of the U25). This memory can easily double as a cache for the server’s own NVMe disks. This will become especially important soon, as protocols like Compute Express Link (CXL) enable future SmartNICs to manage the master-slave relationship with NVMe drives directly.
SmartNICs could also do erasure coding in hardware, as well as storage encryption. For drive encryption, SmartNICs offer a unique security angle. From a security perspective, it’s never a good practice to keep the encryption keys near or, even worse, on the item that’s encrypted. If a SmartNIC encrypts or decrypts data going to NVMe storage, then both elements are required if someone wishes to break the encryption. If an admin removes the drives to decrypt them elsewhere, they would then have to brute-force the missing encryption keys that were left behind on the SmartNIC.
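For readers unfamiliar with erasure coding, the sketch below shows the simplest possible form: a single XOR parity block that can rebuild any one lost data block. This is an illustration only. Production systems typically use stronger codes such as Reed-Solomon, and the function names and sample blocks here are hypothetical.

```python
from functools import reduce
from typing import List, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: Sequence[bytes]) -> bytes:
    """Compute the parity block that is stored alongside the data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: Sequence[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


data: List[bytes] = [b"NVMe", b"data", b"blok"]
parity = make_parity(data)
# Pretend the middle block was lost with its drive, then rebuild it.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered)
```

The operation is pure byte-wise arithmetic over large buffers, which is the kind of regular, repetitive work that suits dedicated hardware.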
SmartNICs can easily employ cryptography to secure their keys between power cycles, further making the system both robust and secure. Solarflare, for example, has maintained a hardware security enclave on the NIC to store the NIC’s keys within its X2 silicon for the past several years. Future SmartNIC security enclaves could potentially save and secure hundreds of thousands of security keys for SSL/TLS end-point encryption.
Offloading
CPU offload is also a significant value proposition for a SmartNIC. It’s possible, using code available today, to offload computationally intensive tasks into a SmartNIC. These could be tasks like hashing for blockchains and transcoding video.
Blockchains rely on solving a proof of work, or similar type of problem. The first node on a network that reaches a solution is provided a reward, and then permitted to bundle up and publish the next block on the chain. SmartNICs could hold the blockchain and pending transactions in memory while computing the next solution. If they win, then the SmartNIC publishes the block and moves on to the next block.
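A hash-based proof of work can be sketched in a few lines: keep trying nonces until the hash of the block data plus the nonce falls below a target. The sketch uses SHA-256 and a deliberately tiny difficulty so it finishes quickly; the function name, the 8-byte nonce encoding, and the difficulty value are assumptions for illustration, not the scheme of any particular blockchain.

```python
import hashlib
from typing import Optional, Tuple


def mine(block_data: bytes, difficulty_bits: int,
         max_nonce: int = 2 ** 32) -> Optional[Tuple[int, str]]:
    """Search for a nonce whose SHA-256 digest has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in range(max_nonce):
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
        # Interpreting the digest as an integer, a smaller value means more leading zeros.
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None  # no solution found within max_nonce attempts


# 12 bits of difficulty needs about 4,096 attempts on average.
result = mine(b"ledger entry: pay 10 units to account 42", difficulty_bits=12)
print(result)
```

Each extra bit of difficulty doubles the expected number of hashes, so the search is embarrassingly parallel brute force, exactly the kind of load worth keeping away from the host CPU.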
We’re not advocating for SmartNICs to become mining rigs on the internet at large, but rather the opposite. A savvy architect could use a collection of SmartNICs within their infrastructure to maintain the company’s own transaction ledgers by using a locally hosted blockchain. All of this could be accomplished by utilizing the computational power of the SmartNICs without ever impacting the host CPU performance. Companies like Silex Insight have developed the required blockchain components to make this possible.
Video transcoding is another popular host CPU offload that lends itself well to SmartNICs. Transcoding video, especially live video, using adaptive-bitrate (ABR) compression to support mobile devices is a CPU-intensive task. These compression tasks are extremely linear and have been ported to FPGA-based accelerators, where they’ve proven to be 10X to 20X more efficient than general-purpose CPUs.
Electronic Trading
One final special case where SmartNICs shine is ultra-low-latency electronic trading. Here we’re talking about moving network packets in tens of billionths of a second. Today, latency on high-performance 25-GbE NICs is in the range of 1,000 ns. With a properly architected system, the right software, and a tuned SmartNIC, network packets can be analyzed as they’re being received, four bytes at a time. The response packet can then be injected into the network in a blindingly fast 22 ns. This is over 40X faster than traditional high-performance NICs. When deployed in electronic trading, the return on investment (ROI) for these SmartNICs can sometimes be measured in fractions of a second.
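The idea of deciding while the packet is still arriving can be sketched as follows. Everything about the message layout is hypothetical: the sketch assumes a 32-bit price field at byte offset 8 of a 64-byte packet and a 25-Gb/s line rate, and it models in Python what real hardware would do in FPGA logic. It only illustrates why acting on an early field beats waiting for the whole packet.

```python
import struct
from typing import Tuple

WORD_BYTES = 4
LINE_RATE_BPS = 25e9  # 25-GbE data rate, taken from the text


def wire_time_ns(num_bytes: int) -> float:
    """Time for num_bytes to arrive at the line rate, in nanoseconds."""
    return num_bytes * 8 / LINE_RATE_BPS * 1e9


def decide_while_receiving(packet: bytes, price_offset: int,
                           limit: int) -> Tuple[bool, int]:
    """Consume the packet one 4-byte word at a time and decide as soon as the
    word holding the (hypothetical) price field has arrived.
    Returns (trigger, bytes_consumed)."""
    for start in range(0, len(packet), WORD_BYTES):
        word = packet[start:start + WORD_BYTES]
        if start == price_offset and len(word) == WORD_BYTES:
            (price,) = struct.unpack("!I", word)
            return price <= limit, start + WORD_BYTES
    return False, len(packet)


# Hypothetical 64-byte packet with the price 995 at byte offset 8.
packet = bytes(8) + struct.pack("!I", 995) + bytes(52)
trigger, consumed = decide_while_receiving(packet, price_offset=8, limit=1000)
print(trigger, consumed, wire_time_ns(consumed), wire_time_ns(len(packet)))
```

Under these assumptions the decision is available after 12 bytes, about 3.84 ns of wire time, while the full 64-byte packet takes about 20.48 ns to arrive.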
Today’s servers often spend 30% of their CPU cycles managing networking; this is jokingly referred to as the data-center tax. Imagine if you could get those cycles back in every server within your data center. That would be like having one new server for nearly every three in production.
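The arithmetic behind that comparison is simple enough to sketch. The 30% share comes from the text; the function name and the `offload_fraction` parameter, which models how much of the networking work a SmartNIC actually takes over, are hypothetical.

```python
def reclaimed_server_equivalents(server_count: int,
                                 networking_share: float = 0.30,
                                 offload_fraction: float = 1.0) -> float:
    """CPU capacity, in whole-server units, freed by offloading networking work.

    networking_share: fraction of host CPU cycles spent on networking (from the text).
    offload_fraction: hypothetical fraction of that work moved onto the SmartNIC.
    """
    return server_count * networking_share * offload_fraction


# Best case: all of the networking work is offloaded.
print(round(reclaimed_server_equivalents(3), 2))
# A more cautious assumption: only half of it moves to the SmartNIC.
print(round(reclaimed_server_equivalents(1000, offload_fraction=0.5), 2))
```

In the best case, three servers yield 0.9 of a server's worth of cycles, which is the "nearly one in three" figure above.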
SmartNICs enable system architects to place high-performance computing resources at the very edge of the server—the network. SmartNICs can then be leveraged to protect the server, and therefore the enterprise, while also dramatically offloading the much more expensive server CPUs. So, when you’re designing your next data-center deployment, instead of defaulting to the standard NIC coming with your server, perhaps consider how SmartNICs might fit into your plans.
Scott Schweitzer is Technology Evangelist at Xilinx.