SMART Modular’s CXL-Attached Storage Cards Deliver 4 TB of Shared Memory

What you’ll learn:

What is CXL-attached memory?
What is the difference between CXL and NVMe storage?
What SMART Modular Technologies is bringing to the table.

CXL-attached memory is becoming more commonplace in high-performance servers. It takes advantage of the Compute Express Link (CXL) standard, which is moving into its third incarnation. CXL, built on PCI Express (PCIe) Gen 5, provides a cache-coherent environment that allows for disaggregation of memory. It enables applications running on multiple processing elements to access any memory connected to a CXL fabric (Fig. 1).

1. A cache-coherent environment enables applications running on multiple processing elements to access any memory connected to a CXL fabric.

Typically, processor-attached memory can be shared among multiple processors. However, in the past, the interface for doing so was proprietary. Dual inline memory modules (DIMMs) attached to one processor could be accessed by all, but the number of DIMMs was limited. On top of that, the approach doesn’t scale to the hundreds or thousands of processing elements found in cloud servers. CXL offers this type of scaling while maintaining cache-coherent support like that of the proprietary systems.

Nonvolatile Memory Express (NVMe) is also based on PCIe, but it’s block-oriented and generally used with flash memory. It doesn’t have to support cache coherency, which simplifies the controller. NVMe-over-CXL (NVMe-oC) is an emerging option that takes advantage of CXL while retaining the NVMe interface already supported by operating systems and applications. NVMe over Fabrics (NVMe-oF) is already in play to address hyperscalers’ high-performance-computing (HPC) requirements. CXL-attached memory is just another piece of the hyperscaler HPC puzzle.
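To make the block-versus-byte distinction concrete, here’s a minimal Python sketch. It uses an ordinary temporary file as a stand-in for both kinds of device, so it illustrates only the access pattern, not real NVMe or CXL hardware. The 4,096-byte block size and all function names are assumptions chosen for illustration.

```python
import mmap
import os
import tempfile

BLOCK_SIZE = 4096  # assumed logical block size for the block-device stand-in


def block_update(path: str, offset: int, value: int) -> int:
    """Change one byte the block-oriented way: read, modify, and write a whole block.

    Returns the number of bytes moved in each direction.
    """
    block_start = (offset // BLOCK_SIZE) * BLOCK_SIZE
    with open(path, "r+b") as f:
        f.seek(block_start)
        block = bytearray(f.read(BLOCK_SIZE))
        block[offset - block_start] = value
        f.seek(block_start)
        f.write(block)
    return len(block)


def mapped_update(path: str, offset: int, value: int) -> int:
    """Change one byte the memory-semantic way: a single store into a mapping.

    Returns the number of bytes the program itself had to touch.
    """
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        m[offset] = value
    return 1


def read_byte(path: str, offset: int) -> int:
    """Read back a single byte so the two updates can be checked."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(1)[0]


def demo() -> tuple:
    """Run both update styles against a zero-filled four-block scratch file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(4 * BLOCK_SIZE))
        path = tmp.name
    try:
        moved_block = block_update(path, 5000, 0xAB)
        moved_mapped = mapped_update(path, 9000, 0xCD)
        return moved_block, moved_mapped, read_byte(path, 5000), read_byte(path, 9000)
    finally:
        os.remove(path)
```

In the block-style path, changing one byte means moving an entire block in each direction. In the mapped path, the program issues a single store and leaves the rest to the hardware and operating system, which is closer to how software sees load/store-accessible memory.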

Delivering on the CXL-Attached Memory Promise

SMART Modular Technologies has pushed the boundaries of memory technology since its inception, so it’s no surprise that the company’s latest products address CXL-attached memory. The eight-DIMM CXA-8F2W (Fig. 2) and four-DIMM CXA-4F1W add-in cards (AICs) combine CXL memory controllers with multiple SMART Modular DIMMs.

2. SMART Modular Technologies’ CXA-8F2W hosts 4 TB of DDR5-4800 memory across eight DIMMs.

“The CXL protocol is an important step toward achieving industry-standard memory disaggregation and sharing, which will significantly improve the way memory is deployed in the coming years,” said Andy Mills, senior director of advanced product development at SMART Modular.

The cards have a full-height, half-length, x16 PCIe form factor and use standard DDR5 registered DIMMs (RDIMMs). A fully populated CXA-8F2W provides up to 4 TB of memory. It uses two CXL memory controllers, delivering a total bandwidth of 64 GB/s with a 200-ns latency, and that configuration dissipates 135 W. Users can select lower-capacity RDIMMs, with a corresponding reduction in capacity and power requirements. The top end uses 512-GB modules, while a 90-W system would employ 64-GB modules for a 512-GB capacity.
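The capacity and power figures quoted for the eight-slot card can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are hypothetical, and the watts-per-gigabyte figure is a derived number, not a vendor specification.

```python
CXA_8F2W_SLOTS = 8  # DIMM slots on the eight-DIMM card, per the article


def card_capacity_gb(dimm_count: int, module_gb: int) -> int:
    """Total capacity in GB when every slot holds a module of the same size."""
    return dimm_count * module_gb


def watts_per_gb(card_watts: float, capacity_gb: int) -> float:
    """Card power divided by capacity (a derived figure, for comparison only)."""
    return card_watts / capacity_gb


# The two configurations quoted in the article: (module size in GB, card power in W)
top_end_gb = card_capacity_gb(CXA_8F2W_SLOTS, 512)  # 4096 GB, i.e., 4 TB
low_end_gb = card_capacity_gb(CXA_8F2W_SLOTS, 64)   # 512 GB

top_end_w_per_gb = watts_per_gb(135, top_end_gb)
low_end_w_per_gb = watts_per_gb(90, low_end_gb)
```

Under these numbers, the fully populated card works out to roughly 0.03 W per GB versus about 0.18 W per GB for the 64-GB-module configuration, so the larger modules add capacity much faster than they add power.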

Interestingly, the x16 PCIe card exposes its CXL controllers as a pair of x8 PCIe connections. This arrangement is supported by the PCIe standard, as well as by switches that negotiate the type of connection and the speeds involved. It offers a more efficient interface overall.
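A quick calculation shows how a pair of x8 links lines up with the quoted 64 GB/s. The sketch below assumes PCIe Gen 5 signaling at 32 GT/s per lane with 128b/130b encoding and ignores protocol overhead. Treating the article’s 64-GB/s figure as the combined one-direction link rate is an assumption on our part, not something the vendor states.

```python
GEN5_GT_PER_S = 32.0             # PCIe Gen 5 raw transfer rate per lane, in GT/s
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def link_bandwidth_gb_s(lanes: int, gt_per_s: float = GEN5_GT_PER_S) -> float:
    """Approximate one-direction link bandwidth in GB/s, before protocol overhead."""
    return lanes * gt_per_s * ENCODING_EFFICIENCY / 8  # 8 bits per byte


per_controller = link_bandwidth_gb_s(8)  # one x8 link, about 31.5 GB/s
card_total = 2 * per_controller          # two x8 links, about 63 GB/s
```

Two x8 Gen 5 links come to about 63 GB/s in each direction, which rounds to the 64-GB/s figure if the encoding overhead is left out.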

Why CXL-Attached Memory Is Important

Any programmer knows that there’s never enough available memory. This is especially true for the massive cloud servers that provide HPC services. Artificial intelligence and machine learning (AI/ML) demands in this space include the need for very large amounts of memory, which is available with CXL-attached memory.

For most applications that must share data, cache-coherent CXL-attached memory is much more efficient than NVMe or application-based communication over network connections like Ethernet.

The applicability to embedded solutions tends to be more limited because core counts and memory capacities are usually much smaller than in cloud servers, where terabytes of memory are the proverbial “drop in the bucket.” Still, the flexibility that comes with CXL-attached memory is something embedded HPC application developers should not ignore, especially when dealing with AI/ML applications on the edge.