This article is part of our Library Series: System Design: Memory for AI
What you’ll learn:
- The benefits of on-chip memory.
- Dealing with the capacity issues of on-chip memory.
- HBM vs. GDDR: Determining the best option.
In part 3 of this series, we explored how the Roofline model can help determine whether a given AI architecture is limited by compute performance or by memory bandwidth. Armed with that analysis, designers can make informed decisions about which type of memory system is best suited to a particular application.
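For a quick refresher, here's a minimal sketch of that Roofline calculation. The peak compute and bandwidth figures are hypothetical placeholders, not numbers from any particular device:

```python
# Minimal Roofline sketch: attainable throughput is capped by either peak
# compute or by memory bandwidth times arithmetic intensity. The peak
# figures below are hypothetical placeholders, chosen only for illustration.

PEAK_FLOPS = 100e12   # 100 TFLOP/s peak compute (hypothetical)
PEAK_BW = 256e9       # 256 GB/s memory bandwidth (hypothetical)

def attainable_flops(arithmetic_intensity):
    """Arithmetic intensity is measured in FLOPs per byte of memory traffic."""
    return min(PEAK_FLOPS, PEAK_BW * arithmetic_intensity)

for ai in (1, 10, 100, 1000):
    regime = "memory-bound" if PEAK_BW * ai < PEAK_FLOPS else "compute-bound"
    print(f"{ai:>4} FLOP/byte: {attainable_flops(ai) / 1e12:6.1f} TFLOP/s ({regime})")
```

Workloads whose attainable throughput is set by the bandwidth term are the ones that benefit most from the memory choices discussed below.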
A variety of common memory systems are used in high-performance AI applications, each with its own set of benefits and challenges. More than anything, choosing the “right” solution depends on the application at hand.
On-Chip Memory: All Business
On-chip memory is the highest-bandwidth, most energy-efficient solution available. It can provide tens of terabytes per second of memory bandwidth, and modern reticle-sized processors can reach several hundred megabytes of capacity. In addition, the short distance that data needs to travel between on-chip memory and the compute units dramatically lowers access latency and further increases power efficiency.
The low latency and high bandwidth of on-chip memory allow compute engines to reach extremely high utilization, making it well-suited to high-performance, low-power applications, especially in handheld and battery-operated devices.
While the performance and power efficiency of on-chip memory are unparalleled, its primary drawback is limited capacity. On-chip storage is far smaller than external DRAM solutions, which today can reach into the tens of gigabytes when multiple DRAMs are used.
A number of interesting innovations have emerged to make better use of on-chip memory’s limited capacity, including reduced-precision data types and recomputing intermediate results rather than storing them. However, the tremendous growth in training sets and model sizes continues to outpace these innovations, leaving on-chip memory better suited to AI inference than to AI training.
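To make the capacity math concrete, the sketch below shows how precision alone can decide whether a model’s weights fit on-chip. The parameter count and on-chip capacity are hypothetical, chosen only for illustration:

```python
# Rough illustration of how reduced precision stretches on-chip capacity:
# the weight footprint of a hypothetical 100M-parameter model at several
# data types, compared against a hypothetical 200 MB of on-chip memory.

PARAMS = 100_000_000   # hypothetical parameter count
ON_CHIP_MB = 200       # hypothetical on-chip capacity, in MB

for dtype, bytes_per_param in [("FP32", 4), ("FP16", 2), ("INT8", 1)]:
    footprint_mb = PARAMS * bytes_per_param / 1e6
    verdict = "fits" if footprint_mb <= ON_CHIP_MB else "does not fit"
    print(f"{dtype}: {footprint_mb:.0f} MB of weights, {verdict} in {ON_CHIP_MB} MB")
```

In this hypothetical case, the FP32 weights (400 MB) overflow the on-chip memory, while the FP16 and INT8 versions fit.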
Because of these tradeoffs, on-chip memory is a great solution for running inference on smaller neural networks that fit within its capacity, or in environments where multiple chips can work together to provide the needed capacity. Otherwise, it’s best to pursue external memory options such as high-bandwidth memory (HBM) and graphics double data rate (GDDR) memory.
HBM: Complex Power
HBM, the newest high-volume DRAM solution, has seen rapid adoption in AI systems. HBM stacks DRAM dies within the device to achieve high capacity, and pairs an extremely wide interface (1024 data wires) with a relatively low per-wire data rate (2 Gb/s in HBM2) to deliver extremely high bandwidth with good signal integrity. This combination of stacking with a wide, slow interface lets HBM achieve extremely high performance while maintaining good power efficiency. With greater capacity than on-chip memory, HBM offers the best combination of bandwidth and power efficiency available in an external memory solution.
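The bandwidth falls out of simple arithmetic on those two figures, as this one-line sketch shows:

```python
# Peak bandwidth of one HBM2 stack, using the figures above:
# a 1024-bit interface with each data wire running at 2 Gb/s.

data_wires = 1024     # width of the HBM2 interface
gbits_per_wire = 2    # per-wire data rate, Gb/s (HBM2)

bandwidth_gbytes = data_wires * gbits_per_wire / 8   # convert bits to bytes
print(f"HBM2 stack bandwidth: {bandwidth_gbytes:.0f} GB/s")   # 256 GB/s
```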
The area and power advantages of the HBM architecture come at additional design and manufacturing cost. The numerous I/Os require a fine pitch, which in turn requires a silicon interposer and substrate, plus intricate stacking within the DRAM and between components in the system, all adding cost and complexity before the assembly ever reaches the PCB. Keeping the silicon cool and solving the system-engineering problems that come with stacking further complicate HBM2 implementations.
However, for organizations with the engineering skill to implement HBM memory systems, and with the ability to amortize the added costs, HBM2 can be a great choice for systems that need an external memory solution.
GDDR6: The All-Rounder
Created for the graphics industry 20 years ago, GDDR offers a good middle ground between the bandwidth and power efficiency of on-chip memory and HBM on one hand, and the cost and reliability of traditional DRAMs on the other. GDDR leverages the familiar high-volume manufacturing and assembly techniques used for traditional DRAMs like DDR, making it a good solution for balancing performance against complexity.
In contrast to HBM DRAMs, which run a large number of data wires at modest data rates, GDDR6 DRAMs take the opposite approach: 32 data wires running at 16 Gb/s, eight times the per-wire rate of HBM2. The smaller number of data wires eliminates the need for additional components such as interposers, but running at much higher data rates presents signal-integrity and power-efficiency challenges.
Those issues can be managed with carefully designed PHYs, packages, and boards. Furthermore, GDDR DRAMs don’t use stacking, which simplifies manufacturing and reduces cost. As a result, GDDR6 is a cost-effective way to achieve good performance and power efficiency.
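Applying the same arithmetic as the HBM2 sketch above gives the per-device bandwidth, and shows why a GDDR6 system typically gangs several devices together to match a single HBM2 stack:

```python
# Per-device GDDR6 bandwidth from the figures above (32 wires at 16 Gb/s),
# and how many devices it takes to match one 256-GB/s HBM2 stack.

data_wires = 32       # width of the GDDR6 interface
gbits_per_wire = 16   # per-wire data rate, Gb/s (GDDR6)

per_device_gbytes = data_wires * gbits_per_wire / 8   # 64 GB/s per device
devices_needed = 256 / per_device_gbytes              # 4 devices
print(f"GDDR6 device: {per_device_gbytes:.0f} GB/s; "
      f"{devices_needed:.0f} devices match a 256-GB/s HBM2 stack")
```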
SoC Considerations for Choosing Between HBM2 and GDDR6
When designing a processor to utilize GDDR or HBM, one must consider some important tradeoffs. In addition to the aforementioned differences between the DRAMs themselves, there are other disparities in how processors connect to these DRAMs.
Among the most important differences are the PHY circuits on the SoC that connect it to the DRAMs. For equivalent GDDR6 and HBM2 memory systems delivering 256 GB/s of memory bandwidth, GDDR6 PHYs require 1.5 to 1.75 times the SoC area of HBM2 PHY circuits delivering the same performance.
In terms of power, the difference is even more pronounced: GDDR6 PHYs consume 3.5 to 4.5 times as much power as an HBM2 PHY at the same bandwidth. From an SoC designer’s point of view, this large disparity in power and area favors HBM2 memory systems. However, the added cost and implementation complexity of HBM2 memory systems can make GDDR6 the more attractive choice.
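To see what those ratios mean in practice, the sketch below applies them to a hypothetical HBM2 PHY baseline. Only the ratios come from the comparison above; the absolute area and power values are invented for illustration:

```python
# Illustrative PHY comparison at 256 GB/s using the ratios quoted above.
# The HBM2 baseline values are hypothetical, chosen only to make the
# ratios concrete; they are not measured PHY figures.

HBM2_AREA_MM2 = 10.0   # hypothetical HBM2 PHY area
HBM2_POWER_W = 1.0     # hypothetical HBM2 PHY power

print(f"HBM2 PHY baseline: {HBM2_AREA_MM2} mm^2, {HBM2_POWER_W} W")
for label, area_ratio, power_ratio in [("GDDR6 PHY (low estimate)", 1.5, 3.5),
                                       ("GDDR6 PHY (high estimate)", 1.75, 4.5)]:
    print(f"{label}: {HBM2_AREA_MM2 * area_ratio:.1f} mm^2, "
          f"{HBM2_POWER_W * power_ratio:.2f} W")
```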
Whether you choose HBM2 or GDDR6 ultimately depends on what matters most in the system at hand. If you’re prepared to handle the cost and engineering complexity of an HBM2 implementation, it’s the best route to take. But for systems that prioritize cost and more mainstream manufacturing methods, GDDR6 is an excellent solution. There’s no wrong answer when picking a high-bandwidth memory solution for your application.
Both on-chip and external memory solutions offer the high bandwidth and low latency needed to meet the demands of today’s most intensive applications. Choose wisely, and your efforts will be rewarded.
Read more articles from the Library Series: System Design: Memory for AI