
Liquid Cooling Dials Down AI Supercomputer Heat

Oct. 30, 2024
Not only do massive clusters of GPUs require major amounts of power, they also need a cooling system that can handle the resulting heat. The xAI Colossus shows how liquid cooling can do that at scale.

Two things stand out about the Supermicro computers used to build the xAI Colossus for X: fiber optics and liquid cooling. Fiber optics has been around for a long time and is already used routinely in data centers to move data outside the rack or wherever very high bandwidth is needed.

Liquid cooling isn't new. What is novel, though, is using it in large-scale systems, which is where xAI stands out. The system incorporates thousands of very powerful GPUs to handle massive machine-learning (ML) workloads.

Each liquid-cooled rack has eight 4U servers. Each server contains eight NVIDIA H100 boards. The H100 was introduced in 2022 and incorporates HBM3 memory. The xAI rack cools the system using the Supermicro Coolant Distribution Unit (CDU). 
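To put the rack layout in numbers, here's a minimal sketch that multiplies out the per-rack GPU count and estimates how much heat the GPUs alone can produce. The 700-W figure is an assumption (the published maximum TDP of the H100 SXM module), not something stated here; real draw depends on workload and excludes CPUs, NICs, and other components.

```python
# Rough per-rack GPU count and GPU heat load for a liquid-cooled rack.
# From the article: 8 servers (4U) per rack, 8 H100 GPUs per server.
# ASSUMPTION: 700 W per GPU (H100 SXM maximum TDP); actual draw varies
# with workload and this ignores CPUs, NICs, and power-conversion losses.

SERVERS_PER_RACK = 8
GPUS_PER_SERVER = 8
GPU_TDP_WATTS = 700  # assumed, not stated in the article


def gpus_per_rack(servers: int = SERVERS_PER_RACK, gpus: int = GPUS_PER_SERVER) -> int:
    """Number of GPUs in one rack."""
    return servers * gpus


def rack_gpu_heat_kw(gpu_count: int, tdp_watts: float = GPU_TDP_WATTS) -> float:
    """Heat the GPUs alone can dissipate at full TDP, in kilowatts."""
    return gpu_count * tdp_watts / 1000.0


if __name__ == "__main__":
    n = gpus_per_rack()
    print(f"GPUs per rack: {n}")  # 64
    print(f"GPU heat per rack at TDP: {rack_gpu_heat_kw(n):.1f} kW")  # 44.8 kW
```

Even under these rough assumptions, a single rack's GPUs can approach 45 kW, which helps explain why air cooling becomes difficult at this density.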

Different Types of Liquid Cooling

The CDU uses direct-to-chip (D2C) liquid cooling, which keeps the liquid contained within a system of pipes and tubes. Heat is exchanged at the source through a cold plate. By contrast, immersion cooling submerges the electronics in a bath of cooling fluid, which requires either specially designed boards or a fluid that won't react with the electronics (see figure).
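The heat a D2C loop can carry away follows the standard sensible-heat balance Q = ṁ·c_p·ΔT. The sketch below rearranges it to estimate the coolant flow needed for a given load. It assumes plain water as the coolant (c_p ≈ 4186 J/(kg·K), about 1 kg/L) and an illustrative temperature rise across the cold plates; actual coolants, often water-glycol mixes, and design temperatures will differ.

```python
# Sensible-heat balance for a direct-to-chip (D2C) coolant loop:
#     Q = m_dot * c_p * delta_T
# Rearranged to find the coolant mass flow needed to carry away a heat load.
# ASSUMPTIONS: plain water (c_p ~ 4186 J/(kg*K), density ~ 1 kg/L) and an
# illustrative inlet-to-outlet temperature rise chosen by the caller.

WATER_CP_J_PER_KG_K = 4186.0
WATER_DENSITY_KG_PER_L = 1.0


def mass_flow_kg_per_s(heat_w: float, delta_t_k: float, cp: float = WATER_CP_J_PER_KG_K) -> float:
    """Coolant mass flow (kg/s) that absorbs heat_w watts with a delta_t_k rise."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    return heat_w / (cp * delta_t_k)


def volume_flow_l_per_min(heat_w: float, delta_t_k: float) -> float:
    """Same flow expressed in liters per minute, assuming water density."""
    return mass_flow_kg_per_s(heat_w, delta_t_k) / WATER_DENSITY_KG_PER_L * 60.0


if __name__ == "__main__":
    heat = 44_800.0  # hypothetical load in watts (64 GPUs at an assumed 700 W)
    for dt in (5.0, 10.0, 15.0):
        print(f"dT = {dt:>4} K -> {volume_flow_l_per_min(heat, dt):6.1f} L/min")
```

The trade-off is visible directly in the formula: allowing a larger temperature rise across the cold plates reduces the required flow proportionally, at the cost of running the chips warmer.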

Liquid Cooling and Superclusters

The xAI Colossus supercluster has racks arranged in groups of eight. With 64 GPGPUs per rack (eight servers with eight GPGPUs each), that translates to 512 GPGPUs per group. All of the GPGPUs are connected into one logical system using NVIDIA's NVLink interface on each GPGPU plus a set of NVSwitches like those introduced in the NVIDIA DGX-2.
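The group size can be checked by multiplying out the hierarchy. This sketch also estimates how many 512-GPU groups would be needed to hold roughly 100,000 GPUs; that group count is purely illustrative arithmetic, since the article doesn't say how the full cluster is partitioned.

```python
import math

# Scale arithmetic from the article's figures:
#   8 GPUs/server x 8 servers/rack x 8 racks/group = 512 GPUs/group
# The group count for ~100,000 GPUs is an illustrative estimate only; the
# article does not state how many groups Colossus actually has.

GPUS_PER_SERVER = 8
SERVERS_PER_RACK = 8
RACKS_PER_GROUP = 8


def gpus_per_group() -> int:
    """GPUs in one group of racks."""
    return GPUS_PER_SERVER * SERVERS_PER_RACK * RACKS_PER_GROUP


def groups_needed(total_gpus: int) -> int:
    """Smallest number of full groups that can hold total_gpus."""
    return math.ceil(total_gpus / gpus_per_group())


if __name__ == "__main__":
    print(gpus_per_group())        # 512
    print(groups_needed(100_000))  # 196
```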

As noted, it's the scale of the implementation that makes the xAI Colossus so impressive. The fiber-optic cabling runs above the racks and the cooling system below. The individual liquid tubing connects through a heat exchanger to a massive pipe system, which moves the liquid outside where the system's heat can be dissipated. The video below provides a tour of the system.

Inside the World's Largest AI Supercluster xAI Colossus

The individual systems within the supercluster are designed for hot swapping, which allows for upgrades as well as replacement of failed systems. Quick-connect tubing lets a system be detached from the cooling infrastructure, while its electrical contacts allow it to be extracted from the compute environment.

As with the power supplies, the system employs redundant CDUs, including their integrated pumps. This allows cooling components to be swapped out in a modular fashion, just like the servers and power supplies.
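Redundancy of this kind is usually judged by whether the remaining units can still carry the full load after one fails. The sketch below is a generic N+1 check, not Supermicro's design logic; the unit capacities and the 45-kW load are hypothetical numbers chosen only for illustration.

```python
from typing import Sequence

# Generic N+1 redundancy check for cooling units such as CDUs.
# HYPOTHETICAL: the capacities and load below are illustrative numbers,
# not Supermicro CDU specifications.


def survives_single_failure(unit_capacities_kw: Sequence[float], load_kw: float) -> bool:
    """Return True if the remaining units can still carry load_kw after any one unit fails."""
    if not unit_capacities_kw:
        return False
    # Worst case: the largest unit is the one that fails.
    remaining = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining >= load_kw


if __name__ == "__main__":
    load = 45.0  # hypothetical rack heat load in kW
    print(survives_single_failure([50.0, 50.0], load))  # True: one unit alone covers the load
    print(survives_single_failure([30.0, 30.0], load))  # False: only 30 kW left after a failure
```

Checking the largest unit as the one that fails makes the test conservative when units have unequal ratings.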

Another impressive item of note was how quickly the system was built: It took 122 days to put together 100,000 NVIDIA H100 GPGPUs. That may seem like a long time, but most supercomputer installations have taken much longer, including the time spent debugging such a complex system.
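Spread evenly, those figures imply a brisk average pace. This is simple arithmetic on the article's numbers and assumes a uniform rate, which a real build almost certainly didn't follow; the racks-per-day figure reuses the 64-GPU rack described earlier.

```python
# Average build rate implied by the article's figures:
# 100,000 H100 GPUs installed in 122 days. Assumes a uniform pace,
# which is a simplification; the actual day-to-day rate is not reported.

TOTAL_GPUS = 100_000
BUILD_DAYS = 122
GPUS_PER_RACK = 64  # 8 servers x 8 GPUs, per the article


def gpus_per_day(total: int = TOTAL_GPUS, days: int = BUILD_DAYS) -> float:
    """Average GPUs installed per day."""
    return total / days


def racks_per_day(total: int = TOTAL_GPUS, days: int = BUILD_DAYS) -> float:
    """Average 64-GPU racks installed per day."""
    return gpus_per_day(total, days) / GPUS_PER_RACK


if __name__ == "__main__":
    print(f"~{gpus_per_day():.0f} GPUs per day")     # ~820
    print(f"~{racks_per_day():.1f} racks per day")   # ~12.8
```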

About the Author

William G. Wong | Senior Content Director - Electronic Design and Microwaves & RF

I am Editor of Electronic Design focusing on embedded, software, and systems. As Senior Content Director, I also manage Microwaves & RF and I work with a great team of editors to provide engineers, programmers, developers and technical managers with interesting and useful articles and videos on a regular basis. Check out our free newsletters to see the latest content.

You can send press releases for new products for possible coverage on the website. I am also interested in receiving contributed articles for publishing on our website. Use our template and send it to me along with a signed release form.

Check out my blog, AltEmbedded on Electronic Design, as well as my latest articles on this site that are listed below.

I earned a Bachelor of Electrical Engineering at the Georgia Institute of Technology and a Master's in Computer Science from Rutgers University. I still do a bit of programming using everything from C and C++ to Rust and Ada/SPARK. I do a bit of PHP programming for Drupal websites. I have posted a few Drupal modules.

I still get hands-on with software and electronic hardware. Some of this can be found in our Kit Close-Up video series. You can also see me in many of our TechXchange Talk videos. I am interested in a range of projects from robotics to artificial intelligence.
