Multiphase Controller Fine-Tunes Power for AI Chips in Data Centers
Alpha and Omega Semiconductor (AOS) introduced a 16-phase pulse-width-modulation (PWM) controller to deliver power smoothly and efficiently to the power-hungry AI chips residing in data centers, including NVIDIA’s latest-generation GPUs.
The multiphase controller is based on the company’s AOS Advanced Transient Modulator (A2TM) control technology, which gives it the ability to stay on top of the rapidly changing load conditions in AI chips. It comes with a programmable switching frequency of 200 kHz to 1 MHz. In addition, a peak current-mode control scheme adjusts the frequency depending on the current traveling into the processor, and current sensing balances out the phases of the power supply—both of which promote power savings.
The Silicon Valley company said the AOZ73016QI can deliver accurate current balancing under all load conditions, including transients, which is a critical requirement for power-hungry AI chips that can draw thousands of amps at peak.
The dual-rail digital controller is designed to drive the power stages in front of the load so that they can deliver the very tight voltages used by chips on the cutting edge of Moore’s Law. The digital PWMVID architecture with differential remote sensing can regulate output voltages with 0.5% accuracy, said AOS.
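To put the 0.5% figure in perspective, the short sketch below converts it into an absolute voltage window. It assumes the accuracy is a symmetric ±0.5% band around the setpoint, and the 0.8-V rail is an illustrative value rather than an AOS specification.

```python
def regulation_window(v_nominal, accuracy_pct=0.5):
    """Return (low, high) output limits for a symmetric +/- accuracy band.

    Assumption: accuracy is quoted as a percentage of the setpoint.
    """
    delta = v_nominal * accuracy_pct / 100.0
    return v_nominal - delta, v_nominal + delta


# Illustrative 0.8-V rail (assumed value, not from a datasheet)
low, high = regulation_window(0.8)
print(f"Allowed window: {low * 1000:.1f} mV to {high * 1000:.1f} mV")
```

Under these assumptions, the regulator has only about 4 mV of room on either side of the target.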
The power IC is specifically designed to meet NVIDIA’s latest Open Voltage Regulator (OVR) Vreg16 Phase specifications. It can be paired with DrMOS or other industry-standard power stages to create a flexible and efficient power-delivery solution.
This all amounts to “a robust, reliable power solution for more advanced AI servers,” said Ralph Monteiro, senior vice president of the power IC business at AOS, adding that it’s “the culmination of several years of R&D investment in multiphase controllers.”
The Drive for Efficient AI Power Delivery in Data Centers
The latest AI chips are drawing huge amounts of power from the electric grid, and the power electronics that convert and condition it on the path into the server, the circuit board, and the processor continue to evolve, too.
Today, the most advanced AI chips in the data center, such as NVIDIA’s Grace Hopper superchip, consume 700 W of power per GPU at peak loads to handle the intense computations used in training and inference. As AI proliferates, NVIDIA is pushing the envelope with its latest generation of Blackwell AI chips, the B100 and B200, which can draw as much as 1,200 W each. That much power also complicates cooling, since the chips run hot.
These AI chips cram in more and more transistors, which inevitably draw more current. In many cases, the currents climb to more than 1,000 A, depending on the complexity of the processor. On top of that, these transistors run on very low supply voltages of 0.7 to 0.8 V in the most advanced process nodes, with supply voltages of 0.5 to 0.6 V close at hand. Because power equals voltage × current, delivering more power at a lower voltage means the current rushing into high-power AI chips rises sharply.
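The arithmetic is easy to check with the power and voltage figures quoted above. The sketch below is a simplification: it assumes the chip's entire power budget is drawn from a single rail at the core voltage, which real designs with several rails don't strictly do.

```python
def load_current(power_w, supply_v):
    """Current in amps from I = P / V."""
    return power_w / supply_v


# Power and voltage figures taken from the article; the single-rail
# treatment is a simplifying assumption.
for power_w in (700, 1200):
    for supply_v in (0.8, 0.7):
        amps = load_current(power_w, supply_v)
        print(f"{power_w} W at {supply_v} V -> {amps:.0f} A")
```

Even the 700-W case lands near 1,000 A once the supply falls to 0.7 V.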
To save power, companies are adding different power states to the processor, giving it the ability to draw less current when idle and then jump up to full power as needed. This benefits the system’s power budget, but it creates another challenge for power engineers when it comes to supplying power at peak times. The DC-DC converter in the system needs to handle sudden increases in current, called load transients, which are characterized by how fast the current changes (di/dt). These can last from 100 µs to as long as 1 ms when the SoC draws peak power.
When the current rushing into the load suddenly rises—for instance, to run at faster clock frequencies—one potential result is voltage drop. The abrupt plunge in voltage—referred to as IR drop in the semiconductor industry—can cause problems for the system because even slight differences in supply voltage can lead to non-trivial reductions in a processor’s performance or efficiency. The DC-DC converter feeding power to the GPU or other SoC in the server must be able to adjust its output voltage as fast as possible to prevent voltage drop.
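For a rough sense of scale, the resistive part of that drop follows V = I × R. In the sketch below, the 500-A current step, the 0.1-mΩ path resistance, and the 0.8-V rail are all hypothetical values chosen for illustration. Real transient droop also depends on the inductance and capacitance of the power-delivery path, which this ignores.

```python
def ir_drop_v(current_step_a, path_resistance_ohm):
    """Resistive voltage drop in volts, V = I * R."""
    return current_step_a * path_resistance_ohm


# Hypothetical values for illustration only
current_step_a = 500.0    # sudden increase in load current
path_resistance = 0.1e-3  # 0.1 milliohm between regulator and die
rail_v = 0.8              # nominal supply voltage

drop = ir_drop_v(current_step_a, path_resistance)
print(f"Drop: {drop * 1000:.0f} mV ({drop / rail_v * 100:.2f}% of the rail)")
```

With these numbers, a tenth of a milliohm already costs 50 mV, which is a sizable share of a sub-1-V supply.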
Most of the world’s largest players in power electronics are racing to roll out voltage regulator modules (VRMs) that can take care of these challenges. These must be able to supply thousands of amps of current smoothly and efficiently over the last inch of the power-delivery network (PDN) while regulating the voltages entering the processor, in most cases converting a 12-V input voltage from the server’s main power supply to less than 1 V.
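The 12-V-to-sub-1-V conversion also implies very short switch on-times. As a minimal sketch, assuming each phase behaves like an ideal (lossless) buck converter with duty cycle D = Vout / Vin, and using an assumed 0.8-V output with the 200-kHz and 1-MHz ends of the controller's programmable frequency range:

```python
def buck_duty_cycle(v_in, v_out):
    """Ideal buck converter duty cycle, D = Vout / Vin (losses ignored)."""
    return v_out / v_in


def on_time_ns(duty, f_sw_hz):
    """High-side on-time per switching cycle, in nanoseconds."""
    return duty / f_sw_hz * 1e9


# 12-V input from the article; 0.8-V output is an assumed example
duty = buck_duty_cycle(12.0, 0.8)
for f_sw_hz in (200e3, 1e6):
    print(f"{f_sw_hz / 1e3:.0f} kHz: D = {duty:.3f}, "
          f"on-time = {on_time_ns(duty, f_sw_hz):.1f} ns")
```

At 1 MHz, the on-time in this idealized model shrinks to tens of nanoseconds, which hints at why precise PWM timing matters.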
To supply more than enough current to the load, these DC-DC converters consist of a number of power stages, called phases, that split the current between them, along with power inductors that smooth out the current as it races into the processor. Feeding all of the current through a single phase presents difficulties when designing the power electronics and the magnetics. It also poses thermal issues from a power loss perspective. In a way, a multiphase DC-DC converter is analogous to a multicore CPU, which splits the workload to process it faster.
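The thermal argument can be illustrated with conduction loss, which scales with the square of the current (P = I²R). The sketch below assumes perfectly balanced phases and a hypothetical 1 mΩ of series resistance per phase. It ignores switching and inductor core losses, so it's only a first-order picture.

```python
def conduction_loss_w(total_current_a, n_phases, r_phase_ohm):
    """Total I^2 * R loss when current is split evenly across n_phases."""
    i_phase = total_current_a / n_phases
    return n_phases * i_phase ** 2 * r_phase_ohm


# Hypothetical values: 1,000-A load, 1 milliohm of resistance per phase
for n_phases in (1, 4, 16):
    loss = conduction_loss_w(1000.0, n_phases, 1e-3)
    print(f"{n_phases:2d} phase(s): {1000.0 / n_phases:7.1f} A each, "
          f"{loss:7.1f} W total")
```

Under these assumptions, total conduction loss falls in proportion to the number of phases, from 1,000 W with one phase to 62.5 W with 16.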
Central to the VRM is the multiphase controller, which outputs the PWM signals used to drive the power devices. In general, the power stages are placed on the north and south of the processor to feed current from both sides. Other times, they’re mounted directly under the processor to curb transmission power losses, which stem from the resistance in the PCB. As these voltage regulators add more phases to keep up with the power of AI, they also require more advanced digital controllers to manage them all.
In most cases, analog controllers use one PWM signal to drive a single power stage. But that 1:1 arrangement can be complicated and inefficient in multiphase DC-DC converters that supply thousands of amps of current to the processor.
What’s Inside Today’s Multiphase AI Power Controllers?
AOS looks to stay a step ahead of the power demands of AI chips with the AOZ73016QI, which is designed to drive up to three power stages with the same PWM signal. Thus, with 16 PWM outputs, it can handle up to 48 phases at a time with a single digital controller. In other situations, power designers must use a phase doubler to increase the number of phases in the DC-DC converter by dividing the PWM signal into a separate pair (or more) of interleaved signals.
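The stage count follows directly from the numbers above. The sketch below also shows evenly spaced interleaving, a common practice in multiphase converters. The even spacing is an assumption here, since the article does not describe the AOZ73016QI's actual phase timing.

```python
def total_power_stages(n_pwm_outputs, stages_per_pwm):
    """Power stages driven when each PWM output fans out to several stages."""
    return n_pwm_outputs * stages_per_pwm


def interleave_angles_deg(n_pwm_outputs):
    """Evenly spaced phase offsets in degrees (assumed interleaving scheme)."""
    return [i * 360.0 / n_pwm_outputs for i in range(n_pwm_outputs)]


stages = total_power_stages(16, 3)
angles = interleave_angles_deg(16)
print(f"Power stages: {stages}")
print(f"Offset between adjacent PWM outputs: {angles[1] - angles[0]} degrees")
```

Stages that share one PWM signal switch together, so in this model there are 16 distinct switching instants per cycle rather than 48.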
The chip can sense the on-resistance (RDS(on))—the resistance between the drain and source when the power MOSFET is turned on—and DC resistance (DCR) to balance out the current between these power stages. The digital controller determines the RDS(on) by sensing current in the MOSFET on the secondary side of the power stage, giving engineers the flexibility to use low DCR inductors, which helps boost the system’s efficiency, said AOS.
When paired with power stages based on its trench power MOSFETs, the company said the AOZ73016QI can deliver very high efficiency, saving up to several hundred watts of power spread out over the power stages during peak current events.
For further power savings, and therefore less heat dissipation, the controller adds Auto Phase Management (APM) and Discontinuous Mode (DCM) features. These features can be engaged by NVIDIA’s power-save interface (PSI) pin.
The digital controller is fully programmable over the PMBus interface, and it works with AVSBus. It also comes with digitally programmable voltage and current regulation loops to eliminate external components in the DC-DC converter. It adds electronic control system (ECS) programmability with the ability to update the configuration inside the data center and pre-program several different configuration settings with a pin-strap selection.
The power device also comes with output undervoltage protection (UVP), output overvoltage protection (OVP), overcurrent protection (OCP), overtemperature protection (OTP), and overcurrent limit (OCL).
Housed in a 7- × 7-mm QFN package, the multiphase controller is priced starting at $4.00 in 1,000-unit quantities.