
Can EDA Cut Through the Complexity in Big AI Chips?

Aug. 8, 2024
The semiconductor industry is throwing more silicon at the problem of AI. Ausdia is trying to make sure that it all works when it comes to timing.

To remain relevant in the AI era, semiconductor firms and even many systems companies are rolling out a new class of ultra-large systems-on-chips (SoCs), using advanced process nodes to cram tens of billions of transistors onto slabs of silicon pushing the reticle limits of modern chips. These chips contain more than a billion standard cells, growing amounts of third-party IP, and up to thousands of clocks to keep everything coordinated. All of these factors are causing complexity to explode at a time when time-to-market windows are shrinking.

As the scaling of transistors slows down, it’s also becoming standard practice to bind heterogeneous dies or chiplets in 2.5D and 3D configurations, squeezing even more square millimeters of silicon into a package.

Sam Appleton, CEO of Ausdia, said this complexity is creating challenges for on-chip timing. All signals traveling over these huge slabs of silicon must arrive at the right time to enable smooth, reliable operation. “These chips [and even the chiplets inside them] are pushing reticle limits, which means they are physically as big as the foundries can manufacture. So, one of the challenges we’re all facing is how to verify these giant chips with regards to timing and make sure we didn’t miss anything,” he told Electronic Design.

Most of the major players in electronic design automation (EDA) software are producing more advanced tools for timing closure, the process of making sure a design meets all of its timing constraints at the target clock frequency.

But even with the latest EDA software, capturing the complexity of the latest and largest AI chips can be tricky. According to Appleton, Ausdia is trying to help companies make sense of it all. The company’s software tool can translate the SoC’s building blocks into a more compact abstract model without losing any of the timing constraints, so that other EDA tools can evaluate the timing of the full chip all at once.
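To make the idea of a compact timing abstraction more concrete, here is a minimal sketch of one well-known form of it: collapsing a block’s internal timing graph into worst-case delays between its boundary pins, similar in spirit to an extracted timing model. Ausdia has not described how its tool works, so this is not its method. The function `abstract_block`, the pin names, and the delays are hypothetical, and the sketch handles only combinational input-to-output paths; real abstractions also have to preserve checks at internal registers, clock arcs, and slew- and load-dependent delays.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def abstract_block(edges, inputs, outputs):
    """Collapse a block's internal timing arcs into worst-case
    input-to-output delays (a toy extracted timing model).

    edges: (from_pin, to_pin, delay_ps) arcs inside a hypothetical block.
    Only combinational input-to-output paths are modeled.
    """
    preds = defaultdict(list)
    sorter = TopologicalSorter()
    for src, dst, delay in edges:
        preds[dst].append((src, delay))
        sorter.add(dst, src)
    order = list(sorter.static_order())  # every pin comes after all of its fan-in

    arcs = {}
    for start in inputs:
        arrival = {start: 0.0}  # launch a signal at this input port only
        for pin in order:
            candidates = [arrival[p] + d for p, d in preds[pin] if p in arrival]
            if candidates:
                arrival[pin] = max(candidates)  # keep the slowest (worst-case) path
        for end in outputs:
            if end in arrival:
                arcs[(start, end)] = arrival[end]
    return arcs


edges = [
    ("in_a", "g1", 20), ("in_b", "g1", 35), ("g1", "g2", 40),
    ("in_b", "g3", 15), ("g3", "g2", 10), ("g2", "out_x", 25), ("g3", "out_y", 30),
]
model = abstract_block(edges, ["in_a", "in_b"], ["out_x", "out_y"])
print(model)
# {('in_a', 'out_x'): 85.0, ('in_b', 'out_x'): 100.0, ('in_b', 'out_y'): 45.0}
```

In this toy case, seven internal arcs collapse into three boundary arcs, and a top-level analysis can use those arcs without ever loading the block’s gates. On a block with millions of cells, that kind of reduction is where the memory and runtime savings would come from.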

Ausdia is trying to stay another step ahead of the challenges posed by these colossal chips with its HyperBlock technology, which was unveiled ahead of the recent Design Automation Conference (DAC) in San Francisco, Calif.

Why Timing is Everything in High-Performance AI Chips

The increasing complexity of chips is making timing closure much more of a challenge, said Appleton.

In the latest SoCs, transistors are arranged into logic gates and other basic building blocks called “standard cells,” which can number in the billions. These must be placed and routed together on the floorplan of the device to create CPU cores, AI engines, or other building blocks of IP. It’s critical to make sure all of the signals traveling through the chip remain on time, since any signal rolling in too early or too late can interrupt the smooth operation of the device.

“If you opened a single one of these blocks, it could have several million cells inside it, which are place-and-route instances,” said Appleton. “You put that smaller block inside of a larger one, and maybe it holds a hundred million instances, and then you assemble these larger blocks into the final chip. So, if you flattened out the chip, you would have around a billion little blocks you can place and move around and route and connect to each other.”

Many of these large AI SoCs are built on advanced process nodes, which give them transistors that leak less and switch faster. But at these nodes, timing delays are increasingly dominated by the interconnect wires and their metal line resistance. That makes the placement of IP in the design critical, both to avoid long interconnect delays and to reduce routing congestion. If you decide to increase the distance between a pair of IP blocks, for instance, you may have to add pipeline stages between them to make sure signals still arrive on time.
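The pipelining trade-off can be put into rough numbers. The sketch below assumes a long route that has already been broken up with repeaters, so its delay grows roughly linearly with length (an unbuffered wire’s RC delay grows roughly with the square of its length, which is why long routes are buffered). It then counts how many pipeline registers are needed so that each segment fits in one clock cycle. The function name, the per-millimeter delay, and the register-overhead figure are hypothetical placeholders, not values from any real process.

```python
import math


def pipeline_registers_needed(distance_mm, delay_ps_per_mm, clock_period_ps, overhead_ps):
    """Registers to insert on a repeated (buffered) route so each segment
    fits in one clock cycle. All inputs are hypothetical design budgets.

    overhead_ps lumps together clock-to-q delay, setup time, and skew margin.
    """
    usable_ps = clock_period_ps - overhead_ps
    if usable_ps <= 0:
        raise ValueError("clock period leaves no time for the wire")
    wire_delay_ps = distance_mm * delay_ps_per_mm  # roughly linear once repeaters are added
    segments = math.ceil(wire_delay_ps / usable_ps)  # cycles the signal spends in flight
    return max(segments - 1, 0)  # N segments need N - 1 registers between them


print(pipeline_registers_needed(5, 100, 1000, 150))   # 0
print(pipeline_registers_needed(20, 100, 1000, 150))  # 2
```

With an assumed 100 ps/mm route delay, a 1-GHz clock (1,000-ps period), and 150 ps of register overhead, a 5-mm route fits in one cycle, while moving the blocks 20 mm apart calls for two pipeline registers, each adding a cycle of latency.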

Timing problems can impair the performance of the chip and raise the risk of everything from overheating to failures. However, resolving these problems can require compromises to the device’s power efficiency and area.

“We have encountered this before where you get the chip [from the fab] and one part of it refuses to work, or it will only work if one person points a spray can of coolant at the chip and someone else starts praying,” said Appleton. He added that in these cases, companies are forced to locate the problem, fix it, and then reorder the chip from the fab, which can cost tens of millions of dollars by itself on top of several months of delays.

Timing inside the chip can be influenced by voltage (IR) drop, temperature, and even slight variations in the construction of the transistors, which become more prevalent at advanced process nodes.

To identify and fix timing problems ahead of time, most semiconductor companies use EDA tools such as Cadence Tempus and Synopsys PrimeTime, which are specifically designed for static timing analysis (STA).
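The core of static timing analysis can be sketched in a few lines. The example below propagates worst-case (latest) arrival times from launching flip-flops through a small timing graph and compares them against the capture deadline (clock period minus setup time) to get setup slack; a negative slack is a violation. It also applies an optional late derate, a simple stand-in for the voltage, temperature, and process variation effects mentioned above. The graph, pin names, and numbers are hypothetical, and the sketch assumes an ideal clock with no skew; production tools such as Tempus and PrimeTime also model hold checks, clock networks, signal slews, multiple corners, and much more.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def setup_slack(edges, launch_pins, capture_pins, clock_period, clk_to_q, setup,
                late_derate=1.0):
    """Return setup slack (ps) at each capture pin reached from a launch pin.

    edges: (from_pin, to_pin, delay_ps) timing arcs of a hypothetical netlist.
    late_derate: multiplier on arc delays, a crude on-chip-variation margin.
    Assumes an ideal clock (no skew) and no combinational loops.
    """
    preds = defaultdict(list)
    sorter = TopologicalSorter()
    for src, dst, delay in edges:
        preds[dst].append((src, delay))
        sorter.add(dst, src)

    arrival = {}
    for pin in sorter.static_order():
        if pin in launch_pins:
            arrival[pin] = clk_to_q  # data leaves the flop clk-to-q after the clock edge
            continue
        candidates = [arrival[p] + d * late_derate for p, d in preds[pin] if p in arrival]
        if candidates:
            arrival[pin] = max(candidates)  # setup analysis keeps the latest arrival

    required = clock_period - setup  # data must settle before the next edge, minus setup
    return {pin: required - arrival[pin] for pin in capture_pins if pin in arrival}


edges = [
    ("ff1/Q", "u1", 300), ("u1", "u2", 250), ("u2", "ff3/D", 200),
    ("ff2/Q", "u2", 100), ("ff2/Q", "ff4/D", 500),
]
print(setup_slack(edges, {"ff1/Q", "ff2/Q"}, ["ff3/D", "ff4/D"], 1000, 50, 40))
# {'ff3/D': 160.0, 'ff4/D': 410.0}
```

Rerunning with `late_derate=1.1` pushes the arrival at ff3/D from 800 ps to about 875 ps, shrinking its slack from 160 ps to about 85 ps, which shows how margins for variation eat into timing headroom on long paths.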

As the semiconductor industry enters the 3D IC era, timing closure is becoming more complex, said Appleton. “In 3D chips, the timing problem is magnified. We’re working with customers that are taking four separate chips at the reticle limit and placing them all on silicon interposers. Then, they have to say to themselves, ‘Were we able to get the timing right—not only for every single one of these chiplets at the reticle limit, but also across all the chiplets in the package?’ So, the scope of the problem is even larger.”

HyperBlock: Capturing the Timing Complexity in Big AI Chips

As Ausdia pointed out, running through these pyramids of silicon to make sure they work as intended, with nothing wrong when it comes to timing, takes a large amount of computing power and, thus, time.

Many of the leading names in the semiconductor industry, and the systems companies trying to copy them, have huge data centers that they use to design, simulate, and verify their chip designs before supplying the blueprints to the fab. But even the latest EDA tools for timing closure have trouble loading the largest AI chips. Appleton noted that semiconductor engineers have worked out ways around the problem, including slicing the chip design into smaller parts and then verifying them. But they tend to keep their tricks close to the vest.

“Most semiconductor companies don’t want to discuss what they do because they consider it a trade secret, and we don’t want anyone to know how we do what we do because it’s a competitive advantage,” said Appleton. “It’s one of the dirty areas of the signoff process.”

Instead of adopting a divide-and-conquer approach, Ausdia’s Timevision technology translates the chip’s design into a compact representation that captures all of its complexity. By feeding it into other EDA tools, engineers can run through the entire chip at once to check for timing problems. “We have been one of the industry leaders for verifying very large chip designs, and we regularly work with over a billion standard cells,” Appleton stated. “But even we are running into capacity problems.”

Ausdia is trying to tackle the problem with its HyperBlock technology, which creates intelligent abstractions of even the largest, most advanced AI chips being devised by semiconductor firms and even many systems companies. The company said it reduces the amount of memory required to verify that these designs meet their timing constraints by up to 10X while increasing performance by up to 20X. Appleton pointed out, “We want to be able to load these giant designs, but we also want to do this in a way that is economical.”

Ausdia said HyperBlock can be used at different stages of the design process, even before arranging the chip’s functionality into logic gates (synthesis) and before placing and routing all of the components together. That gives its customers, according to the company, the ability to “shift-left” and start sorting through timing problems early. The HyperBlock itself can be loaded into the top level of the SoC—where the core building blocks of the IC are assembled and connected—with all of the complexity and the timing constraints saved in the HyperBlock.

As chip designers embrace bigger and bigger design sizes, “these companies want to avoid anything that they can in terms of risk because these projects are just so enormously expensive,” said Appleton.


About the Author

James Morra | Senior Staff Editor

James Morra is a senior staff editor for Electronic Design, where he covers the semiconductor industry and new technology trends. He also reports on the business behind electrical engineering, including the electronics supply chain. He joined Electronic Design in 2015 and is based in Chicago, Illinois.
