Realizing the Benefits of Faster Reliability Verification with Cloud Computing
By Matthew Hogan and Derong Yan
With each process node, integrated-circuit (IC) designs get more complex. Effects that were nominal at previous nodes suddenly become critical. Design rules become more complicated, requiring more checks that take longer to run. Deadlines are in peril, as you see business opportunities slipping away.
Sound familiar? While that scenario may ring true for all IC verification flows, reliability verification can be particularly at risk. Product performance and expected life are essential to market success, but as elements like electrostatic discharge (ESD) and latch-up protection become more complex (Fig. 1), achieving reliability goals has never been harder. Designers no longer have the luxury of hand-checking critical paths. Time is of the essence, and automated reliability verification is now a non-negotiable component of tapeout.
Fortunately, with the advent of foundry reliability rule decks, EDA companies have been able to provide automated reliability verification tools and checks to support design companies.1,2,3 However, the rapid and substantial growth in the number and complexity of reliability checks has introduced a new limitation—the availability of sufficient compute resources.
Many design companies must confront budget and resource issues on a regular basis. They simply don’t have the money and manpower to provide enough on-site compute resources to ensure reliability verification runtimes that can keep a design on schedule. Until recently, accepting longer runtimes was their only option. Not any longer.
Commercial cloud-computing services are now a scalable and sustainable option that lets companies satisfy “peak demand” periods when validating a full chip with foundry rule decks. To make informed cost/benefit decisions and get the most business value from cloud computing, however, companies must understand the requirements, limitations, and costs they will encounter.
The cost of cloud computing is typically a function of the number of servers used, the class of the machine, and the total usage time. The optimal type and number of cloud servers to use, and the available configurations, depend on the types of reliability verification flows you’re running, the EDA tool you’re using, the size of the design, your tapeout timeline, and how much money your company is willing or able to spend on cloud access.4
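As a rough illustration of that relationship, the sketch below estimates the compute cost of a single run as servers times hourly rate times hours. The function name, the hourly rate, and the example inputs are hypothetical and do not come from any provider's price list. Real bills may also include storage, data transfer, and license costs, which this sketch ignores.

```python
def cloud_run_cost(num_servers: int, hourly_rate: float, runtime_hours: float) -> float:
    """Estimate compute cost for one run: servers x rate per server-hour x hours.

    Assumes every server is the same machine class (one hourly rate) and that
    all servers are billed for the full wall-clock runtime.
    """
    if num_servers < 1:
        raise ValueError("num_servers must be at least 1")
    if hourly_rate < 0 or runtime_hours < 0:
        raise ValueError("hourly_rate and runtime_hours must be non-negative")
    return num_servers * hourly_rate * runtime_hours


# Hypothetical inputs, for illustration only.
estimate = cloud_run_cost(num_servers=4, hourly_rate=2.50, runtime_hours=10.0)
print(f"Estimated compute cost: {estimate:.2f}")
```

Keeping the machine class fixed, as here, makes the comparison between configurations a simple matter of server-hours.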
As a demonstration of how a company might evaluate the potential benefits of cloud computing, we ran a series of experiments on a full-chip system-on-chip (SoC) design, using the Siemens EDA Calibre PERC reliability verification flows with a major commercial cloud service provider. We ran the same Calibre PERC flow a total of three times in different cloud configurations:
- 1 cloud server with 16 physical cores using Calibre multi-threaded (MT) technology.
- 5 cloud servers, each with 16 physical cores, organized as 1 primary + 4 remotes in the Calibre flexible MT (MTflex) configuration.
- 51 cloud servers, each with 16 physical cores, organized as 1 primary + 50 remotes in the Calibre MTflex configuration.
For 1 server, 5 servers, and 51 servers, the Calibre PERC run completed in 106 hours, 31 hours, and 9.5 hours, respectively (Fig. 2). In addition, memory usage for each of the MTflex runs was 10% lower than for the single-machine MT run.
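To put those runtimes in perspective, the following sketch computes the speedup of each configuration relative to the single-server run, along with a scaling efficiency defined here as speedup divided by server count. That efficiency metric and the function name are our own illustrative choices, not part of the Calibre flow. Note that the baseline ran in MT mode while the other two ran in MTflex mode.

```python
# Reported wall-clock runtimes (hours), keyed by number of 16-core cloud servers.
runtime_hours = {1: 106.0, 5: 31.0, 51: 9.5}


def scaling_summary(runtimes, baseline_hours):
    """Return {servers: (speedup, efficiency)} relative to the baseline run."""
    summary = {}
    for servers, hours in sorted(runtimes.items()):
        speedup = baseline_hours / hours
        efficiency = speedup / servers  # fraction of ideal linear scaling
        summary[servers] = (speedup, efficiency)
    return summary


summary = scaling_summary(runtime_hours, runtime_hours[1])
for servers, (speedup, efficiency) in summary.items():
    print(f"{servers:>2} servers: {speedup:5.2f}x speedup, "
          f"{efficiency:6.1%} scaling efficiency")
```

With the reported figures, this works out to roughly 3.4X on 5 servers and 11.2X on 51 servers, so each added server contributes less as the cluster grows.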
Reviewing these results reveals that, for this particular design and set of checks, scaling from 5 to 51 servers costs roughly 3X more in cloud hardware (about 485 server-hours versus 155, if cost is taken as proportional to server-hours) and delivers about a 3X runtime improvement (31 hours down to 9.5 hours). For many fabless SoC design companies, the business value of converting a multi-day Calibre flow into an overnight run would be worth far more than the extra expense, particularly when multiple iterations are expected.
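One way to check this trade-off is to use server-hours as a stand-in for cost and compare each pair of configurations. The sketch below does that with the reported runtimes. It assumes the same hourly rate for every server and billing that scales linearly with wall-clock time, and the helper name is hypothetical.

```python
# (servers, runtime in hours) for the three reported configurations.
runs = [(1, 106.0), (5, 31.0), (51, 9.5)]


def compare(smaller, larger):
    """Return (cost_ratio, runtime_ratio) when moving from `smaller` to `larger`.

    Cost is approximated by server-hours (servers x hours), which assumes the
    same hourly rate for every server.
    """
    (s_servers, s_hours), (l_servers, l_hours) = smaller, larger
    cost_ratio = (l_servers * l_hours) / (s_servers * s_hours)
    runtime_ratio = s_hours / l_hours
    return cost_ratio, runtime_ratio


for smaller, larger in [(runs[0], runs[1]), (runs[1], runs[2]), (runs[0], runs[2])]:
    cost_ratio, runtime_ratio = compare(smaller, larger)
    print(f"{smaller[0]:>2} -> {larger[0]:>2} servers: "
          f"{cost_ratio:.2f}x server-hours, {runtime_ratio:.2f}x faster")
```

Under these assumptions, the step from 5 to 51 servers costs about 3.1X more in server-hours for a runtime about 3.3X shorter, while the step from 1 to 5 servers costs about 1.5X more for a runtime about 3.4X shorter.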
Obviously, the actual ratio between the cost and runtime improvements will vary from one company to the next, and even between designs and process nodes. Companies must develop their own set of data to enable them to make practical cloud-computing choices that best benefit their business goals.
Running reliability verification flows using commercial cloud services to satisfy peak demand usage can increase productivity and expedite turnaround times. The ultimate value of those improvements, though, depends on your business goals and markets. Coupling cloud performance data with business objectives allows companies to make pragmatic cloud-computing decisions.
For more information, download a copy of our technical paper, “Reliability verification in the cloud delivers significant runtime benefits.”
References
1. D. Yan, “Ensuring Robust ESD Protection in IC Designs,” Siemens Digital Industries Software, 2017.
2. EDA Tool Working Group (ESD Association), “ESD Association Technical Report,” ESD TR18.0-01-14.
3. D. Yan, “Checking ESD Path Resistance in IC Designs,” Siemens Digital Industries Software, 2020.
4. O. ElSewefy, “Calibre in the Cloud: Unlocking massive scaling and cost efficiencies,” Siemens Digital Industries Software, 2019.
Authors
Matthew Hogan is a product management director for Calibre Design Solutions at Siemens EDA, a part of Siemens Digital Industries Software. With more than two decades of design, field, and product development experience, Matthew works with companies that have an interest in reliability verification and the Calibre® PERC™ reliability platform. He is an active member of the International Integrated Reliability Workshop (IIRW), served previously on the Board of Directors for the ESD Association (ESDA), contributes to multiple working groups for the ESDA, and is a past general chair of the International Electrostatic Discharge Workshop (IEW). Matthew is also a Senior Member of IEEE, and a member of ACM. He holds a B. Eng. from the Royal Melbourne Institute of Technology, and an MBA from Marylhurst University. Matthew can be reached at [email protected].
Derong Yan is a principal product engineer in the Calibre Design Solutions division of Siemens EDA, a part of Siemens Digital Industries Software. His primary focus is the Calibre PERC reliability platform and reliability verification strategy. Areas of expertise include SoC physical design and verification, reliability verification, and design automation. Before joining Siemens, Derong worked for multiple semiconductor companies. He holds a Ph.D. in Materials Engineering from the University of Alberta and received both his M.Sc. and B.Sc. from Shanghai Jiao Tong University. Derong can be reached at [email protected].