Static Timing Analyzer Goes Multicore And Distributed
Synopsys continues to follow through on the multicore initiative it announced in March 2008. This time, multicore capability has come to the PrimeTime static timing analysis tool. Synopsys claims speedups of up to two times for PrimeTime 2009.12, which offers the flexibility of both multithreaded and distributed multicore processing.
The reasons why EDA vendors adopt multicore technology for their tools are well known by now. Synopsys, like all EDA vendors today, keeps a close eye on its customers’ prevalent server technology to ensure that its tools will work on their hardware architectures. The trend these days is a move from servers with two dual-core chips toward servers with Intel’s quad-core technology. In either case, the goal is to enable tool users to extract usable gains in runtime performance.
New to this release of PrimeTime are multithreaded capabilities, which spread the tool’s analysis runs across the multiple cores of a single shared-memory machine. According to Steve Smith, senior director of platform marketing at Synopsys, this capability results in a twofold speedup on a quad-core machine.
Multithreading is a nice boost for PrimeTime, but the real news here is the addition of support for distributed processing. With this capability, a design can be partitioned and farmed out to multiple compute resources. “Some tools are easier to architect for distributed processing than others,” says Smith. “But static timing analysis has always been pointed to as among the hardest and we’re proud that our R&D team was able to achieve it.”
Several trends are driving the requirement for distributed processing capability in EDA tools. For one, designs are becoming more complex, which in turn requires the investigation of more scenarios for signoff. Meanwhile, designers are being forced to reduce design margins even as the teams working on these much larger chips remain at the same size.
On the other hand, says Smith, these difficult economic times are causing customers to hold off on hardware investments, preferring to control costs by retaining their existing compute farms. “As of November, our Japanese customer base was not planning on upgrading their hardware,” says Smith. “That’s different from what we’d heard in previous years, when people were looking at a two- or three-year cycle to replace or refurbish machines with next-generation hardware.” So the increased efficiency that PrimeTime can offer is one way to compensate for having to manage the design of larger chips with no more help and no new hardware.
Synopsys implemented both the multicore and distributed approaches because of the diversity in users’ server-farm configurations. “One solution doesn’t fit all,” says Smith. “Some say they have all new four-core machines… So if a customer is working on a new chip, they have to assess whether they can run their big TA compute jobs on existing machines. A solution that lets them stay on their old compute farm longer is a big cost saving.”
According to Ken Rousseau, vice president of engineering for PrimeTime, both multicore and distributed processing roll off at some point. “For threading there is always a sweet spot for a given problem. Then, it rolls off to where throwing more cores at the problem doesn’t give you any more throughput. The same goes for distributed processing,” says Rousseau.
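The roll-off Rousseau describes can be illustrated with Amdahl's law. Neither Rousseau nor Synopsys invokes that model, so it is used here purely as an assumption for illustration. If a fraction p of a run parallelizes perfectly, the speedup on n cores is 1/((1 - p) + p/n). Treating the twofold quad-core figure quoted earlier as exact gives p = 2/3, which in this simplified model caps the speedup just under 3x regardless of core count. The function names below are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def parallel_fraction_from(speedup: float, cores: int) -> float:
    """Invert Amdahl's law: the parallel fraction implied by an observed speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


# Assumption: the 2x-on-4-cores figure is treated as an exact measurement.
p = parallel_fraction_from(speedup=2.0, cores=4)

# Each added core buys less than the one before it.
for n in (1, 2, 4, 8, 16, 100):
    print(f"{n:>3} cores -> {amdahl_speedup(p, n):.2f}x")
```

Real tools rarely follow the model this cleanly, since memory bandwidth and synchronization costs also grow with core count, but the flattening curve matches the "sweet spot" behavior described above.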
In revamping PrimeTime, Synopsys decided to implement distributed processing first even though it’s harder to accomplish. “Timing analysis is supposed to be exhaustive,” says Rousseau. The tool attempts to examine all of the signal paths and can be thrown off track by false paths. Further, the overall design is encapsulated within a single tightly coupled timing graph that is very difficult to partition.
“Signal integrity (SI) is the real complicating factor. On top of the logic itself is all of the interactive coupling that adds another layer of complexity. Trying to efficiently carve it up in the presence of SI was the real issue. We’re generally within a few percent of ideal distribution,” says Rousseau.
“One disadvantage of distributed processing is having to send things off to other machines. That incurs overhead,” says Rousseau. “Partitioning the design is not a trivial task, especially making sure that you don’t end up with one partition with most of the design and a few small ones, which eliminates the gain you get from distributed processing in the first place.”
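Rousseau's point about lopsided partitions can be seen in a toy cost model. This is an illustrative sketch, not a description of how PrimeTime partitions a design. It assumes that runtime is proportional to partition size, that all partitions run concurrently on separate machines, and that a single fixed overhead stands in for the cost of sending data to those machines. The sizes and the overhead value are made up.

```python
def distributed_speedup(partition_sizes, overhead=0.0):
    """Speedup over a single-machine run under a toy cost model.

    The distributed run finishes only when its largest partition does,
    so wall-clock time is max(partition_sizes) plus a fixed overhead.
    """
    total = sum(partition_sizes)
    return total / (max(partition_sizes) + overhead)


balanced = [25, 25, 25, 25]   # four equal partitions
lopsided = [85, 5, 5, 5]      # one partition holds most of the design

print(f"balanced: {distributed_speedup(balanced, overhead=5):.2f}x")
print(f"lopsided: {distributed_speedup(lopsided, overhead=5):.2f}x")
```

With the same four machines and the same overhead, the balanced split comes out well ahead, while the lopsided split is barely faster than running on one machine, because the largest partition sets the finish time.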
When using PrimeTime in multicore mode, all of the processing is done on one machine, which means you cannot throw the resources of a server farm at the problem (see the figure). Synopsys is working on implementing a multiplicative solution using both approaches, says Rousseau.
“We hope to tune it so that if people have lots of multicore machines, they can multithread on cores on these distributed machines,” Rousseau says. Today’s multicore machines have four or eight cores, but it won’t be long until there are 100-core machines, he adds. “This architecture will scale up nicely,” he says.
“What we’re really doing with PrimeTime is giving users a choice,” says Smith. “They can run timing-analysis jobs in the best configuration for their compute environment. If they have a big farm with small machines, distributed processing makes sense. If they have enough high-end standalone machines, a threaded approach is best. If they have a combination, or their next chip is right on the edge of using the next larger machine, they can take advantage of both, threading their partitions to get the speedup.”