Then it’s simply a matter of asking the formal engines to prove this property, and they will compute the failure trace backward from the illegal scenario. Of course, after finding the bug, the designer can modify the RTL and run the formal proof on the same property to confirm that the illegal scenario is eliminated. The ability to visualize this process is a feature in Jasper’s tools, which automatically generates and manipulates waveforms without a testbench to answer “what-if” questions and provide visual confirmation of functionality.
To illustrate the power of formal analysis in post-silicon debug, let’s examine a trio of examples from Jasper Design Automation customers who overcame big challenges using these techniques.
THE HAZARDS OF IP BUGS
The first example describes the debug of a memory controller (Fig. 2) violating a bus protocol. This large system-on-a-chip (SoC) with a processor and multiple peripherals was behaving badly in the field, hanging up under certain conditions, and the manufacturer recalled it.
The members of the post-silicon debug team used directed-random simulation. They started with the little information available from the lab: The chip hung and the problem was isolated as coming from the memory controller when it performs a read transaction.
The team worked more than three months until it was able to find the root cause of the bug. The cause was identified in the memory controller, which was sending its data to the wrapper block in a very specific pattern that activated a bug in the wrapper and caused a violation of the bus protocol (Fig. 3). This very specific timing alignment of different events in the system was very hard to hit with random simulation. As a result, it escaped as a silicon bug and was extremely difficult to detect in post-silicon simulations.
Obviously, three months of simulations to find the root cause of the bug was a major problem for the chip manufacturer, its customers, and the intellectual property (IP) supplier, especially since products containing the chip had already reached the end customers, resulting in a recall.
Formal analysis was called upon to help after the fact to see if it could root-cause the bug faster than simulation. The formal engineer got exactly the same information that the simulation team started with. In this case, the bug was found after just two and a half weeks, and much of that time was spent ramping up on the design and protocols involved. Actually, once the setup was finished, and properties written, the runtime to find the CEX was less than a minute compared to weeks of simulations trying to hit the bug randomly.
These are realistic and expected results since formal analysis works mathematically backward from the failing trace to generate its root immediately. With simulation, engineers build a checker to detect the bug when it happens, use direct random to do more reads, and hope to hit the bug eventually.
Afterward, formal analysis was rerun on the fixed RTL code. It uncovered two more new bugs that the post-silicon simulations missed, saving the chip manufacturer another respin.
POST-SILICON PARTNERSHIP
A second case describes how the formal team worked in tandem with engineers in the post-silicon lab. Once the bug was discovered in the lab, formal verification was used to help.
The chip included a series of four blocks in which the bug occurred. Each block processed the data on its inputs and sent it to the next block. The chip incorporated a built-in logic analyzer that can route out a group of signals to be observed on the chip output. However, for the lab team, selecting which signals to probe was always a challenge. It is difficult to know which signals will display the problem. Since the failure trace was detected on the output of the chip, or the outputs of block D, the lab team focused on block D. Thus, the logic analyzer was probing this block.
The formal team initially wrote an end-to-end property (Fig. 4) from the inputs of block B to the outputs of block D. In this case, block A was not relevant. The property was written based on the illegal trace captured in the post-silicon lab, and only inputs that caused the bug were allowed while others were constrained out.
Due to the size of the three blocks, the property did not converge and had to be broken into smaller properties. So, a property was written from the inputs of block D to its output. For this property, it was necessary to add input constraints (Fig. 5) to define legal behavior on the inputs of D. This property proved very quickly. This means that if the inputs of D behaved according to the constraints, block D would never produce the illegal behavior.
Block D was cleared and determined not to have the bug. Next, the post silicon-debug team moved its focus to block C, setting the logic analyzer to capture signals from that block. Meanwhile, the formal team worked on verifying block C. The assumes on block D’s inputs were converted to asserts on block C’s output. In addition, new constraints were written to allow only legal inputs on Block C.
Within minutes, these properties were proven. Now that the formal team had exhaustively exercised and cleared block C, the post-silicon-debug lab changed its focus to block B. The logic analyzer was set to capture signals from block B. In short order, the post-silicon-debug team was able to find the bug in the trace extracted from block B.
This case presents an excellent example of how the two teams can work hand in hand, each using its strengths and sharing information with the other team to quickly isolate a bug.
FORMAL FINDS MISSED BUGS
A final real-world scenario demonstrates the value of this approach for a design team that had not included formal analysis as part of its initial verification strategy. The original verification strategy was simulation and emulation only, with no formal analysis. But even after this exhaustive testing, a bug was found in the post-silicon lab. Inherent limitations with simulation and emulation caused the bug to be missed. It was a corner case and was cycle-dependent.
After executing an engineering change order (ECO), a couple of weeks of simulation and emulation confirmed that the fix was correct. To be certain, formal analysis was applied to double-check the result. Within just a few hours, it was determined the fix was broken. Even in the absence of a test bench, the formal tool found another scenario that would cause the bug. The fix did not cover all cases.
A second ECO, when reviewed with formal analysis, led to the discovery that the same bug could occur in a different part of the logic. Finally, a third ECO passed the formal test with no problem.
In this case, formal review saved the company from multiple respins because its simulation environment would not have uncovered the bugs after rerunning the fixes. The company now designs with formal analysis and has done away with block-level test benches.
SUMMARY
Formal verification tools deliver significant return on investment (ROI) in the post-silicon lab, potentially saving enormous amounts of time and money while protecting a company’s reputation. Addressing post-silicon debug solely with sophisticated, expensive hardware to inject and capture signals provides an incomplete solution. Formal analysis is delivering impressive results for users who cannot afford delay or to ship flawed parts even once in today’s competitive marketplace.
Ambiguity in the post-silicon debug process can have devastating effects. Verification ambiguity includes incomplete test bench setup, improper property specification, software bugs, and more. Each of these can be a potential bug source, so eliminating them early is critical to isolating the root cause of the bug. The benefit of deploying formal verification in post-silicon debug is to remove this ambiguity. Because of high-capacity formal’s exhaustive behavior analysis, conclusive answers can be generated as to whether or not a particular verification component is the source of the undesired behavior, enabling quick isolation of the root cause of the problem.
The real solution is to integrate formal analysis as part of the design flow to root out bugs early. But it can also be used in concert with traditional post-silicon methodologies to ensure production silicon is bug-free. Pairing formal verification with the capabilities of the post-silicon lab produces highly desirable outcomes, as it eliminates working in the dark. The exhaustiveness of formal analysis can rule out the existence of the bug in a given block and prevent multiple respins, as evidenced by the three case studies cited.