Mutation-Based Testing Technologies Close the “Quality Gap” in Functional Verification for Complex Chip Designs

Oct. 22, 2010
Mutation-based testing techniques are emerging as a viable approach to measuring effectiveness and driving improvement in all aspects of functional verification quality for simulation-based environments. SpringSoft's George Bakewell takes you through the basics.

Leading-edge chip designs are verified by sophisticated and diverse verification environments, the complexity of which rivals or exceeds that of the design itself. Despite advances in stimulus generation and coverage measurement techniques, existing tools do not tell the engineer how good the testbench is at propagating the effects of bugs to observable points or detecting incorrect operation that indicates the presence of bugs. As a result, decisions about where to focus verification efforts, how to methodically improve the environment, or whether it is robust enough to catch most potential bugs are often based on partial data or “gut-feel” assessments. Thus the application of mutation-based testing techniques is emerging as a viable approach to measuring effectiveness and driving improvement in all aspects of functional verification quality for simulation-based environments.

Existing Methods
Functional verification consumes a significant portion of the time and resources devoted to the typical design project. As chips continue to grow in size and complexity, designers must increasingly rely on a dedicated verification team to ensure that systems fully meet their specifications.

Verification engineers have at their disposal a set of dedicated tools and methodologies for verification automation and quality improvement. In spite of this, functional logic errors remain a significant cause of project delays and re-spins. A key reason is that two important aspects of verification environment quality—the ability to propagate an effect of a bug to an observable point and the ability to observe the faulty effect and thus detect the bug—cannot be analyzed or measured. Existing methods such as code coverage and functional coverage largely ignore these two aspects, allowing functional errors to escape the verification process despite excellent coverage scores.

At its core, code coverage is a simple measure of the ability of the stimulus to activate the logic in the design, where “activate” means execute every line, toggle every signal, traverse every path, or some similarly discrete activity. While this is a necessary condition—you can’t find a bug if you don’t “touch” the code related to the bug—it is certainly not sufficient to expose the presence of all or even most problems in a design. Code coverage says nothing about the ability of the verification environment to propagate an effect of a bug once activated or to detect its presence assuming propagation is achieved. Verification engineers thus accept that while code coverage provides interesting data, it is a poor measure of overall verification environment quality.
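
To make the distinction between activation and detection concrete, consider the following toy example, written in Python rather than RTL purely for illustration. The stimulus executes every line of the buggy function, giving perfect line coverage, yet the check never examines the corrupted result, so the bug escapes.

    # Illustrative only: a toy "design" and "test" in Python, not RTL.
    # Every line of the function is executed (full line coverage), but the
    # test never checks the affected output, so the bug goes undetected.

    def saturating_add(a, b, limit=255):
        total = a + b
        if total > limit:
            return limit + 1   # BUG: should clamp to `limit`
        return total

    def test_saturating_add():
        small = saturating_add(1, 2)        # exercises the normal path
        big = saturating_add(200, 100)      # activates the buggy branch
        assert small == 3
        assert isinstance(big, int)         # weak check: never compares `big` to 255

    test_saturating_add()
    print("All lines executed and the test passed, yet the bug was never detected.")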

Functional coverage is generally more interesting and necessary in its own right. In basic terms, it provides a way to determine if all important areas of functionality have been exercised, where “important” is defined in various ways, such as “all operational states,” “all functional sequences,” or the like. The rub is that, by definition, functional coverage is subjective and inherently incomplete. The functional areas (functional coverage “points”) to be checked are defined by engineers and typically based on the design specification, which thoroughly describes how a design should operate, but does not provide a comprehensive view of how it should not operate.

If the specification considered all possible bugs that could exist in the design and described how they might manifest themselves in terms of function, then it would be a simple matter of translating this list into a set of functional coverage points to be checked during verification. Alas, this is not the case. While functional coverage provides useful data—engineers need some means of determining if they are verifying the functionality laid out in the specification—like code coverage (Fig. 1), it is a poor measure of overall verification environment quality.

Mutation-Based Techniques
Mutation-based testing technology may be the key to addressing the shortcomings of existing tools and to closing the “quality gap” in functional verification. Originating in software research in the early 1970s, the technique was developed to guide software testing toward the most effective test sets possible. A “mutation” is an artificial modification to the program under test, induced by a fault operator, that changes the program’s behavior. The test set is then improved until it detects this behavior change. When the test set detects all of the induced mutations (or “kills the mutants,” in mutation-based nomenclature), it is said to be “mutation-adequate.” Several theoretical constructs and hypotheses have been defined to support the validity of mutation-based testing.
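
A minimal software-style sketch, shown below in Python, captures the terminology. The mutants and tests are invented for illustration; the point is simply that a test set is judged by whether it kills every induced mutant.

    # Minimal sketch of mutation testing in Python (illustrative, not a real tool).
    # A mutant is "killed" when at least one test fails against it.

    def original(a, b):
        return a + b

    # Hypothetical mutants produced by simple fault operators.
    mutants = {
        "plus_to_minus": lambda a, b: a - b,
        "off_by_one_constant": lambda a, b: a + b + 1,
    }

    tests = [
        lambda f: f(2, 3) == 5,
        lambda f: f(0, 0) == 0,
    ]

    def killed(mutant):
        # Any failing test means the behavior change was detected.
        return any(not test(mutant) for test in tests)

    results = {name: killed(m) for name, m in mutants.items()}
    print(results)   # both True, so this test set is mutation-adequate for these mutants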

In the field of digital logic verification, the basic principle of injecting faults into a design in order to check the quality of certain parts of the verification environment is known to verification engineers. Engineers occasionally resort to this technique when there is doubt about the testbench and no other way to obtain feedback. In this case of “hand-crafted” mutation-based testing, the checking is limited to a very specific area of the verification environment. Expanding this manual approach beyond a small piece of code would be impractical.

The automation of mutation-based techniques, however, offers an objective and comprehensive way to evaluate, measure, and improve the quality of functional verification environments for complex designs. Applied intelligently, a mutation-based approach provides detailed information on the activation, propagation, and detection capabilities of verification environments and identifies significant weaknesses and holes that have gone unnoticed by classical coverage techniques. The analysis of the faults that don't propagate or are not detected by the verification environment points to deficiencies in stimuli, observability, and the checkers that are used to detect unexpected operation and expose the presence of design bugs.

Functional Qualification
Using mutation-based technology to measure quality and find weaknesses in the verification of digital logic designs is known as “functional qualification.” This technology was pioneered in 2006 by Certess with the Certitude Functional Qualification System. The practical application (Fig. 2) of functional qualification technology involves a three-step process.

The fundamentals of the process (illustrated by the sketch that follows the list) are:

  • static analysis to determine where faults can be injected and to write out a new version of the RTL code that enables this injection;
  • simulation to determine the set of faults that are not activated by any test and, for those that are activated, correlate each fault with the tests that activate it;
  • and a detection process that injects faults one at a time, runs the relevant (correlated) tests to determine if a given fault is detected, and in the undetected case provides feedback to help direct the fix (add a missing checker, develop a missing test scenario, etc.).
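
The following Python sketch mirrors that three-step flow at a very high level. The fault names, activation data, and pass/fail behavior are hypothetical placeholders for what a real functional qualification tool would derive from the RTL and from instrumented simulations.

    # Step 1: static analysis yields a list of injectable faults (placeholders here).
    faults = ["f1_stuck_else_branch", "f2_inverted_condition", "f3_dropped_assign"]

    # Step 2: activation -- which tests exercise the code each fault sits in.
    # In practice this comes from instrumented simulation; here it is given data.
    activation = {
        "f1_stuck_else_branch": ["test_reset", "test_burst"],
        "f2_inverted_condition": ["test_burst"],
        "f3_dropped_assign": [],             # never activated by any test
    }

    def run_test_with_fault(test, fault):
        # Placeholder for "inject the fault, rerun the test, return True if it passes".
        # Assume only test_burst notices f2; everything else still passes.
        return not (test == "test_burst" and fault == "f2_inverted_condition")

    # Step 3: detection -- inject faults one at a time, run only the correlated tests.
    for fault in faults:
        correlated = activation[fault]
        if not correlated:
            print(f"{fault}: NOT ACTIVATED -> stimulus weakness")
            continue
        detected = any(not run_test_with_fault(t, fault) for t in correlated)
        verdict = "DETECTED" if detected else "NOT DETECTED -> missing checker or scenario"
        print(f"{fault}: {verdict}")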

Technology Advancements
The complexity of today’s designs and the number of tests required to verify them demand additional automation and intelligence in the functional qualification process; without them, the sheer number of possible faults and associated tests would render the approach impractical. In some cases, specialized algorithms are needed to analyze the design and refine the list of faults prior to qualification. For example, certain faults are likely to be undetectable, either because of redundant logic or because of “dead code” that has no path to an output. A rigorous but efficient formal analysis can identify and eliminate these faults during the static analysis phase, saving significant simulation time during the activation and detection steps. Process-oriented optimizations that gather and store critical information about each test during the activation phase (such as the simulation time required and the number of faults activated) can make the detection process considerably more efficient.
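
As a rough illustration of the second point, the snippet below orders a fault’s correlated tests so that the cheapest simulations run first during detection. The per-test metrics and the ordering heuristic are assumptions, not a description of any particular tool’s behavior.

    # Hypothetical per-test data gathered during the activation phase:
    # test name -> (simulation seconds, number of faults activated).
    test_metrics = {
        "test_reset": (12.0, 40),
        "test_burst": (340.0, 210),
        "test_corner": (95.0, 35),
    }

    def detection_order(correlated_tests):
        # Run the cheapest simulations first; break ties by how many faults a test activates.
        return sorted(
            correlated_tests,
            key=lambda t: (test_metrics[t][0], -test_metrics[t][1]),
        )

    print(detection_order(["test_burst", "test_reset", "test_corner"]))
    # ['test_reset', 'test_corner', 'test_burst'] -- try the 12-s test before the 340-s one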

To date, perhaps the most important technical breakthrough is in the area of fault classification and prioritization. Research and practical experience have shown that certain faults are much more likely than others to expose the most significant verification weaknesses. The identification of these “high-priority” faults, based on a static analysis of both the structure and functionality of the design, enables a more automated, priority-driven qualification approach. Initial qualification of this subset of faults gives quick feedback on the overall health of the verification environment. Problems can be fixed immediately by adding checkers or test scenarios, and those improvements are leveraged in the ongoing testing of the design. This priority-driven approach enables efficient early qualification, supports a powerful incremental use model, and provides both motivation and justification for applying partial functional qualification in cases where a more complete qualification is neither possible nor practical.
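
A sketch of that workflow follows. The priority scores are invented placeholders standing in for the static structural and functional analysis; the point is that only the high-priority subset is qualified in the first pass.

    # Hypothetical fault list with priority scores from static analysis.
    faults = [
        {"id": "f_ctrl_fsm_arc",   "priority": 9},   # control-path fault
        {"id": "f_status_bit",     "priority": 3},
        {"id": "f_debug_register", "priority": 1},
    ]

    HIGH_PRIORITY_THRESHOLD = 8   # assumed cut-off for the first qualification pass

    first_pass = [f["id"] for f in faults if f["priority"] >= HIGH_PRIORITY_THRESHOLD]
    later_passes = [f["id"] for f in faults if f["priority"] < HIGH_PRIORITY_THRESHOLD]

    print("Qualify now:", first_pass)       # ['f_ctrl_fsm_arc'] -- quick health check
    print("Qualify later:", later_passes)   # lower-priority faults wait for later passes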

Qualification Methodology
Methodology considerations are vital to making functional qualification technology relevant for today’s chip designs. The core aspects of a good methodology are early qualification and incremental usage that accumulates results and drives improvement over time.

Applying functional qualification early in the verification process can yield significant benefits. Clearly, qualification on “the first day of verification” is not advisable, since there are likely enough known problems to fix and holes to plug before the advice and insight of a separate tool is warranted. Waiting until “the last day of verification” is also a bad idea, as it is probably too late to leverage the qualification information without slipping the project schedule. Although the timing will vary by design and verification team, experience shows that qualification can begin as soon as there is a core set of tests that exercises the major functional aspects of the design.

The starting point for functional qualification might be indicated by the achievement of “reasonable” baseline code coverage scores. It may be based on completion of a certain percentage of the test plan, or it may simply be driven by the “gut feel” of the verification team. As soon as the appropriate threshold is reached, qualification using the subset of high-priority faults described in the previous section can begin. The fixes driven by these early qualifications—new checkers, better test scenarios, and so on—not only improve the verification environment, but also provide guidance on the deployment of resources. Subsequent qualifications are then used to confirm the fixes, and qualification of a broader set of faults can continue as the design and verification environment mature.

Incremental usage—running functional qualification early, fixing problems, and running again—allows for the accumulation of qualification data over time. Faults that are detected don’t need to be re-qualified later unless changes are made to the related RTL code. Research has shown that fixing “big problems” early also tends to fix many “smaller” or more subtle problems related to the same missing checkers or test scenarios, resulting in an overall more efficient qualification-verification process.
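
One way to picture the incremental bookkeeping is a small cache that stores each fault’s verdict together with a fingerprint of the RTL region it belongs to, so a detected fault is re-qualified only when that region changes. The file name, storage format, and hashing scheme below are assumptions made for this sketch.

    import hashlib
    import json
    import os

    CACHE_FILE = "qualification_cache.json"   # hypothetical results store

    def region_hash(rtl_text):
        # Fingerprint of the RTL region a fault belongs to.
        return hashlib.sha256(rtl_text.encode()).hexdigest()

    def load_cache():
        if not os.path.exists(CACHE_FILE):
            return {}
        with open(CACHE_FILE) as f:
            return json.load(f)

    def save_cache(cache):
        with open(CACHE_FILE, "w") as f:
            json.dump(cache, f)

    def needs_requalification(fault_id, rtl_text, cache):
        entry = cache.get(fault_id)
        if entry is None:
            return True                                     # never qualified
        if entry["verdict"] != "detected":
            return True                                     # undetected faults stay on the list
        return entry["rtl_hash"] != region_hash(rtl_text)   # re-qualify only if the RTL changed

    cache = load_cache()
    rtl = "always @(posedge clk) state <= next;"
    cache["f_ctrl_fsm_arc"] = {"verdict": "detected", "rtl_hash": region_hash(rtl)}
    save_cache(cache)
    print(needs_requalification("f_ctrl_fsm_arc", rtl, cache))   # False until the RTL changes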

Compatibility with Common Verification Practices
As discussed, functional qualification must be practical to be of any significant benefit. This requires that functional qualification systems be tightly integrated with existing commercial simulators and advanced debug systems, and fully compatible with current verification methodologies such as constrained-random stimulus generation and assertion-based verification.

When used with a constrained-random stimulus approach, qualification should begin with the base tests (or a subset of the base tests that provides good coverage) and a very small number of seeds. Faults that are activated and propagated but not detected by this subset of tests are likely to remain undetected even if more tests or seeds are added, simply because the missing checker will not “appear” with additional stimulus. Once the verification environment is fixed to detect these faults, qualification can continue with additional tests and seeds to improve the activation and propagation aspects.

An assertion-based verification methodology typically matches up well with functional qualification. Assertions provide the basic checking infrastructure that supports the verification environment’s “pass/fail” decision-making process, which is analyzed during the fault detection step of qualification. Non-detected faults drive the improvement of this infrastructure by identifying missing or incorrect assertions that must be added or fixed in order to detect these faults.

As with functional verification, debug and analysis of functional qualification results are critical. The ability to automatically generate waveform information related to a fault and compare it against the fault-free waveforms enables quick diagnosis and resolution of issues related to non-detected faults. To resolve more difficult problems, the ability to display, highlight, and traverse the source code associated with faults is also vital. Tight integration with existing advanced debug systems not only supports these requirements, but also enables engineers to debug in the same environment used to view and analyze simulation results.

Practical Applications
A growing number of verification teams are successfully deploying mutation-based techniques to qualify their verification environments, identify and correct holes and weaknesses that can let serious RTL bugs slip through the verification process, and ensure high-quality ICs and components. The range of applications includes assessing and improving verification of in-house IP blocks, evaluating the quality of third-party IP, and qualifying environments for top-level system-on-chip (SOC) verification.

Improving Verification of In-house IP
Many companies consider the development of high-quality IP a critical component of their overall design and verification strategy. This IP often embodies the unique capability and functionality provided by the company and will be re-used in many SOC designs. Applying mutation-based techniques to ensure complete verification of these IP blocks is an obvious and high-leverage application of functional qualification technology.

In terms of use model, application of this technology to internal IP verification environments is straightforward. The IP development team deploys functional qualification early in the verification process starting with qualification of high-priority faults that are likely to expose the biggest problems in the environment. Results are used to make the environment more robust and to guide further verification efforts. Additional faults are added and qualified as the verification environment matures, and the accumulated results provide a comprehensive confirmation of verification environment quality.

Subsequently, users may opt to run a statistical sample of the faults and tests to calculate a set of percentages that provide an approximate but relatively quick snapshot of how “good” the environment is at catching potential bugs. These numbers can be used to ensure that the quality of the environment doesn’t degrade as the design changes or as part of the sign-off criteria when delivering IP to SOC teams for integration.
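
The sketch below shows what such a sampled estimate might look like: qualify a random subset of the faults and report the detection rate with a rough confidence interval. The fault counts and verdicts are fabricated; a real run would take the verdicts from the detection step on the sampled faults.

    import math
    import random

    random.seed(0)

    all_faults = [f"fault_{i}" for i in range(5000)]
    sample = random.sample(all_faults, 200)        # qualify only 200 of the 5000 faults

    # Placeholder verdicts: pretend roughly 85% of the sampled faults are detected.
    verdicts = {f: (random.random() < 0.85) for f in sample}

    detected = sum(verdicts.values())
    n = len(sample)
    p = detected / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)     # normal-approximation 95% interval

    print(f"Estimated detection rate: {p:.1%} ± {margin:.1%} (n = {n})")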

Assessing the Quality of Third-party IP
The quality of IP design blocks acquired from external sources is always a concern. Problems in the IP description can be extremely difficult to debug during SOC integration and verification, costing untold engineering hours and, in the worst case, causing project delays or even design re-spins.

When IP is delivered as RTL source code and accompanied by a sign-off test suite, mutation-based techniques can be applied to assess the quality of the suite and gain insight into potential problems that might be encountered during integration. To facilitate efficient analysis and debug, this approach is best used when the team on the receiving end has some familiarity with the RTL code. Analysis of the IP block may start with the statistical sampling mode mentioned earlier. If the statistical assessment indicates potential problems (perhaps failing to meet the criteria established for internal IP), then additional verification may be warranted. In extreme cases, indications of sub-par verification may be the deciding factor in the selection of the IP vendor or spur the search for alternate IP providers.

This same approach can be applied to assess the quality of legacy blocks targeted for re-use in new designs. A quick assessment using the statistical sampling approach identifies blocks that require additional verification or highlights specific areas that may require extra attention during SOC integration and verification.

Qualifying the SOC Verification Environment
Size and complexity are usually the limiting factors in all aspects of SOC verification. Typically, methodology comes to the rescue, sometimes in the form of “correct by construction” methods or by running many simulations in parallel across a large server farm. The same is true for mutation-based techniques – methodology is key.

The proven approach is to qualify the SOC building blocks first, using the “internal IP” methodology described earlier. Incremental qualification of faults, starting with those of highest priority and moving to others as the environment matures, ensures high-quality IP as a foundation for building the SOC. During SOC verification, engineers can turn their attention to issues unique to this stage (problems attributed to block interfaces or the top-level I/O of the SOC) using a special class of “connectivity faults” designed to check the robustness of these areas. Verification weaknesses found at this stage can hide serious problems related to the communication protocol between blocks or the processing of critical I/O signals. Fortunately, the relatively small number of faults that need to be qualified in these cases allows for practical use of the technology with large designs.
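
The sketch below illustrates how restricting attention to connectivity faults keeps the fault count manageable at the SOC level. The connection list and fault kinds are hypothetical; a real tool would derive the connections from the top-level netlist.

    # Hypothetical top-level connection list: (driver, load) pairs between blocks
    # and at the SOC I/O boundary.
    connections = [
        ("cpu.axi_m", "interconnect.s0"),
        ("interconnect.m1", "ddr_ctrl.axi_s"),
        ("pad_ring.uart_rx", "uart.rx"),
    ]

    FAULT_KINDS = ["stuck_0", "stuck_1", "disconnected"]   # assumed connectivity fault types

    def connectivity_faults(conns):
        # One fault per connection per kind -- far fewer faults than a full
        # fault model of every block in the SOC.
        return [(drv, load, kind) for drv, load in conns for kind in FAULT_KINDS]

    faults = connectivity_faults(connections)
    print(f"{len(faults)} connectivity faults to qualify")   # 9 for this tiny example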

Closing the Verification Quality Gap

Verification of leading-edge chip designs is an extremely complex process, and the associated environments are often more complex than the designs themselves. The opportunities for problems that let RTL bugs slip through the process are myriad. Missing or broken checkers fail to detect incorrect design operation. Inadequate test scenarios exercise the entire design, but don’t propagate the effects of hidden bugs. Mistakes in the “wrapper scripts” that launch and manage the verification process allow large sets of tests to be marked as passing when they should fail.  

Current techniques such as code coverage and functional coverage provide interesting and sometimes useful data on verification status and progress, but do not address these critical issues. The incomplete and subjective nature of these approaches leaves too much room for error and oversight. By contrast, mutation-based techniques provide both a comprehensive and objective assessment of verification environment quality. Functional qualification builds on these techniques to measure the environment’s ability to activate logic related to potential bugs, propagate the effect of those bugs to an observable output, and detect the incorrect operation and thus the presence of the bugs. In doing so, it identifies serious holes in the verification environment and provides guidance on how to close those holes. Through automation and intelligent operation, combined with the right methodology, functional qualification is now both practical and accessible throughout the design and verification process.
