Design-for-test (DFT) is essential to ensure that complex designs can be thoroughly tested. Testing demands continue to increase as designs grow in gate count and fabrication process technologies evolve. Fortunately, advances in DFT techniques have avoided imposing major design requirements and restrictions for test. In fact, some approaches have reduced the impact of test on designs.

Structured DFT techniques are commonplace due to their high fault coverage and support by automated test-pattern generation (ATPG) tools. Scan technology and memory built-in self-test (BIST) are the foundation of most structured test techniques. A device’s sequential elements are evenly divided into scan chains that are loaded through device I/O in parallel. Multimillion-gate designs typically are forced to maximize the number of scan chains to minimize scan pattern depth and test time. This way, tester memory limitations are respected and test time requirements are met. Therefore, scan often requires many device I/O pins to load scan chains during test.
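The trade-off between scan-chain count and scan-pattern depth can be sketched numerically. This is an illustrative Python calculation, not a tool flow; the per-pattern cycle budget stands in for tester memory and test-time limits:

```python
import math

def scan_config(scan_cells, max_cycles_per_pattern):
    """Balanced scan-chain count needed so each pattern loads within a
    tester's per-pattern cycle budget. Illustrative sketch only."""
    chains = math.ceil(scan_cells / max_cycles_per_pattern)
    depth = math.ceil(scan_cells / chains)   # length of the longest chain
    return chains, depth

# e.g., 1,000,000 scan cells with a 5,000-cycle load budget per pattern
print(scan_config(1_000_000, 5_000))  # -> (200, 5000)
```

More chains mean shorter chains and fewer load cycles per pattern, but each chain consumes device I/O during test, which is the tension the rest of the article addresses.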

Large memories embedded within a device usually are tested with specific memory-test algorithms. The most popular memory-test approach is to use BIST circuitry to provide test stimulus and verify responses on-chip and at-speed.

However, several issues in the past few years have complicated test requirements. The population of timing-related defects has significantly increased with fabrication processes of 130 nm and smaller. As a result, at-speed scan test is now necessary to sufficiently detect these timing defects.1

At-speed scan transition tests require many more test patterns than traditional stuck-at tests. In addition, accurate clocking is needed for high-quality tests. This increases the demand on the test environment to support these additional tests. Furthermore, many companies are considering additional tests to further improve test quality, such as multiple detect patterns and deterministic bridge-targeted tests based on physically extracted layout parameters.

Test approaches can also impact the design flow. Many design teams construct the design in pieces to simplify the overall process. Designs often are partitioned into blocks that are independently designed, then assembled at the top level. Any additional test logic or routing for test complicates this process. Unfortunately, large devices that use this type of modular or hierarchical approach often have many scan chains, so numerous scan routes at the top level are common.

Another test issue in many designs is the extensive use of distributed small memories. If memory BIST is used for these memories, then there could be a measurable silicon-area impact from adding multiplexing, routing, and BIST controllers. The dilemma is determining how to apply the necessary memory algorithms without causing a large impact in silicon area.

Growing test demands create several design implications that must be mitigated:

  • High-speed tester clocks and fixturing for at-speed test
  • High-speed I/O to support at-speed tester clocks
  • Increased silicon area and routing for memory BIST of small memories
  • Increased tester capacity to accommodate the application of many patterns
  • Large number of I/O pins to support scan tests
  • Many top-level routes to support scan chains

Even so, various DFT approaches can provide the desired test capabilities with minimal design impact.

Using PLLs for accurate at-speed test
An external clock is used during scan testing to load the scan chains for each pattern. Supplied by a tester, this clock usually operates at a relatively slow frequency. To apply at-speed scan tests, high-frequency clock pulses must be applied after loading the scan chains. Applying high-speed pulses from a tester can be problematic, though.

It’s difficult for a tester to mimic a device’s internal phase-locked loop (PLL) waveform. In addition, many devices have higher speed internal clocks than external I/O. So even if a tester can supply accurate clocking, special I/O pins may be required so the clocks can get from a tester to the device gates.

A method for supplying accurate clocking without requiring high-speed I/O and tester clocks is to reuse the internal PLL clocks during test (Fig. 1). This technique has been shown to be very effective.2 The clock-switch design passes the PLL clock into the scan-bypass clock path. Consequently, there’s no impact to the PLL functional clock tree. The benefit is that functionally accurate clocks can be applied while reducing tester clock and device-I/O requirements.

ATPG tools can understand models of PLL clock switches. During the ATPG process, the switches that must be active are automatically configured and controlled by ATPG tools. Special tester clock capabilities, high-speed fixtures, and high-speed test I/O aren’t necessary, yet testing in the gigahertz range is still possible using this clock-switching approach.

Ultimately, a device’s clock-generation logic can be used in its native mode of operation by simply adding the clock-switching design (Fig. 1, again). The clock-switching logic can generate the necessary at-speed clock pulses required for at-speed transition and path-delay scan testing. As a result, designers can replace functional test content, which is difficult to generate and to grade for quality, with scan-ATPG test content. This significantly reduces test development time, and test quality is easily graded.
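The clock-switch behavior can be sketched as a simple event stream: slow tester-clock edges while the scan chains load, then a short burst of fast PLL pulses for launch and capture. This is a behavioral illustration only; the names and default pulse count are assumptions, not a specific vendor's clock controller:

```python
def scan_clock_stream(shift_cycles, at_speed_pulses=2):
    """Yield (source, edge_index) events in the order the core logic
    sees them. Behavioral sketch of a scan clock switch, not RTL."""
    for i in range(shift_cycles):
        yield ("tester", i)   # slow shift clock loads the scan chains
    for i in range(at_speed_pulses):
        yield ("pll", i)      # fast launch/capture pulses from the PLL

print(list(scan_clock_stream(3)))
# -> [('tester', 0), ('tester', 1), ('tester', 2), ('pll', 0), ('pll', 1)]
```

The key point the sketch captures is that the tester never has to produce the fast edges; it only supplies the slow shift clock, and the burst comes from on-chip clock generation.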

Non-intrusive memory test
It makes sense to test large embedded memories using BIST. This is the standard approach, and it’s widely accepted in the industry. Memory BIST controllers can support many algorithms and very high-frequency test (Fig. 2).

But a testing problem arises when numerous small, embedded memories are distributed throughout the design. If there are just a few such memories, then a BIST-based approach using existing controllers may be reasonable. Yet when hundreds of small, embedded memories are in the mix, as often is the case, area overhead and routing become issues. Furthermore, some of these small memories may be located in timing-critical areas.

Typically, BIST controllers are partitioned across the design according to frequency and physical placement of the memories being tested. This can result in the need for dozens of shared BIST controllers for testing hundreds of memories. The silicon impact of BIST controllers running a fixed set of algorithms in a design with such a high number of embedded memories can easily exceed 50,000 gates. For many designs, such an extensive use of BIST would cause an unreasonable increase in gate count, new timing paths, and routing overhead.

A test technique sometimes called macro testing enables a specific test sequence to be applied using existing device logic. Each pattern of the desired test sequence is converted into a scan pattern, and the values of the macro inputs are defined. ATPG tools can then use existing ATPG algorithms to determine how to load existing scan cells so the desired values are present at the macro inputs.

Similarly, the expected values at the macro outputs are propagated to scan cells for capture and verification. Existing scan chains can be used to test embedded memories simply by translating user-defined memory test algorithms into scan vectors (Fig. 3). This eliminates the silicon impact associated with adding BIST logic while still meeting the testability requirements. As an example, if a memory with 128 addresses is to be tested with a six-pass March algorithm, then 768 patterns (six passes times 128 addresses) must be defined. Macro testing converts those 768 patterns into 768 individual scan patterns. Furthermore, different types of memories can be tested in parallel. Hence, the 768 macro test patterns can be used to test many different memories with 128 or fewer addresses.
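The pattern-count arithmetic can be illustrated by expanding a March-style algorithm into one macro pattern per (pass, address) visit. The element definitions below are illustrative (they resemble March C-), not a prescribed algorithm:

```python
# Six March elements: (sweep direction, operations per address).
# "w0"/"r1" etc. mean write 0, read expecting 1, and so on. Illustrative.
ELEMENTS = [
    ("up",   ["w0"]),
    ("up",   ["r0", "w1"]),
    ("up",   ["r1", "w0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("down", ["r0"]),
]

def march_patterns(num_addresses):
    """Expand the algorithm into one macro-test pattern per address visit."""
    patterns = []
    for direction, ops in ELEMENTS:
        addrs = (range(num_addresses) if direction == "up"
                 else range(num_addresses - 1, -1, -1))
        for addr in addrs:
            patterns.append((addr, ops))  # becomes one scan pattern
    return patterns

print(len(march_patterns(128)))  # 6 passes x 128 addresses -> 768
```

Each tuple would then be handed to ATPG, which figures out how to justify the address and data values onto the memory inputs through the surrounding scan cells.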

Macro testing also can be applied to many macros in parallel. Although macro testing can be applied to any logic block with binary inputs and outputs, it’s usually applied to memories. The desired pattern for each memory type and the instance locations of each memory are defined. This is enough for ATPG tools to automatically produce the macro test patterns.

At-speed testing is also possible using the macro-testing technique. A sequence of several patterns can be defined as an at-speed sequence. However, a read operation of a memory can only be verified if it’s captured and scanned out. An at-speed read can’t be verified if it’s followed by an at-speed write or read. A typical at-speed macro test sequence is write-read-capture, which aligns well with most memory-test algorithms. As a result, specific at-speed test sequences like March tests can still be applied to small, embedded memories without adding more test logic. Ultimately, very high memory-test quality can be achieved using at-speed macro testing with the aforementioned PLL clock switching.
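The capture constraint above can be expressed as a small check: a read in an at-speed burst is verifiable only if a capture immediately follows it. This is an illustrative sketch; the operation names are assumptions:

```python
def verifiable_reads(burst):
    """Indices of read operations in an at-speed macro burst whose data
    can be verified, i.e. reads immediately followed by a capture."""
    return [i for i, op in enumerate(burst)
            if op == "read" and i + 1 < len(burst) and burst[i + 1] == "capture"]

print(verifiable_reads(["write", "read", "capture"]))          # -> [1]
print(verifiable_reads(["write", "read", "read", "capture"]))  # -> [2]
```

In the second burst, the first read's data is overwritten in flight by the next at-speed read and is never captured, which is exactly why write-read-capture is the typical sequence.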

Reduce top-level test signal routes from blocks
Often, design teams try to have as little top-level logic and routing as possible. If many scan chains exist, then many top-level routes are necessary. Yet one compression technique that was primarily developed to support the increase in test patterns can address the top-level routing issue. Test-compression techniques enable patterns to be applied with dramatically fewer tester cycles per pattern. The compression capability also can be used to reduce the number of scan channels controlled by the tester through the device I/O.

The development and use of scan test-compression techniques provides a way to apply at-speed and other additional patterns without increasing tester time. Technologies such as embedded deterministic test (EDT) use a transform function to convert values loaded by a tester to specific bits within scan cells.3

Normally, scan patterns load scan cells with the bits that are necessary to detect targeted faults. Only a small fraction of scan cells need specified values, though; all other scan cells can be loaded with random values. One basic approach with compression techniques exploits the fact that most scan cells can be filled with pseudorandom data. The pseudorandom filling provides some coverage of non-targeted faults.

A decompressor can operate as a transform function to convert tester-loaded bits to specific bits within the scan chains. The non-specified bits are randomly filled from the on-chip decompressor and not from the tester. Consequently, each pattern can be loaded with up to 100 times fewer test cycles/bits.
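A toy version of such a transform, far simpler than a real EDT ring generator, shows how a few tester bits per cycle can fan out to many internal chains. All sizes and tap positions here are illustrative assumptions:

```python
# Toy linear decompressor: two tester bits shift into a small register each
# cycle, and every internal chain input is the XOR of two fixed register
# taps. Register width, chain count, and taps are illustrative only.
REG_BITS, NUM_CHAINS = 8, 16
TAPS = [(i % REG_BITS, (3 * i + 1) % REG_BITS) for i in range(NUM_CHAINS)]

def decompress_cycle(reg, tester_bits):
    # Shift the two tester bits into the front of the register...
    reg = (tester_bits + reg)[:REG_BITS]
    # ...then fan out one derived bit to each internal scan chain.
    chain_bits = [reg[a] ^ reg[b] for a, b in TAPS]
    return reg, chain_bits

reg, bits = decompress_cycle([0] * REG_BITS, [1, 0])
print(len(bits))  # two tester bits drove 16 chain inputs this cycle -> 16
```

ATPG solves the linear equations of the real transform so that the handful of care bits land in the right scan cells, while everything else comes out pseudorandom for free.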

In an example of a circuit with test compression, a decompressor and compactor are added to the design (Fig. 4). They connect only to the design’s scan chains and have no impact on the functional logic. This example shows a device with two scan channels that are loaded by the tester. Internally, the design is configured with many more internal scan chains than normal.

A traditional scan design with two channels would result in one bit being loaded into each of the two chains with each clock pulse. The compression technique results in two bits being loaded from the tester, as is the case with standard scan. But the decompressor can supply 100 or more bits to many internal scan chains with the same clock pulse.

Since there are many internal scan chains, the lengths of the chains are much shorter than normal. Hence, the scan chains can be loaded with dramatically fewer tester cycles. The quality and coverage of the test-compression patterns are the same as with traditional scan.

Test compression can result in over 100 X fewer tester cycles. For designs that don’t require 100 X compression, some of the compression capability can be used to reduce the scan I/O pins necessary for test. This is possible even to the point of only using one scan channel.

For instance, adding compression to a design with six scan chains can both speed up test time and reduce scan-I/O requirements. If the design has 60,000 scan cells, using six scan channels would normally result in taking 10,000 cycles to load each pattern. For this example, the design with compression could be configured with 600 internal scan chains and six external scan channels, providing an opportunity for 100 X compression.

But if only 20 X compression is necessary to apply all of the patterns within the desired time, some of the compression can be used to reduce the I/O pins. In this case, the design can be configured with 200 internal scan chains driven by two scan channels, reducing the scan-I/O pins by 3 X. These two channels can load each pattern in 300 cycles (60,000 cells/200 internal chains), so the patterns are still applied more than 30 X faster than with traditional scan.
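The cycle arithmetic from this example can be checked directly. The figures below come from the text (60,000 scan cells); the helper function itself is illustrative:

```python
import math

def cycles_per_pattern(scan_cells, internal_chains):
    # Each shift cycle loads one bit into every internal chain in
    # parallel, so load depth equals the length of the longest chain.
    return math.ceil(scan_cells / internal_chains)

CELLS = 60_000
baseline = cycles_per_pattern(CELLS, 6)    # 6 plain scan chains
full     = cycles_per_pattern(CELLS, 600)  # 600 chains, 6 channels
tradeoff = cycles_per_pattern(CELLS, 200)  # 200 chains, 2 channels
print(baseline, full, tradeoff)  # -> 10000 100 300
```

Spending part of the compression budget on channel reduction (600 chains down to 200) still leaves a 10,000-to-300 cycle improvement per pattern over traditional scan.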

The ability to reduce scan channels can dramatically reduce the design impact due to test. Fewer scan channels means fewer pins for test in the design, simpler test fixturing, and simpler test requirements. Reduced pin-count testing is also possible if boundary-scan features, intended to test device I/O, are configured as scan cells (Fig. 5).4

This approach reduces tester pin access enough so multiple die or devices can be tested in parallel. Such a test approach, known as multisite testing, can dramatically improve tester throughput.

Test compression also supports modular and hierarchical implementations, so the decompressor and compactor can be inserted within individual blocks. The top level of the design doesn’t require any compression logic. Additionally, the scan-channel routes for each block can be cut to as few as one channel using embedded test compression.

References:

  1. B.R. Benware et al., "Effectiveness Comparisons of Outlier Screening Methods for Frequency Dependent Defects on Complex ASICs," VLSI Test Symposium, 2003.
  2. M. Beck et al., "Logic Design for On-Chip Test Clock Generation – Implementation Details and Impact on Delay Test Quality," DATE, 2005.
  3. J. Rajski et al., "Embedded Deterministic Test for Low Cost Manufacturing Test," Proc. ITC, pp. 301-310, 2002.
  4. J. Jahangiri et al., "Achieving High Test Quality with Reduced Pin Count Testing," ATS, 2005.