Understanding the Role Precision Plays in HALT

For successful HALT testing, repeatability is as important as accuracy.

Generally, one of the more important objectives when implementing an experiment is test repeatability. An experiment must be repeatable to validate the test results. In fact, a well-written test report contains all the information needed for anyone to repeat a given test and verify the results. When performing a highly accelerated life testing (HALT) experiment, the same is true, but test repeatability is defined differently.

HALT is a destructive test performed only on a small sample size, typically three to five devices. Test repeatability is defined as the capability to reproduce a failure mode, but the exact stress level required to reproduce the failure will vary. Reproducibility of a failure mode at a precise stress level is neither expected nor required.

Highly accelerated stress screening (HASS) is a nondestructive screen where screened production hardware is delivered to the end user. For that reason, when performing HASS, the distribution of stress levels that induce failures is important and characterized before production implementation.

Repeatability in HALT has a different meaning than repeatability in HASS. In both cases, we do not expect a failure mode to be reproduced at a precise stress level. Instead, we expect it to be reproducible over a stress range. As a result, tight control of the environmental chamber is not required and represents an unnecessary constraint.

What do we mean by test repeatability? The two most common terms used to describe test repeatability are accuracy and precision. The familiar bull's-eye chart is used to describe the difference between the two. Accuracy is the capability to be within a particular measure from an expected value (Figure 1). Here we have good accuracy if the requirement is ±3%, but the precision is not as good.

Figure 1. Accuracy Bull's-Eye

Precision is the capability to get repeatable results, regardless of whether those results are accurate (Figure 2).

Figure 2. Tightly-Grouped Results Indicating Higher Precision
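The distinction can also be put in numbers. The following is a minimal sketch, not a method from the article: the `summarize` helper and both sets of readings are hypothetical. It treats accuracy as the offset of the average from the expected value and precision as the spread of the readings about their own average.

```python
from statistics import mean, stdev

def summarize(readings, expected):
    """Return (bias, spread) for a set of repeated measurements.

    bias   -> accuracy: offset of the average from the expected value
    spread -> precision: sample standard deviation about the average
    """
    avg = mean(readings)
    return avg - expected, stdev(readings)

# Illustrative readings only (assumed values), expected value 100.
scattered_but_centered = [97.5, 102.0, 99.0, 101.5, 100.0]  # accurate, less precise
tight_but_offset = [105.9, 106.1, 106.0, 105.8, 106.2]      # precise, assumed offset

for name, data in [("scattered", scattered_but_centered),
                   ("tight", tight_but_offset)]:
    bias, spread = summarize(data, expected=100.0)
    print(f"{name}: bias={bias:+.2f}, spread={spread:.2f}")
```

In this made-up data the first set averages out on target with a wide spread, while the second is tightly grouped but sits about six units off target.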

This brings us back to test repeatability and the role it plays first in HALT. During design validation, HALT is performed to identify the weak points in the design. It is a form of Elephant test where the product is increasingly stressed beyond the product specification limits to identify its operational soft and hard failure limits.

The soft failures are points where the product fails to function properly when under stress but returns to operating normally when the stress is reduced or removed. Hard failures are observed when the product fails to function properly when under stress and does not return to operating normally when the stress is reduced or removed.
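The two definitions above reduce to a pair of yes/no observations at each stress step. The sketch below is illustrative only: `classify_failure`, `find_limits`, and the step data are hypothetical names and values, and the functional checks themselves are assumed to come from whatever test the device is run through.

```python
def classify_failure(works_under_stress: bool, works_after_relief: bool) -> str:
    """Classify one stress step using the soft/hard definitions."""
    if works_under_stress:
        return "no failure"
    # Failed under stress: did it recover once the stress was reduced or removed?
    return "soft failure" if works_after_relief else "hard failure"

def find_limits(steps):
    """Return (soft_limit, hard_limit) from step-stress observations.

    steps: iterable of (level, works_under_stress, works_after_relief),
           ordered by increasing stress level. Either limit may be None.
    """
    soft = hard = None
    for level, under, after in steps:
        kind = classify_failure(under, after)
        if kind == "soft failure" and soft is None:
            soft = level          # first level with a recoverable failure
        elif kind == "hard failure":
            hard = level          # device no longer recovers; stop stepping
            break
    return soft, hard

# Hypothetical temperature step-stress record for one device (deg C).
steps = [(50, True, True), (60, True, True), (70, False, True),
         (80, False, True), (90, False, False)]
print(find_limits(steps))  # (70, 90)
```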

The hard and soft failure points identify the initial precipitation and detection limits for HASS. Knowing the hard and soft failure limits allows you to optimize the environmental stress that can be imposed in HASS early in manufacturing to identify reliability escapes, manufacturing issues, and supplier component problems.

However, there is a problem with this theory. The soft and hard limits are not points but distributions. The distribution is the result of variability in components, manufacturing, design sensitivities, and stress.

The hard and soft failures identified in HALT must be reproducible if you expect to make design improvements. If a failure mode can be reproduced, then design improvements intended to remove the failure mode can be evaluated. This is one of the golden rules in experimental design. Like any important rule, it can be misinterpreted.

Some practitioners of HALT try to apply the golden rule of test repeatability to failures found from combinational stress such as temperature and vibration stress levels. They consider it important that a device failure be repeatable based on a particular stress level. They want to be able to expose subsequent devices to the same stress levels and get the same results.

Take, for example, a device that has a failure mode at 70°C. For simplicity, we'll consider the failure to be a design issue. Since it is a design issue, we expect other devices to fail at 70°C as well.

However, a second device will likely behave differently due to tolerance stack-up, component variability, variability in the assembly process, and environmental stress variability. It would be remarkable for several like devices to exhibit the same design failure at precisely 70°C. The result is shown in Figure 3 where five devices are tested for temperature step stress only.

Figure 3. Distribution of HALT Failures
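One way to read a result like this is to check that the failure mode repeats and then describe the stress level as a range rather than a point. The sketch below assumes five made-up results; the failure-mode label, the temperatures, and the `repeatability_summary` helper are all hypothetical and are not the data behind Figure 3.

```python
from statistics import mean, stdev

# Hypothetical results for five like devices: (failure mode, failure temperature in deg C)
results = [("U7 reset", 66), ("U7 reset", 70), ("U7 reset", 73),
           ("U7 reset", 69), ("U7 reset", 77)]

def repeatability_summary(results):
    """Summarize whether the mode repeats and over what stress range."""
    modes = {mode for mode, _ in results}
    temps = [temp for _, temp in results]
    return {
        "same_mode": len(modes) == 1,        # HALT repeatability: same failure mode
        "range": (min(temps), max(temps)),   # stress range over which it appears
        "mean": mean(temps),
        "stdev": stdev(temps),               # device-to-device spread
    }

summary = repeatability_summary(results)
print(summary["same_mode"], summary["range"], summary["mean"],
      round(summary["stdev"], 1))
```

With these assumed numbers the failure mode repeats in every device, yet the failure temperature spans 66°C to 77°C, which is the sense in which HALT results are repeatable.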

We expect subsequent devices to exhibit the same failure mode, assuming it is a design issue, but at different stress levels. The more components the device has in its design, the greater the expected variability. We also might find a couple of different failure modes that interact and cause even greater variability.

This example was for only one stress. What about vibration levels? The same arguments hold for vibration stress levels. In addition, consider that when using combined stress such as vibration and temperature the effect of one stress relative to the other is more complex.

For some failure mechanisms, the combined stress accelerates failures but the reverse also is possible. HALT typically is a combinational stress test. If we considered the combinational stress of temperature and voltage, the results would look like the familiar bull's-eye chart in Figure 4. This is what really happens in HALT.

Figure 4. Varying Stress Sensitivity Among Failure Modes

From another point of view, place two like devices in the HALT chamber at the same time and run the HALT temperature stress profile. If there is a design issue, they both eventually will fail.

The more slowly the temperature is increased, the greater the time between the first and second failures. So what does this mean when it comes to precision control of temperature of the HALT chamber? As we have shown, it is not necessary to buy an expensive HALT chamber capable of very tight temperature control. An environmental chamber that controls temperature to within a degree or two will work well for HALT.
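The ramp-rate observation is simple arithmetic. As a rough sketch, assume a linear temperature ramp, two devices that track the chamber temperature, and hypothetical failure temperatures of 68°C and 74°C; the `time_between_failures` helper is illustrative and not part of any chamber's software.

```python
def time_between_failures(limit_a_c, limit_b_c, ramp_c_per_min):
    """Minutes between two failures on a linear temperature ramp.

    limit_a_c, limit_b_c: failure temperatures of the two devices (deg C)
    ramp_c_per_min:       chamber ramp rate (deg C per minute), must be > 0
    """
    if ramp_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    return abs(limit_b_c - limit_a_c) / ramp_c_per_min

# Assumed failure temperatures and ramp rates, for illustration only.
for ramp in (60.0, 10.0, 2.0):
    gap = time_between_failures(68.0, 74.0, ramp)
    print(f"{ramp:>5.1f} C/min -> {gap:.1f} min between failures")
```

Under these assumptions the same 6°C difference between devices shows up as a gap of 0.1 min at 60°C/min, 0.6 min at 10°C/min, and 3.0 min at 2°C/min.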

Remember that the important result of a HALT test is what failed, not the stress level required to precipitate the failure, assuming the failure is not the result of a material changing phase states. The intent of HALT is to increase the stress to a device until it fails. The process identifies the weak links in the design. Root-cause analysis is performed to establish the failure mechanism and understand the physics of the failure.

Based on this knowledge, a design change can be made to remove the failure mechanism. A subsequent HALT test is performed to verify that the design change removed the failure mechanism and that no new failure modes were injected.

In selecting a HALT chamber, there are many other things to consider [1]. Most HALT chambers are relatively similar in performance parameters. So other factors like reliability, service support, flexibility, ease of use, turnkey capability, and supplier longevity are more important. Be sure to do your due diligence when selecting the right chamber for your HALT needs.

Reference
1. Levin, M. and Kalal, T., Product Reliability, 2003.

About the Authors
Ted Kalal is the director of product assurance and reliability engineering at Flextronics. The University of Wisconsin graduate has held many positions as a contract engineer and consultant. He has also authored several papers on electronic circuitry and holds a patent in the field of power electronics. Flextronics, 640 Shiloh Rd., Plano, TX 75074, 469-229-2404, e-mail: [email protected]

Mark Levin is Teradyne's reliability manager for product development at Semiconductor Test. He has a B.S. in electrical engineering from the University of Arizona and an M.S. in technology management from Pepperdine University and is a graduate student at the University of Maryland's reliability engineering program. Mr. Levin has held several management and research positions at Hughes Aircraft's Missiles Systems Group, Hughes Aircraft's Microwave Products Division, General Medical Company, and Medical Data Electronics. Teradyne, 30801 Agoura Rd., Agoura Hills, CA 91301, 818-874-7155, e-mail: [email protected]
