You may have heard the suggestion that you should adopt highly accelerated life testing (HALT) to validate the long-term reliability of your new electronic products. Logically, you wonder how much it will cost. That is an appropriate concern, but a companion question is equally valid: How much will failures cost if we don’t adopt a HALT program? (Photo courtesy of QualMark)
Unfortunately, many product manufacturers have found that the cost of not investing in HALT can be extremely high. There generally is a small window for getting new products to market, and it behooves you to use that window wisely to develop and validate gadgets as quickly as possible.
You need a way to identify the components or circuits that are most likely to cause field failures so you can correct the problems before the product goes into production. HALT is a proven technique for evaluating the reliability of your new product in just a few days to see if it will perform satisfactorily for several years.
You can use an outside laboratory for HALT or invest in equipment of your own. The latter approach has a higher up-front cost but many benefits if you develop several products every year.
Other Techniques Don’t Hack It
You may have environmental testing in place now. Sure, it identifies defects that would show up quickly in shipping or use, but it does not accelerate aging or find the weak links through fatigue as rapidly as HALT. Only HALT identifies time-related defects or design problems that otherwise might lie hidden for months or even years.
Almost all defect mechanisms that will fail eventually in the field with normal stresses will fail much faster in the laboratory at higher stresses. You can identify those problem areas quickly and improve the design before thousands of products are in the hands of customers.
The Basics of Reliability
Strength varies among products in a bell-shaped distribution. Some products are extra strong, and some are weaker than the average. Similarly, the stresses applied to a group of products in their lifetime will vary in a bell-shaped distribution. A few loads will be quite severe, while others are extra gentle. Figure 2 illustrates typical real-world strength and stress distributions.
Figure 2. More Realistic Strength and Stress Conditions
As long as the strength of every product, even the weakest ones, is greater than the applied stresses, even the greatest, there will be no problem. However, if something occurs during the product lifetime to bring the two distributions closer together, as shown in Figure 3, we will begin to have failures.
Figure 3. Stress and Strength Overlap to Create Product Unreliability
What could cause a shift in product strength during its lifetime? It might be a seemingly innocuous design change, a change of assembly personnel, or a change in a component. Whatever the reason, strength distributions do shift as do stresses, and failures are inevitable.
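The overlap between the stress and strength distributions can be put into numbers. The sketch below is a minimal illustration, not a method taken from this article: it assumes both distributions are normal and independent, so the margin (strength minus stress) is also normal and the failure probability is the chance that the margin is negative. The function names and all numeric values are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def failure_probability(strength_mean: float, strength_sd: float,
                        stress_mean: float, stress_sd: float) -> float:
    """Probability that a random stress exceeds a random strength.

    Assumes independent, normally distributed strength and stress
    (an assumption made for this sketch).
    """
    margin_mean = strength_mean - stress_mean
    margin_sd = math.hypot(strength_sd, stress_sd)  # sqrt(sd1^2 + sd2^2)
    return normal_cdf(-margin_mean / margin_sd)


# Hypothetical values in arbitrary stress units.
p_new = failure_probability(100.0, 10.0, 50.0, 10.0)   # well-separated curves
p_aged = failure_probability(80.0, 10.0, 50.0, 10.0)   # strength shifted left
print(f"new product:  {p_new:.2e}")
print(f"aged product: {p_aged:.2e}")
```

Under these assumptions, moving the strength mean toward the stress mean raises the failure probability sharply, which is the effect the overlapping curves illustrate.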
The HALT Process
With HALT, environmental conditions are applied to shift the stress curve, overlapping the strength curve to simulate product aging. The failures that are destined to occur in months or years as the strength curve shifts to the left can be detected in days or weeks by shifting the stress curve to the right as shown in Figure 4.
Figure 4. Stress Curve Overlaps Strength Curve Due to HALT
Generally, HALT uses thermal cycling and random vibration to move the stress curve. With the product operating normally, stress factors are applied one at a time to determine the physics of all failure mechanisms. A typical sequence is low temperature, then high temperature, and then multiaxis vibration. Combinations of these stresses are applied in increasing levels of severity, not to uncover different problems but rather to more quickly find process defects.
The cold environment is applied in 10°C steps, pausing long enough at each set point for the product to stabilize. The sequence continues until the product stops operating or until you reach -30°C. If you can’t reach -30°C, you need to find the reason. It could be a single component, in which case the component should be replaced so stress-margin limit exploration and improvement can continue.
If several components are at the failure point, you have likely discovered the product’s operational limit, and there is no need to reduce the temperature any further. Incidentally, if the product stops working at the low temperature and doesn’t start again when the temperature is raised, this is a hard failure and must be corrected.
The terms soft failure and hard failure succinctly describe failures that heal or do not heal themselves when the test environment moderates. A soft failure is an operating limit, and a hard failure is a destruct limit. The two types must be handled differently.
Next, the temperature is raised from ambient in 10°C steps, stopping at each incremental change to allow the equipment to stabilize. Generally, this sequence continues to about 100°C, the exact point being chosen as a margin beyond the design specification.
Again, if the product stops operating, the reason must be determined and the product repaired. The temperature increases continue until several components fail at or near the same temperature, indicating the fundamental limit of the design.
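The cold and hot step-stress sequences described above follow the same loop: move to the next set point, let the product stabilize, check whether it still operates, and stop at the first failure. The sketch below is only an illustration of that loop. The `operates` callback, the 20°C starting point, and the product limits of -15°C and 85°C are hypothetical stand-ins for a real monitored unit in a chamber.

```python
from typing import Callable, Optional


def step_stress(start_c: int, stop_c: int, step_c: int,
                operates: Callable[[int], bool]) -> Optional[int]:
    """Step the temperature from start_c toward stop_c.

    Returns the first set point at which the product stops operating,
    or None if it still operates at the final set point.
    """
    if step_c == 0:
        raise ValueError("step_c must be nonzero")
    direction = 1 if stop_c >= start_c else -1
    temp = start_c
    while (stop_c - temp) * direction >= 0:
        # A real test would dwell here until the product stabilizes.
        if not operates(temp):
            return temp
        temp += direction * abs(step_c)
    return None


def classify_failure(recovers_at_ambient: bool) -> str:
    """Soft failures heal when the stress is removed; hard failures do not."""
    if recovers_at_ambient:
        return "soft failure (operating limit)"
    return "hard failure (destruct limit)"


# Hypothetical product that operates only between -15 and 85 degrees C.
def operates(temp_c: int) -> bool:
    return -15 <= temp_c <= 85


cold_limit = step_stress(20, -30, 10, operates)   # cold steps down to -30
hot_limit = step_stress(20, 100, 10, operates)    # hot steps up to 100
print("stopped operating at (cold):", cold_limit)
print("stopped operating at (hot):", hot_limit)
print(classify_failure(recovers_at_ambient=True))
```

In practice each stop would be followed by root-cause analysis and a repair before stepping continues, so the loop would be rerun after every fix rather than executed once.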
The random-vibration sequence is applied in a similar manner, as are the combined tests. All of this is unique to each product and must be developed with a thorough understanding of the product and the HALT techniques.
Even though ramping temperature and random vibration are the most commonly used tools for HALT, other techniques sometimes are added to the repertoire. One such technique is running the product through 1,000 on-off cycles.
There are no test limits in HALT. As each defect is discovered, the test stops until a suitable fix has been determined. The process continues after each design change until failures occur at such a rate that the end-of-life point or material stress limits for most of the assembly have been reached.
HALT is destructive, so products cannot be shipped when testing is complete. However, they can be used for other in-house design validation tests.
Case Histories
Company A had a product that met all its design specifications. However, HALT uncovered a problem at 60°C. The culprit was a small, 15-V auxiliary DC power supply. A regulating diode in the device was near but not actually touching a heat sink. The diode was repositioned to contact the heat sink, which raised the operating limit of the product to 90°C. Without HALT, this problem would have resulted in field failures.
At company B, the annual field failure rate for one of its products was an embarrassing and expensive 5%. Each of the products had been subjected to a 24-hour unmonitored burn-in, which did not precipitate the latent defects.
HALT showed that the operating limit of the product was only 35°C. Two components with limited current-carrying capability were identified and replaced with higher-rated parts, and the operating limit increased to 90°C.
As insurance against process defects, a 1-hour powered and monitored highly accelerated stress screening (HASS) was initiated on all products before shipment. This combined 10g, 200-Hz to 2-kHz random multiaxis vibration with four rapid thermal transitions from -30°C to 70°C at 60°C/min. With the program in place, the field-failure rate dropped to 0.5%.
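The figures quoted for company B's screen lend themselves to a quick arithmetic check. The sketch below works out how long each thermal transition takes at the stated ramp rate and how much the field-failure rate improved. The variable names are illustrative, and the fleet size of 10,000 units is a hypothetical number chosen only to make the percentages concrete.

```python
def ramp_minutes(low_c: float, high_c: float, rate_c_per_min: float) -> float:
    """Time to ramp between two temperatures at a constant rate."""
    return abs(high_c - low_c) / rate_c_per_min


# Values from the HASS profile described above.
per_transition = ramp_minutes(-30.0, 70.0, 60.0)  # minutes per transition
total_ramp = 4 * per_transition                    # four transitions

# Field-failure rates before and after HALT fixes plus HASS.
rate_before = 0.05
rate_after = 0.005
improvement = rate_before / rate_after

fleet_size = 10_000  # hypothetical annual shipments
failures_avoided = round(fleet_size * (rate_before - rate_after))

print(f"each transition: {per_transition:.2f} min")
print(f"total ramp time: {total_ramp:.2f} min of the 1-hour screen")
print(f"failure rate improved by a factor of {improvement:.0f}")
print(f"failures avoided per {fleet_size} units: {failures_avoided}")
```

The ramps account for only a few minutes of the one-hour screen, so most of the time is available for dwell at the temperature extremes and for vibration.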
What It All Means
The decision to adopt HALT is not always easy. After all, company financial resources are involved. However, the costs of not using this reliability-enhancement tool may outweigh its expense. The companies willing to spend money wisely to increase overall profits will be the most successful in the 21st century.
About the Author
Wayne Tustin is the founder and president of the Equipment Reliability Institute (ERI). During a distinguished career that spans many decades, Mr. Tustin has consulted, supplied technical training, and written numerous articles and books on vibration and shock testing. He received a B.S.E.E. from the University of Washington. Equipment Reliability Institute, 1520 Santa Rosa Ave., Santa Barbara, CA 93109, 805-564-1260, e-mail: [email protected].
Kirk Gray is the president of AcceleRel Engineering. Through his affiliation with ERI, he consults and presents seminars on HALT and HASS. Mr. Gray has a B.S.E.E. from the University of Texas at Austin and is a charter member and vice chairman of the IEEE/CPMT Committee on Reliability and Accelerated Stress Testing. AcceleRel Engineering, 1903 Garfield Ave., Louisville, CO 80027, 303-666-7692, e-mail: [email protected].
Published by EE-Evaluation Engineering
All contents © 2000 Nelson Publishing Inc.
No reprint, distribution, or reuse in any medium is permitted
without the express written consent of the publisher.
September 2000