MTBF Revisited: A Tutorial

Almost everyone in the electronics industry is familiar with the term mean time between failures (MTBF). Quite often, the term is misinterpreted and misapplied. This is especially true when products are delivered, failure reports begin to arrive, and MTBF predictions are not validated by tabulations of real-world trouble reports.

Were the MTBF calculations wrong? What happened? To understand the prediction of reliability, we need to examine these frequently asked questions.

Defining Reliability

Before we consider reliability predictions, let’s look at the meaning of reliability. The generally accepted definition of reliability is the probability that a device will provide adequate operation for a given time in its intended application. This involves two judgment questions:

What is adequate operation?

What is the intended application?

Keep in mind the answers to these questions as we look at the part of that definition that can be measured: What is the best estimate of the probable MTBF?

Back to the judgment items. If your car radio has suitable AM reception but fails to receive FM stations, is the entire car unreliable? Or is this an inappropriate criterion for making such a sweeping judgment? And if you drive through two feet of water and the car stops, is the car unreliable, or is such treatment outside the bounds of its intended application? The reliability specialist must settle these questions before proceeding to MTBF calculations and document the answers as part of the reliability prediction.

Predicting MTBF Through Calculations

Two widely accepted standards can be used to calculate MTBF. Most government programs prefer calculations per the latest version of MIL-HDBK-217, while many commercial programs use the Bellcore method.¹,² The current government version is based on work begun many years ago by the Reliability Analysis Center and Rome Laboratory at Griffiss Air Force Base. The Bellcore version is a derivative of that handbook, modified and simplified in 1985 by Bell Communications Research, now Telcordia Technologies.

Each document contains failure-rate models for parts used in typical electronic products, such as ICs, diodes, transistors, capacitors, relays, switches, and connectors. The rates are based on the best available data from actual applications. Several differences exist between the two methods; one of the most obvious is that MIL-HDBK-217 expresses failure rates in failures per 10⁶ hours, while Bellcore uses failures per 10⁹ hours.
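Converting between the two conventions is a factor of 1,000: divide a per-10⁹-hour rate by 1,000 to get a per-10⁶-hour rate, and multiply to go the other way. Rates per 10⁹ hours are commonly called FITs (failures in time). The minimal sketch below shows the conversion; the function names and the 500-per-10⁹-hour part are hypothetical examples, not taken from either handbook.

```python
# Sketch: convert between the two failure-rate conventions.
# Bellcore (Telcordia) rates: failures per 10^9 hours (FITs).
# MIL-HDBK-217 rates: failures per 10^6 hours.

def per_1e9_to_per_1e6(rate_per_1e9: float) -> float:
    """Convert failures per 10^9 hours to failures per 10^6 hours."""
    return rate_per_1e9 / 1_000.0


def per_1e6_to_per_1e9(rate_per_1e6: float) -> float:
    """Convert failures per 10^6 hours to failures per 10^9 hours."""
    return rate_per_1e6 * 1_000.0


# Hypothetical part rated at 500 failures per 10^9 hours.
print(per_1e9_to_per_1e6(500.0))  # 0.5
```

Normalizing every rate to one unit before adding them avoids mixing the two conventions in a single calculation.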

As an example of an MTBF calculation, consider a hypothetical product with four parts. The estimated failure rates per 10⁶ hours for those parts, operating at a given temperature, are available from the manufacturers. Adding the estimated rates, we get the estimated failure rate for the total product. To determine the MTBF, we divide 10⁶ by the product failure rate, giving us the estimated mean number of hours between failures (Table 1).
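The arithmetic behind Table 1 fits in a few lines of Python. Like the text, this sketch assumes that the product failure rate is simply the sum of the part rates, with each rate weighted by the part quantity (every part failure counts as a product failure). The rates are the Table 1 values; the names `TABLE_1`, `product_failure_rate`, and `mtbf_hours` are illustrative only.

```python
# Sketch of the Table 1 calculation: sum part rates, then MTBF = 10^6 / total.
# Each entry is (quantity, failures per 10^6 hours), copied from Table 1.
TABLE_1 = {
    "part 1": (1, 0.50),
    "part 2": (1, 0.30),
    "part 3": (1, 0.15),
    "part 4": (1, 0.05),
}


def product_failure_rate(parts: dict) -> float:
    """Total failure rate per 10^6 hours: quantity times rate, summed over parts."""
    return sum(qty * rate for qty, rate in parts.values())


def mtbf_hours(total_rate_per_1e6: float) -> float:
    """Estimated mean hours between failures."""
    return 1e6 / total_rate_per_1e6


total = product_failure_rate(TABLE_1)
print(f"Total failure rate: {total:.2f} per 10^6 hours")  # 1.00
print(f"Estimated MTBF: {mtbf_hours(total):,.0f} hours")  # 1,000,000
```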

Though most MTBF predictions are based on a single product, a more realistic way of expressing the result is based on 100 or 1,000 products. If each product has a failure rate of 1.00 failure per 10⁶ hours, then 100 products together have a failure rate of 100 failures per 10⁶ hours. The MTBF of the 100-unit population is then projected as 10,000 hours; that is, a failure is expected somewhere in the group about every 10,000 hours of operation.
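The population view above can be sketched as follows, assuming (as the text does) that every unit has the same constant failure rate, so the fleet rate is the per-unit rate times the number of units. The function name is hypothetical.

```python
def fleet_mtbf_hours(unit_rate_per_1e6: float, units_in_service: int) -> float:
    """Expected hours between failures anywhere in a fleet of identical units.

    Assumes every unit has the same constant failure rate, so the fleet's
    failure rate is the per-unit rate multiplied by the number of units.
    """
    fleet_rate_per_1e6 = unit_rate_per_1e6 * units_in_service
    return 1e6 / fleet_rate_per_1e6


print(fleet_mtbf_hours(1.00, 1))     # 1000000.0
print(fleet_mtbf_hours(1.00, 100))   # 10000.0
print(fleet_mtbf_hours(1.00, 1000))  # 1000.0
```

The per-unit MTBF does not change; only the expected interval between trouble reports from the whole installed base shrinks as more units are fielded.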

At this point, some assumptions must be made and documented along with the calculations:

The component parts are of uniform reliability, even though we know there are differences.

The parts count is correct, although the design may not be complete.

The estimated failure rates for the four component parts are valid rates, even though we know they are just estimates.

The operating temperature on which we established the component failure rates is correct for our application.

Finally, we must define those two judgment issues we mentioned earlier: adequate operation and intended application. Documenting all of this makes the prediction more meaningful and helps fine-tune it later if our assumptions prove to be flawed.

There are two benefits of predicting the MTBF of a product. First, it may be a customer requirement, in which case the other benefit doesn't count. Second, it can be done long before the design is committed to production, giving a heads-up evaluation of the product. It even highlights the weak points, so they can be improved at minimum expense.
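One simple way to find those weak points is to rank each part by its share of the total predicted failure rate. In Table 1, part 1 alone accounts for half of the predicted failures, so it is the first candidate for improvement. The sketch below uses the Table 1 rates; the helper name `rank_contributors` is hypothetical.

```python
def rank_contributors(rates_per_1e6: dict) -> list:
    """Return (part, share of total failure rate) pairs, largest share first."""
    total = sum(rates_per_1e6.values())
    shares = [(part, rate / total) for part, rate in rates_per_1e6.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Failure rates per 10^6 hours from Table 1.
table_1 = {"part 1": 0.50, "part 2": 0.30, "part 3": 0.15, "part 4": 0.05}
for part, share in rank_contributors(table_1):
    print(f"{part}: {share:.0%} of the predicted failure rate")
```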

Fortunately for the reliability specialist, software is available to simplify reliability calculations.³ Such software lets you select stress levels, such as operating voltage and temperature, to simulate the real-world conditions the product will encounter.
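To illustrate why the temperature setting matters, here is a sketch using the generic Arrhenius acceleration model, a common way to scale failure rates with temperature. MIL-HDBK-217 and Bellcore each define their own temperature factors, so this is a stand-in, not either handbook's exact model. The 0.7-eV activation energy, the 40 °C reference, and the 0.50 base rate are assumed values for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_factor(t_ref_c: float, t_use_c: float, activation_energy_ev: float) -> float:
    """Ratio of the failure rate at t_use_c to the rate at t_ref_c (generic Arrhenius model)."""
    t_ref_k = t_ref_c + 273.15
    t_use_k = t_use_c + 273.15
    return math.exp((activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_use_k))


# Assumed: part rated 0.50 failures per 10^6 hours at 40 deg C, activation energy 0.7 eV.
base_rate = 0.50
for temp_c in (40.0, 55.0, 70.0):
    scaled = base_rate * arrhenius_factor(40.0, temp_c, 0.7)
    print(f"{temp_c:.0f} deg C: {scaled:.2f} failures per 10^6 hours")
```

The exponential form is why an operating-temperature assumption that is off by even 15 or 20 degrees can move a prediction substantially.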

Even so, don’t overrate the MTBF number. Sure, it is a very precise calculation, but it is based on estimates. They are the best estimates we can get, but there still is that lingering uncertainty.
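A common misreading is to treat MTBF as a guaranteed lifetime. Under the constant-failure-rate assumption that underlies a summed-rate prediction like Table 1, the probability that a unit survives t hours without failure is R(t) = exp(-t/MTBF), which ties the MTBF back to the probability-based definition of reliability. At t equal to one MTBF, only about 37 percent of units are expected to survive. The sketch below evaluates this for the Table 1 estimate; the function name is illustrative.

```python
import math


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of no failure over mission_hours, assuming a constant failure rate."""
    return math.exp(-mission_hours / mtbf_hours)


mtbf = 1_000_000.0  # Table 1 estimate, in hours
# About 1 year, 10 years, and one full MTBF of continuous operation.
for hours in (8_760.0, 87_600.0, 1_000_000.0):
    print(f"{hours:>11,.0f} h: R = {reliability(hours, mtbf):.3f}")
```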

Assessing Failure Rates Through Failure Reports

After products have been delivered and have been in service for a few months, reality sets in. The failure reports may show a significantly lower or higher failure rate than your calculations predicted. If so, what happened? Does this mean that calculating MTBF is a flawed process? No. And if the numbers match within a few points, does this mean that you don’t need to analyze field failure reports? Again, no. Both methods, prediction and field-failure analysis, are important, and there are reasons for any significant differences.
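Before deciding that a difference is significant, it helps to turn the failure reports into an observed rate and ask how likely that count would be if the prediction were right. The sketch below assumes a constant failure rate, so the failure count is Poisson-distributed, and uses invented field data: 500 units, about 2,000 operating hours each, and 3 reports. The function names are hypothetical.

```python
import math


def observed_rate_per_1e6(failures: int, units: int, hours_per_unit: float) -> float:
    """Field failure rate per 10^6 unit-hours from cumulative operating time."""
    return failures / (units * hours_per_unit) * 1e6


def poisson_tail_at_least(k: int, expected: float) -> float:
    """P(X >= k) for a Poisson count with the given mean."""
    below = sum(math.exp(-expected) * expected**i / math.factorial(i) for i in range(k))
    return 1.0 - below


# Hypothetical field data; predicted rate is the Table 1 total.
units, hours, failures = 500, 2_000.0, 3
predicted_rate = 1.00  # failures per 10^6 hours
expected_failures = predicted_rate * units * hours / 1e6

print(f"Observed rate: {observed_rate_per_1e6(failures, units, hours):.2f} per 10^6 hours")
print(f"Expected failures if prediction holds: {expected_failures:.1f}")
print(f"P(>= {failures} failures | prediction): {poisson_tail_at_least(failures, expected_failures):.3f}")
```

Here the observed rate is three times the prediction, yet a count that high would still occur about 8 percent of the time by chance, so a small sample alone does not prove the prediction wrong. The count also depends on which reports qualify as failures under your definition.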

Let’s go back to the definition of reliability and the two judgment questions:

What level of inadequate operation is defined as a failure? Do the failure reports for this product reflect your definition of adequacy?

Was the application within the boundaries for which the product was designed? Was the power input within limits? What about the operating environment? Was the product subjected to electrostatic discharge (ESD) after delivery? Was it dropped or otherwise mishandled in a way that could cause the reported failure?

Another factor that causes field results to depart from the reliability prediction is the inevitable variation in failure rates among component parts that look the same and come from the same manufacturer. Remember, those failure rates are based on averages, not absolutes.

Then you must consider your manufacturing practices. Did you make serial number 10,000 the same as serial number 10? Any of these factors can shift your field failure rates.

What It All Means

The MTBF prediction, properly calculated and carefully footnoted with the assumptions on which it is based, is an excellent engineering tool. It evaluates your product even before the design is finished and highlights problem areas that should get further attention. It is a valuable asset to your potential customer, helping to evaluate suppliers and designs before a contract is awarded and assisting in logistics planning for support of the product in the field.

There also is a fiscal use of the MTBF calculation. You can predict the probable difference in the cost of warranty repairs if part B is substituted for part A and make value judgments with some confidence in their validity.
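A minimal sketch of that warranty comparison follows, assuming a constant failure rate so that expected repairs equal rate times units times warranty hours. All inputs besides the Table 1 total are hypothetical: 1,000 units, a one-year (8,760-hour) warranty, $150 per repair, and a replacement for Table 1's part 1 (call it part B) rated 0.20 instead of 0.50, which lowers the product rate to 0.70 per 10⁶ hours.

```python
def expected_warranty_cost(rate_per_1e6: float, units: int, warranty_hours: float,
                           cost_per_repair: float) -> float:
    """Expected warranty repair cost, assuming a constant failure rate over the warranty."""
    expected_repairs = rate_per_1e6 * units * warranty_hours / 1e6
    return expected_repairs * cost_per_repair


# Hypothetical inputs; 1.00 is the Table 1 total with the original part (part A).
with_a = expected_warranty_cost(1.00, 1_000, 8_760.0, 150.0)
with_b = expected_warranty_cost(0.70, 1_000, 8_760.0, 150.0)
print(f"Part A: ${with_a:,.2f}  Part B: ${with_b:,.2f}  Savings: ${with_a - with_b:,.2f}")
```

Comparing the savings against any price premium for part B gives the value judgment the text describes, with the same caveats about estimated rates.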

When the real-world failure reports begin to arrive and it seems like the MTBF estimates were wrong, that is a good time to look at those judgment questions again. Are the definitions of failure really valid? Is the product being used correctly? Don’t blame the estimates. When reliability is the issue, use every tool at your disposal. All of them are valuable.

References

1. MIL-HDBK-217F-2, Reliability Prediction of Electronic Equipment, Department of Defense.

2. Bellcore TR-332, Issue 6, Reliability Prediction Procedure for Electronic Equipment, Telcordia Technologies.

3. Software from T-Cubed Systems and Relex Software.

About the Author

David A. Case, NCE, is the senior compliance engineer responsible for worldwide product approval at Aironet Wireless Communications. He holds NARTE certifications in the fields of telecommunications, EMC, and ESD control; serves on the board of directors for NARTE; chairs the IEEE EMC Society Representative Advisory Committee; and is a member of the Executive Committee for the United States Council of EMC Labs (USCEL) and the Part 15 Coalition. Aironet Wireless Communications, 3875 Embassy Parkway, Akron, OH 44333, (330) 664-7396.

Table 1.

Part    Quantity    Failures/10⁶ Hours
1       1           0.50
2       1           0.30
3       1           0.15
4       1           0.05
__________________________________________
Total Failure Rate = 1.00 per 10⁶ hours
MTBF Estimate = 10⁶ / 1.00 = 1,000,000 hours

May 2000