The Limits of Testing in Safe Systems

Date Posted: November 11, 2011 01:10 PM
Author: Chris Hobbs

Traditionally, proofs that software systems meet safety standards have depended on exhaustive testing. This method is adequate for relatively simple, deterministic systems with single-threaded, run-to-completion processes. Unfortunately, testing is no longer adequate to ensure the dependability of today's multi-threaded systems. Though these systems are deterministic in theory, their complexity forbids our treating them as deterministic systems in practice.
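To get a rough feel for why exhaustive testing stops scaling, consider a simplified model that is not part of the original argument: two unsynchronized threads, each executing n atomic steps. The number of distinct orderings a scheduler could produce is the binomial coefficient C(2n, n). The sketch below computes it; the function name interleavings and the overflow cut-off at n = 30 are choices made for this illustration only.

```c
#include <stdint.h>

/*
 * Number of distinct interleavings of two threads that each execute
 * `steps` atomic steps, i.e. the binomial coefficient C(2*steps, steps).
 * Hypothetical helper for illustration; returns 0 when steps > 30,
 * because the intermediate product could then overflow 64 bits.
 */
uint64_t interleavings(unsigned steps)
{
    if (steps > 30) {
        return 0; /* refuse rather than return an overflowed value */
    }

    uint64_t result = 1;
    for (unsigned i = 1; i <= steps; i++) {
        /* After this line, result == C(steps + i, i); the division is exact. */
        result = result * (steps + i) / i;
    }
    return result;
}
```

Under these assumptions, two threads of only 10 steps each already admit 184,756 possible orderings, and real systems have far more threads and far more steps.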

From fault to failure

When we build a safe system, we must begin with the premise that all software contains faults and these faults may ultimately lead to failures.

Failures are the result of a chain of circumstances that start with a fault introduced into a design or implementation. Faults may lead to errors, and errors may lead to failures, though, fortunately, many faults never lead to errors, and many errors never cause failures. Table 1 describes faults, errors and failures.

Table 1: Faults, errors and failures
Fault: A mistake in the design or code. An example might be a design of a protocol that permits a deadlock to occur. Within code, specifying an array as int x[100] instead of int x[10] would be a fault, although it is unlikely to lead to an error.
Error: Unspecified behavior caused by a fault in the design or code. A fault in the design of the protocol might result in a deadlock during execution, and the recovery might cause a message that was in transit to be lost.
Failure: A failure to satisfy one of the safety claims about the system, due to an uncontained error. This loss of a message (an error) might be harmless, or it might become the direct cause of a hazardous situation.
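The distinction in Table 1 can be sketched in a few lines of C. The example is hypothetical and not taken from any real system: a duty-cycle value is specified to lie between 0 and 100, the clamp contains a deliberate typing mistake (the fault), the mistake only produces out-of-range output for some inputs (the error), and a range check in the caller contains the error before it can violate a safety claim (the failure). All names, the limit of 100, and the choice of 0 as the safe fallback are assumptions made for the illustration.

```c
#include <stdbool.h>

#define DUTY_MAX 100 /* assumed specification: duty cycle is 0..100 */

/* FAULT (deliberate): the upper clamp was typed as 1000, not DUTY_MAX. */
static int clamp_duty(int requested)
{
    if (requested < 0) {
        return 0;
    }
    if (requested > 1000) {
        return 1000;
    }
    return requested;
}

/* ERROR: the fault produces out-of-specification output for this input. */
static bool duty_is_erroneous(int requested)
{
    int duty = clamp_duty(requested);
    return duty < 0 || duty > DUTY_MAX;
}

/*
 * Containment: a defense built into the system. An out-of-range value is
 * replaced by 0, assumed here to be the safe state, so the error does not
 * become a failure.
 */
static int safe_duty(int requested)
{
    int duty = clamp_duty(requested);
    if (duty < 0 || duty > DUTY_MAX) {
        return 0;
    }
    return duty;
}
```

For a request of 50 the fault stays dormant and no error occurs. For a request of 150 the fault produces an error, which safe_duty contains; without that check the out-of-range value would reach the actuator and could become a failure.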

Figure 1, adapted from James Reason's Human Error (Cambridge UP, 1990), illustrates how faults at different points in the development cycle can eventually lead to a failure. We have subdivided the defenses to match the two layers in software: pre- and post-shipment defenses (with attendant holes). Pre-shipment defenses are those validation and verification activities carried out before deployment, while post-shipment defenses are defenses built into the system itself and activated to protect it during use. The causes of every failure can be traced back, at least in theory, to a lacuna at each stage.

When we build a safe system, we cannot prove that the system contains no faults (see "Inherent limitations of testing" below). What we can do, though, is demonstrate that errors in the system will not cause the system to fail more often, or for longer, than the limits we claim. Or, to put it more directly, we can provide evidence to support our claims that our system will be as dependable as we say it is.
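A dependability claim of this kind has two parts: how often the system may fail, and for how long. As a minimal sketch, assuming a claim is expressed as a maximum failure rate per hour and a maximum outage duration, the check below tests whether a set of observations is consistent with the claim. The struct names, fields, and units are hypothetical, and passing the check is only one piece of evidence, not a statistical proof that the claim holds.

```c
#include <stdbool.h>

/* Hypothetical form of a dependability claim. */
struct dependability_claim {
    double max_failures_per_hour;
    double max_outage_seconds;
};

/* Hypothetical summary of what was observed in test or in the field. */
struct field_observation {
    double hours_observed;
    unsigned failures;
    double longest_outage_seconds;
};

/* Returns true when the observation does not contradict the claim. */
static bool observation_supports_claim(const struct dependability_claim *claim,
                                       const struct field_observation *obs)
{
    if (obs->hours_observed <= 0.0) {
        return false; /* no observation time means no evidence */
    }

    double observed_rate = obs->failures / obs->hours_observed;

    return observed_rate <= claim->max_failures_per_hour &&
           obs->longest_outage_seconds <= claim->max_outage_seconds;
}
```

Both conditions must hold: a system that fails rarely but stays down too long contradicts the claim just as much as one that fails too often.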
