Clear SOUP And COTS Software Can Reliably Serve Safety-Critical Systems

All designers who build complex software systems face the same challenges: time, quality, size, and cost. A change to one element affects the others.

Chris Hobbs

March 21, 2012


All designers who build complex software systems face the same challenges: time, quality, size (number and complexity of features), and cost. Engineering is a series of compromises between feature sets, delivery dates, and quality. A change to one element affects the others. For example, a shorter delivery schedule means discarded features, lower quality, or both. The total cost of developing and delivering a product is a function of these compromises (Fig. 1).

1. The cost of developing and delivering a software system is a function of the compromises we make between time, size, and quality.

If we’re building a system that must meet safety requirements, we must add the work needed to balance safety against availability, to prepare the safety case and attendant validation, and, ultimately, to secure approval and/or certification by various local, national, and supra-national regulatory bodies.

“Roll-Your-Own” Or Ready-Made SOUP?

Essentially, there are two choices for our software development strategy. First, we could design and build everything ourselves (“roll-your-own”) from the operating system (OS) or even the board support package (BSP) up [1]. Or, we could use available software components where possible, integrating them with our own components into our system.

Roll-your-own may be a viable choice for a simple system, one with functionality so limited that it doesn’t need full OS capabilities. For a more complex system, attempting to do everything ourselves will take more time and entail far more risk than building a system with carefully selected external components.

These components—the OS, communications stacks, middleware (for example, meeting the Service Availability Forum standards), and databases—require specialized knowledge. Products from the open-source communities and commercial off-the-shelf (COTS) software vendors can both reduce cost and reduce risk.

Unfortunately, though, such code must be considered SOUP—software of uncertain provenance or pedigree. Superficially, this strategy may, therefore, not appear to be an ideal choice for a system where safety is a consideration.

But before we return to our first option (building everything ourselves), we should note that many safety standards assume SOUP will be an integral part of the system and provide guidance on how it should be treated. For example, the IEC 62304 standard, “Medical device software – Software life cycle processes,” offers two definitions of SOUP, which can be either (or both) “software not developed for the medical device in question, or software with unavailable or inadequate records of its development processes” [2].

2. A very simple fault tree numbers failures (1,2, etc.) and uses letters to identify leaves (A, B, etc.).

The more general standard, IEC 61508, also anticipates the use of SOUP, specifying that a pre-existing software element is a:

“software element which already exists and has not been developed specifically for the current project or safety-related system [...] The pre-existing software could be a commercially available product, or it could have been developed by some organisation for a previous product or system. Pre-existing software may or may not have been developed in accordance with the requirements of this standard.” [3]

We should also note that fault histories and proven-in-use data are an important part of every safety case. If we build everything ourselves, we will have a system with not one minute of in-field usage for even one component. In short, we may well build a safer system with SOUP than without.

The question, then, is not whether it’s permissible to use COTS software or SOUP in our system, but how to decide whether a particular COTS software or SOUP item is appropriate for our system and how to validate that this COTS or SOUP item supports the safety requirements for our system. To answer this question we must add some precision to our definition of SOUP and to the relationship between SOUP and COTS software.

SOUP And Clear SOUP

Some software vendors draw a sharp distinction between COTS software and SOUP. COTS software, they say, has a vendor standing behind it, a company that has staked its reputation on the software functioning as specified, while no one stands behind SOUP.

This distinction is valid in the same way that it may be preferable to buy medication from a reputable pharmacy rather than from some Web site that uses spam to advertise. However, it is also largely irrelevant, since for us most COTS software is quite likely also SOUP. The processes the vendor followed (or failed to follow!), source code, fault histories, and indeed everything else we would have available if we were developing the product ourselves may not be available to us or anyone else outside the vendor’s organization.

A better distinction is between opaque SOUP and clear SOUP. This distinction does not depend on whether the software is commercial. Rather, it is founded on the artifacts available to support claims about the risks and safety levels of the systems built with the SOUP. IEC 61508, for example, says:

“In order to assess the safety integrity of the new system incorporating the pre-existing software, a body of verification evidence is needed to determine the behaviour of the preexisting element. This may be derived (1) from the element supplier’s own documentation and records of the development process of the element, or (2) it may be created or supplemented by additional qualification activities undertaken by the developer of the new safety related system, or by third parties. [....] In any case, a Compliant Element Safety Manual must exist (or must be created) that is adequate to make possible an assessment of the integrity of a specific Safety Function that depends wholly or partly on the reused element.” [4]

A COTS OS from a commercial vendor may have a well-documented development process. The vendor presumably adheres to this process and possesses the source code, which its engineers can readily examine, and it has tracked and documented the software’s failure history.

But if this information isn’t available for public scrutiny, and if, as required by IEC 61508, the product is not accompanied by a safety manual defining which features must be avoided to ensure dependable operation, we must consider the OS opaque SOUP [5]. Before we can use it, we must at least analyze it sufficiently to produce a safety manual, which is not an insignificant undertaking.

Open-Source Software

In contrast to proprietary software, open-source projects such as Apache and Linux make their source code and fault histories freely available. Thanks to years of active service, this software’s characteristics are well known. Although it is “of uncertain provenance” by the definition in IEC 62304, we can scrutinize this software with static code analysis tools (ranging from simple syntax checkers to sophisticated symbolic execution), execute the tests provided by the open-source developers, and perform path coverage analysis. The software’s long history makes findings from statistical analysis particularly relevant.

We can consider software developed in these open-source projects to be clear SOUP—that is, SOUP that we can examine, verify, and validate as though we had written it ourselves. Despite these attractive characteristics, though, open-source software may not be the best solution for safe software systems. The difficulty with using such software in safe systems is that the processes for open-source development are neither clearly defined nor well documented. And, of course, most open-source projects don’t create safety manuals, which leaves that work with us.

In short, we can’t know how the software was designed, coded, or verified, and an attempt to validate safety claims without this knowledge is unlikely to succeed. Also, SOUP or COTS software may include more functionality than is needed, which leaves dead code in the system, a practice that standards such as IEC 61508 expressly discourage.

Recipes For Clear SOUP

If a COTS software vendor makes available its product’s source code and fault history, it clarifies its SOUP. Some vendors choose to go one better and provide a clear recipe for the SOUP. They release to their customers the detailed processes they use to build their software, along with its complete development history—an informal audit trail that we can use to help substantiate claims about the software’s reliability and availability. Some vendors may even make available for scrutiny the evidence they presented to obtain certification (e.g., IEC 61508 SIL3) for their product.

The COTS software recipe for clear SOUP (documented development and validation artifacts, histories, and documented processes) is not just necessary for the initial safety case we must put together for our safe system. It can also prove invaluable for subsequent validation following product upgrades. It is worth noting that:

“In a study the FDA conducted between 1992 and 1998, 242 out of 3,140 device recalls (7.7 percent) were found to be due to faulty software. Of these, 192—almost 80 percent—were caused by defects introduced during software maintenance.” [6]

In other words, the faults were introduced after the devices had gone to market. The software in the devices worked, and then someone either broke it or uncovered previously undiscovered faults. Ideally, then, when developing, maintaining, and upgrading safe software systems, we should work with clear SOUP made with a clear recipe that has a long and well-documented history of success in the field.

Choosing COTS Software

Since it is likely that we will use COTS software in some part of our system, we need to know what to look for when deciding which software we can use and which software we should avoid. At the highest level we must ask what evidence the COTS software vendor provides that its software will support the safety case we must build for our system.

If we assume that all COTS software is SOUP, then we must find out what sort of SOUP it is. If the software doesn’t come with documented evidence that it was designed, built, and validated following rigorous development processes and lacks proven-in-use data and fault histories, we would best look elsewhere. Without this information, we will have to do everything ourselves from scratch at considerable effort and expense.

But if the COTS software is clear SOUP—that is, if it comes with adequate records of development processes, proven-in-use data, and fault histories—it may be appropriate for use in our safe system and reduce our development costs in the bargain. High-level checklists can help us determine if COTS software is clear SOUP and a good candidate for integration into a safe software system.
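
As a first-pass screen, the criteria in the paragraph above can be written down as a simple check. The sketch below is illustrative only: `SoupEvidence`, `missing_evidence`, and `classify` are hypothetical names, and the three evidence categories are just the ones named in this article, not a rule taken from any standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SoupEvidence:
    """Evidence available for one external component (hypothetical record)."""
    name: str
    development_records: bool  # adequate records of development processes
    proven_in_use_data: bool   # field usage data
    fault_history: bool        # documented failures and their resolutions


def missing_evidence(item: SoupEvidence) -> List[str]:
    """List the evidence categories that are not available for this item."""
    checks = {
        "development process records": item.development_records,
        "proven-in-use data": item.proven_in_use_data,
        "fault history": item.fault_history,
    }
    return [label for label, present in checks.items() if not present]


def classify(item: SoupEvidence) -> str:
    """Call the item clear SOUP only when no evidence category is missing."""
    return "opaque SOUP" if missing_evidence(item) else "clear SOUP"


if __name__ == "__main__":
    candidate = SoupEvidence("binary-only library", False, True, False)
    print(classify(candidate), missing_evidence(candidate))
```

A screen like this only records whether evidence exists. Judging whether that evidence is adequate for our safety case is the purpose of the more detailed checklists.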

Functional Safety Claims

We begin by examining the functional safety claims the vendor makes about the software:

Does the vendor make functional safety claims?
Do these claims meet the functional safety requirements for our project?
Are the context and limits of the claims specified? For instance, are these claims for continuous operation or for on-demand operation?
Do the COTS software functional safety claims specify the probability of dangerous failure? Or, inversely, what claims does the vendor make about the software’s dependability?
Does the vendor define “sufficient dependability,” and how does it quantify its dependability claims? For example, is the quantification of the (essentially meaningless) “five-nines” type, or does it provide meaningful information about availability and reliability in relevant contexts?
Does the vendor quantify the COTS software claims of availability (how often the system responds to events in a timely manner) and reliability (how often these responses are correct)?
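
To see why a bare “five-nines” figure conveys so little, it helps to convert an availability fraction into allowed downtime and to note that very different outage patterns produce the same number. This is a minimal sketch; the function names are illustrative and a 365-day year is assumed.

```python
from typing import Sequence

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Convert an availability fraction (0..1) into minutes of downtime per year."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_outages(outage_minutes: Sequence[float]) -> float:
    """Availability over one year given the duration of each outage in minutes."""
    return 1.0 - sum(outage_minutes) / MINUTES_PER_YEAR


if __name__ == "__main__":
    # "Five nines" allows a little over five minutes of downtime per year.
    print(round(downtime_minutes_per_year(0.99999), 2))  # 5.26
    one_long = availability_from_outages([5.0])             # one 5-minute outage
    many_short = availability_from_outages([1 / 60] * 300)  # 300 one-second outages
    print(abs(one_long - many_short) < 1e-12)  # True
```

The two outage patterns yield the same availability figure, yet they could have very different consequences for a system that must respond within a deadline. This is why the context and limits of a vendor's claim matter as much as the number itself.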

Process

A defined and documented process covering the entire software lifecycle is required:

Has the COTS software vendor implemented a quality management system (QMS)?
Does this system meet the requirements of the ISO 9000 family of QMS standards, ISO/IEC 15504 (Software Process Improvement and Capability Determination, or SPICE), or Capability Maturity Model Integration (CMMI)?
What processes does the vendor use for source control, including revisions and versions?
How does the vendor document, track, and resolve defects, including those found through verification and validation and in the field?
How does the vendor control updates?
Does the vendor classify defects and follow up with fault analysis?

Fault-Tree Analysis

Fault-tree analysis, using a method such as Bayesian belief networks, is an essential tool both for discovering and resolving design errors and for estimating system dependability:

Was the COTS software evaluated with fault-tree analysis?
Did the analysis use both a priori (cause to effect) and a posteriori (effect to cause) evidence?
Are the results of the fault-tree analysis available to us?
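
For estimating dependability, the simplest quantitative use of a fault tree combines leaf probabilities through AND and OR gates. The sketch below is a plain gate calculation, not a Bayesian belief network. It assumes statistically independent leaf events, and the tree shape, the leaf names (A, B, C), and the probabilities are all made up for illustration.

```python
import math
from typing import Iterable


def and_gate(probabilities: Iterable[float]) -> float:
    """Output event occurs only if every input event occurs (independent inputs)."""
    return math.prod(probabilities)


def or_gate(probabilities: Iterable[float]) -> float:
    """Output event occurs if at least one input event occurs (independent inputs)."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


if __name__ == "__main__":
    # Hypothetical tree: failure 1 occurs if leaf A occurs, or if failure 2 occurs.
    # Failure 2 occurs only if both leaf B and leaf C occur.
    p_a, p_b, p_c = 1e-4, 1e-2, 1e-2  # made-up leaf probabilities
    p_failure_2 = and_gate([p_b, p_c])
    p_failure_1 = or_gate([p_a, p_failure_2])
    print(f"{p_failure_1:.4e}")  # about 2.0e-04
```

When leaf events are not independent (for example, two components sharing a power supply), this calculation no longer holds, which is one reason to ask the vendor which method and which assumptions its analysis used.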

Static Analysis

Static analysis is invaluable for locating suspect code, and agencies such as the United States Food and Drug Administration have recommended its use [7]:

Does the COTS software vendor use static analysis to identify potential problems in its product?
Does the vendor use static analysis techniques such as syntax checking against published coding standards, fault probability estimates, correctness proofs (e.g., assertions in the code), and symbolic execution (static analysis-hybrid)?
What artifacts does the COTS software vendor provide to support the findings from its static analyses?

Proven-In-Use Data

Proven-in-use data is invaluable both when reviewing COTS software dependability claims and for building claims. IEC 61508-7, for instance, provides proven-in-use values required for SOUP safety integrity levels (e.g., 4.6 × 10^8 fault-free hours to have 99% confidence that SOUP is SIL 3). Anyone building a system for which proof of dependability may one day be required should gather in-use data as a matter of course:

Can the COTS software vendor provide proven-in-use data?
How far back does the data go?
How comprehensive is the data? For example, what is the sample size for which data is available? Does this sample represent a small or large percentage of the vendor’s runtimes? How does the vendor gather this data?
Does the vendor provide fault analysis results with the proven-in-use data, or just usage data?
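
The scale of the fault-free hours quoted from IEC 61508-7 can be reproduced with a standard zero-failure argument. If failures occur at a constant rate (an exponential model, which is an assumption here), the probability of seeing no failure in T hours is exp(−λT), so T ≥ −ln(α)/λ failure-free hours give confidence 1 − α that the true rate is below λ. The sketch below uses a target rate of 10^-8 failures per hour, chosen here because it yields roughly 4.6 × 10^8 hours at 99% confidence; the standard itself remains the authority on the tables and the conditions attached to them.

```python
import math


def required_failure_free_hours(max_failure_rate_per_hour: float,
                                confidence: float) -> float:
    """Failure-free operating hours needed to claim, with the given confidence,
    that a constant failure rate lies below max_failure_rate_per_hour."""
    if max_failure_rate_per_hour <= 0.0:
        raise ValueError("failure rate must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence  # accepted probability of a wrong claim
    return -math.log(alpha) / max_failure_rate_per_hour


if __name__ == "__main__":
    hours = required_failure_free_hours(1e-8, 0.99)
    print(f"{hours:.2e} hours")             # about 4.6e+08 hours
    print(f"{hours / 8760:.0f} unit-years")  # more than 50,000 unit-years
```

A single unit cannot accumulate that much operating time, so the hours must be pooled across a large installed base. This is why the sample size and the way the vendor gathers its data belong on the checklist.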

Design Artifacts

Design and validation artifacts are one of the key differences between SOUP and clear SOUP. If the COTS software vendor cannot provide an extensive set of artifacts, there is little reason to select its wares over open-source software. Before choosing a vendor, ask:

What design artifacts does the COTS software vendor provide with its software?
Does the vendor provide architectural design documents or detailed design documents?
What are the test plans and methods for the COTS software, and does the vendor publish the detailed results?
What other validation methods does the COTS software vendor use (see above), and are the methods and detailed results available?
Does the vendor maintain a traceability matrix, from requirements to delivery, and is it available for scrutiny?
What records does the vendor keep of the software life cycle, including changes as well as issues and their resolutions?
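
A traceability matrix lends itself to mechanical checking. As an illustration only, the sketch below treats the matrix as a mapping from requirement identifiers to the test cases that verify them, then reports requirements with no executed test and executed tests that trace to no requirement. The identifiers and the data layout are invented for the example and do not represent any vendor's format.

```python
from typing import Dict, List, Set, Tuple


def trace_gaps(matrix: Dict[str, List[str]],
               executed_tests: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (requirements with no executed test, executed tests with no requirement)."""
    untested = sorted(
        req for req, tests in matrix.items()
        if not set(tests) & executed_tests
    )
    traced = {test for tests in matrix.values() for test in tests}
    orphans = sorted(executed_tests - traced)
    return untested, orphans


if __name__ == "__main__":
    # Hypothetical matrix: requirement ID -> test case IDs that verify it
    matrix = {
        "REQ-001": ["TC-01", "TC-02"],
        "REQ-002": ["TC-03"],
        "REQ-003": [],
    }
    executed = {"TC-01", "TC-03", "TC-09"}
    print(trace_gaps(matrix, executed))  # (['REQ-003'], ['TC-09'])
```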

The Safety Manual

The safety manual is another key requirement. If the COTS software doesn’t include a safety manual, try another vendor. If there is a manual, it should be evaluated:

Does the safety manual state the functional safety claims for the COTS software?
Does the safety manual define the context and constraints for the COTS software functional safety claims? These should include the environment and the usage where the functional safety claims are valid. For example:
- “This list of processor architectures is exhaustive.”
- “Floating point operations SHALL NOT be performed in a signal handler.”
- “Critical budgets are limited to the window size.”
Does the vendor provide training on the safe application of the product?

Certified Components

Even if all of these recommendations have been followed and the COTS software meets all the requirements for clear SOUP, there are no guarantees that approval or certification of the final product will proceed according to plan and on schedule. Further advantage can be gained from working with a COTS software vendor that has experience with approvals and from employing components that have received relevant approvals.

Components that have received external and independent certifications such as IEC 61508 can streamline the approval process and reduce time-to-market. First, these components must be developed in an environment with appropriate processes and quality management to be certified. Second, these components require the proper testing and validation, and the COTS software vendor will have all the necessary artifacts, which will in turn support the approval case for the final device. Finally, a vendor that has experience with certifications will be able to offer invaluable advice and support to its customers.

Conclusion

Neither key standards, such as IEC 62304, ISO 26262, and IEC 61508, nor the demands of functional safety preclude the use of COTS software in safety-critical devices. We must exercise diligence and caution, but COTS software may be a perfectly acceptable choice, given stringent selection criteria and appropriate and equally stringent validation of the completed medical systems and devices.

In fact, if we make the critical distinction between opaque SOUP (which should be avoided) and clear SOUP—that is, SOUP for which source code, fault histories, and long in-use histories are available—we will find that COTS software may be the optimal choice for many safety-related software systems.

References

[1] Board support package: board-specific software for a specific OS, containing the minimal device support required to load the OS, including a bootloader and drivers for the devices on the board.
[2] IEC 62304, 5.1.1.
[3] IEC 61508-4, 3.2.8.
[4] IEC 61508-7 (ed. 2.0) C.2.10.2.
[5] Here, the analogy with a medication bought from a reputable pharmacy or through a spamming Web site falls apart. The medication sold by the pharmacy is not like the opaque software: every ingredient (including the “inactive” ones), every process used to create or extract these ingredients, and the finished medication must be available for regulatory scrutiny. This is why pharmaceutical and biotechnology companies rely so heavily on their patents. They may not keep trade secrets, so patents are their only protection.
[6] Jackson, Daniel et al., eds., Software for Dependable Systems: Sufficient Evidence? Washington: National Academies Press, 2007, p. 23.
[7] FDA, Research Project: Static Analysis of Medical Device Software, updated Feb. 11, 2011.

About the Author

Chris Hobbs

Chris Hobbs is an OS kernel developer at QNX Software Systems Limited, specializing in "sufficiently-available" software: software created with the minimum development effort to meet the availability and reliability needs of the customer; and in producing safe software (in conformance with IEC 61508 SIL3). He is also a specialist in WBEM/CIM device, network and service management, and the author of A Practical Approach to WBEM/CIM Management (2004).

In addition to his software development work, Chris is a flying instructor, a singer with a particular interest in Schubert Lieder, and the author of several books, including Learning to Fly in Canada (2000) and The Largest Number Smaller than Five (2007). His blog, Software Musings, focuses "primarily on software and analytical philosophy".

Chris Hobbs earned a B.Sc., Honours in Pure Mathematics and Mathematical Philosophy at the University of London's Queen Mary and Westfield College.