Software is playing an expanding role in modern medical devices, raising the question of how developers, regulators, medical professionals, and patients can be confident in the devices' reliability, safety, and security. Software-related errors in medical equipment have caused deaths in the past, so the issue is not simply theoretical. Device manufacturers need to provide safety assurance for complex software that is being developed in a competitive environment where price and time-to-market are critical factors. Further, security issues that previously were not a major concern now need to be anticipated and handled.

I recently spoke with Dr. Benjamin Brosgol, senior member of the technical staff at AdaCore, about these issues, including how recent Food and Drug Administration (FDA) guidance addresses them and how programming language and support tool technology can help.

Wong: How does the medical device industry differ from other safety-critical industries, such as avionics, transportation, and nuclear control?

Brosgol: Obviously there's a basic similarity in that a software failure can directly lead to human fatality, but there are also some differences between medical and other domains:

  • If a plane crashes or a nuclear reactor releases deadly radioactive gases, that's instant news all over the world. On the other hand, one person dying from an insulin overdose caused by faulty infusion pump software does not get the same attention. That's not to understate its significance, or the importance of having confidence that medical devices "first do no harm". But historically the requirements for certification in the medical industry have not been as strict as in domains such as commercial avionics.
  • If software problems delay a new aircraft's flight certification, then airlines can use older planes. This is perhaps an inconvenience for passengers, or an additional cost for the airline or the manufacturer, but the delay is not a safety hazard. On the other hand, if there is a delay in certifying a new life-saving medical device, then people can die while waiting for the device to be approved. That tends to put the onus on the FDA to show that a device is unsafe, versus on the manufacturer to show that it is safe.
  • There are only a few companies in the business of constructing airplanes or nuclear reactors, whereas there are several thousand medical device companies. The large number of vendors of medical devices affects the dynamics of regulation/approval.
  • In most safety-critical domains, equipment is operated only by trained/certified personnel. Medical devices often need to be operated by end users (patients) who have no formal training. That places a higher demand on the design of the user interface and the need for hardware- or software-enforced checks on inputs or outputs.
  • Many safety-critical industries see software lifecycles that extend over a decade or more, with the hardware platform remaining stable. In contrast, a medical device manufacturer tends to react more quickly to hardware improvements that will offer a competitive advantage in terms of performance or functionality. That in turn induces a need for more frequent software changes or upgrades, which can introduce various risks (regression errors, configuration/version problems, etc.).

Wong: Give some examples of medical device software safety issues that have been responsible for patients' deaths.

Brosgol: Some of the earliest and most dramatic incidents involved the Therac-25 radiation therapy machine in the mid-1980s, when a software design error killed several patients with radiation doses orders of magnitude higher than intended. The immediate cause was a so-called "race condition": rapidly entered operator input took the machine into an unplanned state. More fundamentally, a detailed accident analysis cited shortcomings in the development and verification process, including inadequate software engineering practices.
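
The lost-update hazard at the heart of a race condition can be sketched deterministically. This is a simplified model, not the actual Therac-25 logic; the task names and the dose field are invented for illustration:

```python
# Two logical tasks each read a shared setting, modify it, and write it
# back. Interleaving their read/modify/write steps (as a preemptive
# scheduler might) silently discards one task's update.

def run(interleaving):
    shared = {"dose": 0}
    local = {"ui": None, "ctrl": None}

    def step(task, action):
        if action == "read":
            local[task] = shared["dose"]
        else:  # "write": each task intends to add 1 to what it read
            shared["dose"] = local[task] + 1

    for task, action in interleaving:
        step(task, action)
    return shared["dose"]

# Sequential execution: both increments land.
assert run([("ui", "read"), ("ui", "write"),
            ("ctrl", "read"), ("ctrl", "write")]) == 2
# Interleaved execution: the "ctrl" write clobbers the "ui" write.
assert run([("ui", "read"), ("ctrl", "read"),
            ("ui", "write"), ("ctrl", "write")]) == 1
```

The same code produces different results depending only on timing, which is why race conditions are so hard to find by testing alone.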

Unfortunately, that was not the only case where defective medical device control software has proved fatal. Here's an excerpt from Total Product Life Cycle: Infusion Pump - Premarket Notification [510(k)] Submissions (Draft Guidance, April 2010):

FDA has seen an increase in the number and severity of infusion pump recalls. Analyses of MDRs [Medical Device Reports] have revealed device problems that appear to be a result of faulty design. Between January 1, 2005 and December 31, 2009, FDA received over 56,000 MDRs associated with the use of infusion pumps. Of these reports, approximately 1% were reported as deaths, 34% were reported as serious injuries, and 62% were reported as malfunctions.

This translates into a rate of more than 100 deaths and 3500 serious injuries per year, from equipment that is supposed to be helping people. The Draft Guidance goes on to state:

The most frequently reported infusion pump device problems are: software error messages, human factors (which include, but are not limited to, use error), broken components, battery failure, alarm failure, over infusion and under infusion. In some reports, the manufacturer was unable to determine or identify the problem and reported the problem as "unknown." Subsequent root cause analyses revealed that many of these design problems were foreseeable and, therefore, preventable.

The FDA has evaluated a broad spectrum of infusion pumps across manufacturers and has concluded there are numerous, systemic problems with device design, manufacturing, and adverse event reporting. FDA has structured this guidance document to address these device problems prior to clearance of the premarket notification and in the postmarket context.


Wong: How does the new FDA guidance intend to solve these problems, without making the certification effort unrealistically expensive?

Brosgol: The draft guidance requires the manufacturer to demonstrate safety through "assurance cases":

An assurance case is a formal method for demonstrating the validity of a claim by providing a convincing argument together with supporting evidence.

The "formal method" is not necessarily mathematical; it is more like a line of reasoning, as would be found in a court of law, used to prove a particular point.

An assurance case addressing safety is called a safety case. A top-level claim (e.g., "this infusion pump is comparably safe") is supported by arguments that demonstrate why and how the evidence (e.g., performance data) supports the top-level claim. The arguments in a safety case are typically organized in a hierarchical fashion with multiple layers of sub-claims, each supported by appropriate evidence. The arguments in a safety case are intended to convince a qualified reviewer or reviewers that the top-level claim is valid.
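
The hierarchical claim/argument/evidence structure can be modeled as a simple tree. The claims and evidence items below are invented for illustration; a real safety case would be far larger and reviewed by qualified assessors:

```python
# A safety case as a hierarchy: a top-level claim supported by arguments
# (sub-claims), with each leaf claim ultimately backed by concrete evidence.

safety_case = {
    "claim": "The infusion pump is acceptably safe",
    "subclaims": [
        {"claim": "Overdose hazards are mitigated",
         "subclaims": [
             {"claim": "Dose limits are enforced in software",
              "evidence": ["unit tests on limit checks",
                           "static analysis report"]},
             {"claim": "Hardware interlock caps delivery rate",
              "evidence": ["interlock test results"]},
         ]},
        {"claim": "Alarm failures are detected",
         "evidence": ["alarm self-test logs"]},
    ],
}

def unsupported_claims(node):
    """Return the leaf claims that lack supporting evidence."""
    subs = node.get("subclaims")
    if subs:  # interior node: recurse into its supporting arguments
        return [c for s in subs for c in unsupported_claims(s)]
    return [] if node.get("evidence") else [node["claim"]]

assert unsupported_claims(safety_case) == []
```

A check like `unsupported_claims` mirrors what a reviewer does: walk the argument down to its leaves and confirm that every claim bottoms out in evidence.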

This approach is not new; it is also found in other safety standards such as the UK's Defence Standard 00-56. It is not intrinsically more or less expensive than other methods, but the key is to apply it early in the development process so that potential hazards can be avoided before they get into the product. Building an assurance case after the fact is possible, but it requires understanding or reverse engineering the requirements and the design, and that is a difficult and costly process.

Wong: What's the difference between process-based certification standards such as DO-178B, and product-based standards such as FDA infusion pump guidance with its safety case requirements?

Brosgol: Process-based certification focuses on the development-related activities and their associated artifacts (requirements specifications, test plans, configuration management procedures, etc.) rather than on measurable properties of the finished product. It also tends to emphasize testing, versus, say, formal methods, as the main way to verify that software meets its requirements.

The rationale for such an approach in DO-178B was perhaps best summarized by Gérard Ladier (Airbus) at the FISA-2003 Conference:

It is not feasible to assess the number or kinds of software errors, if any, that may remain after the completion of system design, development, and test. Since dependability cannot be guaranteed from an assessment of the software product, it is necessary to have assurance on its development process. You can't deliver clean water in a dirty pipe.

This technique has worked well in practice. Although there have been some close calls, there has never been a fatal aircraft accident that has been attributed to DO-178B-certified software.

But the process-based approach, with its indirect relationship with safety, is not without its critics. For example, John Rushby (SRI) offered this observation at the HCSS Aviation Safety Workshop in October 2006:

Because we cannot demonstrate how well we've done, we'll show how hard we've tried.

In contrast, the product-based approach concentrates on properties of the delivered product rather than on how it was developed. Its relationship with safety is therefore more direct than that of the process-based approach. As mentioned above, it is illustrated in "safety case" certification standards, such as the UK's Def Stan 00-56 and the FDA infusion pump draft guidance.

In point of fact, both approaches are needed. Indeed, DO-178B depicts the System Safety Assessment Process as separate from, but interrelated with, the Software Life Cycle Processes that the standard describes. And an organization that is developing a safety case analysis certainly needs a sound software management process to ensure appropriate configuration management, traceability between requirements and implementation, bug tracking, regression testing, etc.

Wong: What about security?

Brosgol: Safety-critical systems need to be accessed by external equipment for various reasons, and for many medical devices such remote access is intrinsic (e.g., pacemaker software updates). This connectivity raises obvious security questions: how to ensure that the equipment, the software, and the data are protected from threats to the needed confidentiality, integrity, and availability.

As with safety, this is an issue that needs to be considered from the earliest stages of software development. In a sense, it could be treated as a special category of safety (in order to be safe, a system has to be secure) but security issues have some inherent differences from safety issues. For example, with safety one may take a probabilistic approach to device failure based on hardware characteristics, but with security one must assume that an adversary who knows about a vulnerability will exploit it.

Security issues are not dealt with directly by safety regulations such as DO-178B or the FDA Draft Guidance, but several other standards are relevant. For example, the Common Criteria, which is basically a catalog of security-related functions and assurance methods, can serve as a reference for specific measures that may be needed (e.g., data encryption, user authorization) based on the device characteristics. And the information security risk management standard ISO/IEC 27005 can likewise be adapted so that it applies to the development of medical device software.

Wong: What is the role of the programming language in helping produce safe software?

Brosgol: Safety certification standards, whether process- or product-based, tend to be language-blind, but that does not mean that the choice of language is insignificant. Languages differ in their susceptibility to vulnerabilities, and the language choice affects the ease or difficulty of achieving safety certification.

For use in developing safety-critical systems, an ideal programming language should meet three main criteria:

  • Reliability. The language should help detect errors early, and should be immune from "traps and pitfalls" that can lead to program bugs.
  • Predictability. The language should not have features whose effect is unspecified or implementation dependent. Such features interfere with portability (for example the program might have a different effect when compiled with a different compiler) and may introduce security vulnerabilities.
  • Analyzability. The language should enable automated static analysis that can derive safety- or security-related properties, such as absence of "buffer overflow".
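
The "traps and pitfalls" criterion is easiest to see with an example. Here is one classic pitfall, shown in Python purely for illustration (the article itself discusses C, C++, Ada, and Java); the function names are invented:

```python
# A classic language pitfall: a mutable default argument is created once,
# so state silently leaks across calls -- exactly the kind of trap a
# safety-oriented language or subset would rule out.

def log_reading(value, history=[]):   # pitfall: one shared default list
    history.append(value)
    return history

assert log_reading(1) == [1]
assert log_reading(2) == [1, 2]       # surprise: earlier call's data persists

def log_reading_safe(value, history=None):
    if history is None:               # fresh list on every call
        history = []
    history.append(value)
    return history

assert log_reading_safe(1) == [1]
assert log_reading_safe(2) == [2]     # independent calls stay independent
```

Every mainstream language has some such traps; a safety-critical subset earns its keep by banning them so they cannot appear in the delivered code at all.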

Unfortunately, no general-purpose programming language achieves this ideal. This is not a surprise, since languages such as C, C++, Ada, and Java were designed with tradeoffs among a number of goals. The real question is whether it's possible to define a subset that is sufficiently expressive to be practical for real-world systems while meeting the three criteria mentioned above.

A number of candidate subsets are available, including:

  • MISRA C, a C subset intended for safety-critical software. The original focus was on automotive systems, but the subset is not domain specific.
  • MISRA C++, a C++ subset designed for critical software, not necessarily safety critical.
  • SPARK, an Ada subset augmented with special comments that reflect program "contracts". SPARK tools can verify various safety- or security-related properties.

Each of these subsets has pros and cons. Since C and C++ are widely used in the software industry in general, there is a large pool of potential users, and the MISRA subsets eliminate many vulnerabilities found in the full languages. But the MISRA restrictions are sometimes subject to interpretation, and in any event neither C nor C++ was originally designed with safety or security as a goal.

SPARK is reliable, preventing errors such as buffer overrun, and predictable, with completely defined semantics. It has a proven track record in both safety and security, and it is highly analyzable, thanks to the language restrictions and to Ada's ability to specify subranges on scalar data. SPARK's main drawback is its relatively small user and tool-provider community.
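
SPARK's contracts are verified statically by the SPARK tools; the decorator below only sketches the *idea* of pre/postconditions, enforced at run time in Python instead. The names and the dose-clamping function are illustrative, not SPARK syntax:

```python
# A rough runtime analogue of program "contracts": a precondition must
# hold on entry, a postcondition must hold on the result. SPARK proves
# such contracts statically; here they are merely checked when called.

def contract(pre, post):
    def wrap(f):
        def checked(*args):
            assert pre(*args), "precondition violated"
            result = f(*args)
            assert post(result, *args), "postcondition violated"
            return result
        return checked
    return wrap

@contract(pre=lambda rate, cap: 0 <= rate and 0 < cap,
          post=lambda out, rate, cap: out <= cap)
def clamp_rate(rate, cap):
    """Limit a requested infusion rate to the device's safe cap."""
    return min(rate, cap)

assert clamp_rate(5, 10) == 5
assert clamp_rate(50, 10) == 10   # over-request is capped, per the contract
```

The crucial difference is *when* the check happens: a runtime assertion fails in the field, while a statically proven contract can never fail at all.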

In addition to these existing subsets of C, C++, and Ada, there is also an on-going effort to define a safety-critical subset of Java. This is intended for applications that need to be certified at the highest levels and may be attractive to a developer who has settled on Java for other parts of the system. However, the safety-critical Java effort is not yet complete, and it remains to be seen whether its approach to memory management without automatic storage reclamation is practical.

In summary, developers of safety-critical systems need to consider various tradeoffs. A language technology such as SPARK offers the most technical advantages, while options such as MISRA C and MISRA C++ have larger user and vendor communities.


Wong: What about static analysis tools?

Brosgol: Static analysis tools are sometimes portrayed as the way to deal with safety and, more especially, security. Feed the source code into the tool, have it detect potential vulnerabilities, and then make the appropriate repair. The reality, however, is different, for several reasons.

  • The best way to deal with errors is to prevent them from being introduced in the first place. So in a sense, a static analyzer that detects, say, a potential buffer overflow, is coming in too late. It's preferable (and less expensive) if the error is detected at the outset, before the bug settles into the delivered code. And here the language makes a difference, as was pointed out in the answer to the previous question. A language with strong type checking will prevent a pointer from being treated as an integer. A language with run-time checking will throw an exception if an array index is out of range.
  • Unfortunately, one does not always have the opportunity to deal with security issues at the outset. For example, an existing system might need to be deployed in a networked environment and thus has to be analyzed for potential vulnerabilities. A "retrospective" static analysis tool is the typical approach, but again, the programming language makes a difference. For example, a number of vulnerabilities in C or C++ could not arise in Ada or Java. There are also several practicalities that need to be taken into account:
    • Soundness: Does the tool report all errors in the class of bugs that it is looking for?
    • Precision: If the tool reports an error, is it a real error as opposed to a "false alarm"?
    • Depth: Does the tool handle "interesting" error categories, e.g. ones that require sophisticated data or control flow analysis?
    • Efficiency: Does the tool scale up to large systems, perhaps hundreds of thousands of lines of code, or more?
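
The run-time checking mentioned in the first bullet can be seen directly in a checked language, shown here in Python as a minimal sketch (the dose table is invented): an out-of-range index raises an exception at the point of error instead of silently reading or corrupting adjacent memory.

```python
# In a language with run-time checks, an out-of-range access is trapped
# immediately, so the fault can be detected and contained rather than
# propagating as silent memory corruption.

doses = [2.0, 4.0, 8.0]

def dose_at(i):
    try:
        return doses[i]
    except IndexError:
        return None   # fault detected and contained at the access

assert dose_at(1) == 4.0
assert dose_at(7) is None   # checked language: error caught, not undefined behavior
```

In an unchecked language the same out-of-range access is exactly the "buffer overflow" that a retrospective static analyzer must later hunt for.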


There are tradeoffs among these, especially between soundness and precision. If a tool is not sound then it must be complemented with some other analysis in order to ensure that all errors are detected. If a tool is not precise, then developers can easily overlook real errors given the large number of false alarms. And a tool that detects only trivial properties, or that is inefficient, is not useful.
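
The soundness/precision tradeoff can be made concrete with a deliberately naive pair of "checkers". These are toys that merely pattern-match source text, not real analysis tools:

```python
# Two toy divide-by-zero "checkers" over source lines, illustrating the
# soundness/precision tradeoff described above.

def sound_checker(lines):
    """Flags every division: catches all real bugs, many false alarms."""
    return [ln for ln in lines if "/" in ln]

def precise_checker(lines):
    """Flags only a literal '/ 0': no false alarms, misses real bugs."""
    return [ln for ln in lines if "/ 0" in ln]

code = ["y = x / 0",        # definite bug
        "z = a / b",        # a bug only if b can be zero
        "w = n + 1"]        # no division at all

assert sound_checker(code) == ["y = x / 0", "z = a / b"]   # sound but imprecise
assert precise_checker(code) == ["y = x / 0"]              # precise but unsound
```

A real tool sits between these extremes, using data and control flow analysis to rule out as many of the `a / b` cases as it can while still reporting every genuine hazard.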

Static analysis tools have their place. They are most useful during the software development process, where they can help prevent bugs from being introduced. When used "retrospectively" they can certainly help, but the tradeoffs among the various goals need to be carefully weighed.