
Object Orientation in Embedded Critical Systems – don't be scared anymore!

Dec. 7, 2012
Object Oriented Programming was introduced to software development almost half a century ago but should it be banished from embedded, safety-critical development?

Object Oriented Programming (OOP) was introduced to software development almost half a century ago, and its popularity has reached all fields of software development. All? Well, almost. OOP was banished from embedded, safety-critical development for this entire period, so many of the latest design methodologies, often based on OOP patterns, remained wishful thinking in that domain.

History is in motion, and even the most conservative industries, such as avionics, are now opening the door to OOP. In this article, we'll discuss the rationale behind the fear of adopting this programming paradigm and the techniques that mitigate the risks.

Table of Contents

  1. Fear of the Object
  2. Pessimistic and Optimistic Testing
  3. The Liskov Substitution Principle
  4. DO-178C and the Local Substitution Principle
  5. Ada 2012 Class-Wide Contracts
  6. Conclusion

Fear of the Object

Lots of things have been said about object oriented programming vulnerabilities, and a lot of fear has been spread. In part this is due to the confusion about what object oriented programming actually means. People seem to lump a variety of concepts in the same basket, including templates, exceptions, overloading, inheritance, encapsulation, polymorphism, garbage collection, and dynamic memory allocation. Here we will concentrate on two concepts at the core of the OO paradigm – inheritance and polymorphism, leaving other issues for future discussions.

In summary, object oriented programming is about the relationships between types, and in particular the inheritance relationship, which is sometimes referred to as the "is a" relationship (i.e., specialization). Let's take a simple example: consider an avionics system in which various sensors have to be managed with a set of properties (operations, data fields). Some are common to all, and some are specific to particular kinds of sensors. You may want to model each kind of sensor with a dedicated type, "Temperature_Sensor", "Speed_Sensor", "Pressure_Sensor" etc. Going further, you may want to have more specialized sensors, from different providers, e.g. "Company_1_Temperature_Sensor", "Company_2_Temperature_Sensor", etc. All of these types will provide common operations (functionalities) for initializing, checking the error status, retrieving the value, establishing a name, and will also have some specific operations, e.g. for drivers or calibration services.

In an object oriented design, common functionalities will often be declared in a higher level abstraction, a "Sensor" type in our example. We can then introduce hierarchical dependencies through inheritance; in particular, "Provider_1_Temperature_Sensor" is a "Temperature_Sensor" which itself is a "Sensor".

With the inheritance relationship, a child has all the properties of its parents, and perhaps some additional ones. In other words, a child type is characterized by all the data (also known as fields, components or attributes) and all the services (also known as operations, methods, subprograms, or functions) of its parent, plus potentially its own.

Additionally, a child is allowed to redefine any of the services of its parents. In our example, let's assume that all sensors need to provide a service "Calibrate", putting the sensor in a state where the values it sends are meaningful. It's most likely that different sensors will implement this differently. Parent types may implement some default calibration schemes, or even do nothing, while child types will implement the complete procedure. In other words, the child will replace or override the parent behavior for this service.

So far so good. What we've described above is only the inheritance link. Now the issue is that a particular object (instance) might not necessarily be manipulated through a variable of its own type, but rather through a variable of its parent type. In other words, while the variable in use may be of type "Sensor", the actual object manipulated could be of any of its child types, like "Temperature_Sensor" or "Speed_Sensor". This is extremely convenient when developing algorithms for high-level services. For example, given the above, it is very easy to iterate over a list of anonymous sensors, calibrating them all. This could be done before take-off, for example. However, from a safety point of view this raises a red flag: when calling "Calibrate", one has no idea which Calibrate procedure will actually be called. The call is "dispatching" at run time to the appropriate service, determined by the type of the actual object being manipulated, not the type of the variable. How can we verify that it behaves as expected in the context of the call? How can we verify that the resources that are used are consistent with the requirements (for example, stack usage and worst-case execution time)? How can we know that the dispatching call is sufficiently tested?
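As a minimal illustration of this mechanism, here is a hedged sketch in Python (the class names mirror the sensors above; the calibration strings are placeholders, not the article's implementation):

```python
# Minimal sketch of run-time dispatch (class names hypothetical).
class Sensor:
    def calibrate(self) -> str:
        return "default calibration"

class TemperatureSensor(Sensor):
    def calibrate(self) -> str:          # overrides the parent behavior
        return "temperature calibration"

class SpeedSensor(Sensor):
    def calibrate(self) -> str:
        return "speed calibration"

# The declared element type is Sensor, but each call dispatches to the
# method of the actual object's type -- invisible at the call site.
sensor_list: list[Sensor] = [TemperatureSensor(), SpeedSensor(), Sensor()]
results = [s.calibrate() for s in sensor_list]
print(results)
```

The loop's source text gives no hint of which of the three bodies runs on each iteration; only the run-time type of each element decides.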

Pessimistic and Optimistic Testing

A key issue with testing is to define when enough has been performed. Standards such as DO-178B define one criterion: structural code coverage. There are various subtleties depending on the level of criticality of the software, but for the purpose of this discussion let's just consider cases where coverage means that each statement of the application is covered.

The issue with this criterion in the context of object orientation is that the dispatching call hides complexity in the application. Let's get back to our previous example. The following code iterates over a list of sensors for calibration. In addition, it will check the status of errors after calibration before letting the plane take off. This is written in Ada 2012, but could be just as well C++, Java, Python or any OOP language:

 
for P : Sensor'Class of Sensor_List loop
   P.Calibrate;
end loop;

if Warning_Is_Red then
   Lock_Plane;
else
   Signal_Plane_Can_Take_Off;
end if;

This loop iterates over all the sensors in the Sensor_List container; P, the sensor retrieved at each iteration, is of type Sensor'Class, meaning that it can contain any child of the sensor type. Elsewhere in the system, each of the child types will have its own implementations of Calibrate – either inherited from Sensor or else an overriding version doing a specific job for that sensor type:

 
procedure Calibrate (This : Sensor) is
begin
   ...-- some default calibration
end Calibrate;
...
overriding
procedure Calibrate (This : Speed_Sensor) is
begin
   ...-- some calibration specific to speed sensors
end Calibrate;
...
overriding
procedure Calibrate (This : Temperature_Sensor) is
begin
   ...-- some calibration specific to temperature sensors
end Calibrate;
...
overriding
procedure Calibrate (This : Pressure_Sensor) is
begin
   ...-- some calibration specific to pressure sensors
end Calibrate;
...

The selection of the subprogram to call will be done dynamically depending on the type of the object – therefore there is a hidden condition at the point of the dispatching call. The issue is whether evaluating this condition should be considered as occurring after or before the call. As we'll see, this has some rather important consequences in terms of code coverage. Let's start with the first case. For simplicity, assume that the Sensor is not an object in the OOP sense anymore, but rather a composite data structure (record or struct) that contains one field specifying its kind. If the condition is considered after the call, the equivalent non-OOP code would be:

 
for P : Sensor of Sensor_List loop
   Calibrate (P);
end loop;

procedure Calibrate (This : Sensor) is
begin
  case This.Kind is
     when Sensor =>
        ...--some default calibration
     when Speed_Sensor => 
        ...--some calibration specific to speed sensors
     when Temperature_Sensor =>
        ...--some calibration specific to temperature sensors
     when Pressure_Sensor =>
        ...--some calibration specific to pressure sensors
   end case;
end Calibrate;

To achieve structural (statement) coverage here, covering Calibrate once with all the possible cases is enough. There are 4 different tests to write. These tests can be either unit tests directly calling Calibrate or going through the loop. This is called optimistic testing – you'll see why as we describe the second alternative:

 
for P : Sensor of Sensor_List loop
   case P.Kind is
     when Sensor =>
        Calibrate_Sensor (P);
     when Speed_Sensor =>
        Calibrate_Speed_Sensor (P);
     when Temperature_Sensor =>
        Calibrate_Temperature_Sensor (P);
     when Pressure_Sensor =>
        Calibrate_Pressure_Sensor (P);
   end case;
end loop;
...
procedure Calibrate_Sensor (This : Sensor) is
begin
   ...-- some default calibration
end Calibrate_Sensor;
...
procedure Calibrate_Speed_Sensor (This : Sensor) is
begin
   ...-- some calibration specific to speed sensors
end Calibrate_Speed_Sensor;
...
procedure Calibrate_Temperature_Sensor (This : Sensor) is
begin
   ...-- some calibration specific to temperature sensors
end Calibrate_Temperature_Sensor;
...
procedure Calibrate_Pressure_Sensor (This : Sensor) is
begin
   ...-- some calibration specific to pressure sensors
end Calibrate_Pressure_Sensor;
...

At first sight, this looks the same. We will still need 4 tests to cover the above. However, these 4 tests will have to be performed anywhere a dispatching call occurs! This means that if another subprogram is calling Calibrate, 4 additional tests would be needed! This is therefore called pessimistic testing, as it requires many more tests.

In summary, considering a type hierarchy of N types on a service called C times, optimistic testing will require writing (at least) N tests. Pessimistic testing will require us to write N * C tests. One can easily see how the latter can inflate costs to an unacceptable level. Unfortunately, optimistic testing is not considered sufficient from a safety point of view.
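As a quick worked example of this cost difference (the figures are hypothetical):

```python
# Hypothetical figures: a hierarchy of N = 4 sensor types whose
# Calibrate service is dispatched from C = 5 call sites.
N, C = 4, 5
optimistic_tests = N        # cover each implementation once, anywhere
pessimistic_tests = N * C   # cover each implementation at every call site
print(optimistic_tests, pessimistic_tests)
```

Add a fifth call site or a fifth sensor type and the pessimistic figure grows multiplicatively, while the optimistic one grows only linearly.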

Although applications with a reasonably small class hierarchy and a reasonably small number of dispatching calls can be successfully verified using the pessimistic testing method, these constraints are one of the impediments to widespread adoption of OOP paradigms in safety critical development. Fortunately, research has provided some interesting alternatives to improve things.

The Liskov Substitution Principle

Let's step back from testing and look at a higher design view, namely requirements. In most (and that should really be all) critical software, code is developed according to requirements. It should be possible to express a service's requirements in terms of what is accepted as input and how the system is modified after the call. These two sets of constraints, or contracts, are typically referred to as pre-conditions (since the condition has to be true before a call) and post-conditions (since the condition has to be true after the service completes).

For example, consider a Check_Availability service for a sensor. A reasonable pre-condition for this service is that the sensor has to be initialized. A reasonable post-condition is that the sensor has a measure to provide if it is defined as being available. These contracts can of course be arbitrarily complex and may require development of additional abstractions, but let's keep simple examples for the purpose of the demonstration.
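Expressed executably, such a contract might look like the following Python sketch (the availability behavior and field names are assumptions made for illustration):

```python
class Sensor:
    def __init__(self):
        self.initialized = False
        self.available = False
        self.measure = None

    def initialize(self):
        self.initialized = True

    def check_availability(self) -> bool:
        # Pre-condition: the sensor must already be initialized.
        assert self.initialized, "pre-condition violated: not initialized"
        self.measure = 42.0          # hypothetical reading
        self.available = True
        # Post-condition: if available, a measure must exist.
        assert (not self.available) or (self.measure is not None)
        return self.available

s = Sensor()
s.initialize()
print(s.check_availability())
```

Plain assertions are a crude stand-in for real contract support, but they make the "true before the call" / "true after the call" split tangible.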

At the point of the dispatching call, all that is known is the type of the parent. Therefore, the only requirements (or pre- and post-conditions) known for the service are those of the parent. The call written by the user can only rely on them. Problems arise when the overridden service in the child changes the contract – is this still compatible with the ones that are visible?

Generally speaking, it is OK to change the contract; it's even expected. A child is allowed to do additional things besides what the parent does. There are, however, cases where the changes introduce dangerous inconsistencies. Let's take a pre-condition. In our case, we're requiring the sensor to be initialized before undertaking a calibration. The user should, in good faith, make sure that all sensors are properly initialized and then expect subsequent calibrations to work properly. But what if the pressure sensor has an additional requirement stating that the aircraft doors must be closed before the sensor is calibrated? This may look naive at first sight, but one never knows what strange dependency – or error – can arise in software design. That's the whole point. When writing the code calling the sensors, the developer may not know which sensor types can actually be used. Their code and requirements may not even be written yet. So it's possible that the calibration service will be invoked before the doors are closed. This violates the precondition for the pressure sensor, and will put the software into a failure state. Such errors need to be prevented.

This illustrates the first part of the Liskov Substitution Principle. An overridden service cannot impose pre-conditions that are more restrictive than the parent's. (Phrased somewhat more precisely, the precondition for the child type's service must be the same as, or weaker than, the precondition for the parent type's service.) The reverse is not a problem though. If a simple sensor doesn't require initialization, then initialization won't harm it in any way, and it will be fine to do this useless step.
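To make the doors-closed scenario concrete, here is a hedged Python sketch (the names and the use of an exception to signal the violated pre-condition are assumptions):

```python
class Sensor:
    def calibrate(self, doors_closed: bool = False) -> None:
        # Parent contract: no pre-condition about the doors.
        pass

class PressureSensor(Sensor):
    def calibrate(self, doors_closed: bool = False) -> None:
        # LSP violation: a *stronger* pre-condition than the parent's.
        if not doors_closed:
            raise RuntimeError("pre-condition violated: doors must be closed")

def calibrate_all(sensors):
    # Written in good faith against Sensor's contract, which says
    # nothing about doors -- so doors_closed is never passed.
    for s in sensors:
        s.calibrate()

try:
    calibrate_all([Sensor(), PressureSensor()])
    failed = False
except RuntimeError:
    failed = True
print(failed)
```

The caller did everything the parent contract asked of it, yet the substituted child still fails; this is exactly the inconsistency the principle forbids.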

Let's now consider the post-condition. After the calls on Calibrate, the user expects that if the result is true, then values are ready to be retrieved and the system is fine, unless the warning LED is red. That's why we're doing a test in the initial piece of code before letting the plane take off. Let's now consider our foolish implementer of the class Pressure_Sensor and assume that in case of a problem a message is displayed on the monitor. So instead of having "the system is OK or the warning LED is red" as a post condition, we would have "the system is OK or the monitor is displaying an error message". Not quite the same. Unfortunately, while writing the code, the caller of the calibration had no idea it was possible to send an error message to the monitor, so won't be checking for it and may let a plane take off with uncalibrated sensors. In this example, the inconsistency is introduced by an overridden service that doesn't fulfill the entire parent post condition.

This is the second part of the Liskov Substitution Principle. An overridden service must satisfy at least the post-condition of its parent; i.e., its post-condition must be the same as, or stronger than, the post-condition for the parent type's service. It can thus provide more guarantees; e.g., the fact that the pressure sensor displays a message wouldn't be a problem if it were to set the warning LED to red as well.
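Continuing in Python for illustration (the names and failure-reporting behavior are assumptions), the monitor-versus-LED scenario looks like this:

```python
class Sensor:
    def __init__(self):
        self.warning_led_red = False

    def calibrate(self, success: bool) -> None:
        # Parent post-condition: calibration succeeded,
        # or else the warning LED is red.
        if not success:
            self.warning_led_red = True

class PressureSensor(Sensor):
    def __init__(self):
        super().__init__()
        self.monitor_message = ""

    def calibrate(self, success: bool) -> None:
        # LSP violation: failure goes to the monitor instead, so the
        # parent's post-condition about the LED no longer holds.
        if not success:
            self.monitor_message = "calibration failed"

def plane_can_take_off(sensors) -> bool:
    # The caller checks only what the parent contract promises.
    return not any(s.warning_led_red for s in sensors)

bad = PressureSensor()
bad.calibrate(success=False)
print(plane_can_take_off([bad]))  # True: an uncalibrated plane is cleared
```

The parent-level check passes even though calibration failed, which is precisely the take-off hazard described above.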

A hierarchy that satisfies the two principles above is considered to be "substitutable". This means that at each point of the program, parents' objects are substitutable by children objects, expecting no more constraints before the call and providing no fewer guarantees after it. This principle is the foundation of OOP usage in safety critical software.

Note that mention of type invariants and exceptions is deliberately omitted from the above, as they can be considered special cases of post-conditions.

DO-178C and the Local Substitution Principle

When the DO-178B standard was being revised, one of the largest challenges for the standardization committee was how to allow safety-critical developers to take advantage of modern software technologies. Object Oriented Technology was at the top of the list. The eventual decision was to allow, but not to require, pessimistic testing to demonstrate proper levels of verification for OOP code. New guidance, based on the Liskov Substitution Principle, was created: the Local Substitution Principle.

In DO-178C wording, the Liskov Substitution Principle is called Global Substitution. Requiring demonstration of this property was considered overly restrictive, so instead the notion of Local Substitution was introduced and is now required by the standard. The difference between the two is that Global Substitution has to be checked for all the classes of the hierarchy, whereas Local Substitution needs only to be checked in the context of the actual dispatching call.

Going back to the Sensor example, demonstrating global consistency means demonstrating that all the classes of the hierarchy are substitutable for all their services. So Calibrate substitutability has to be checked not only between Sensor and Temperature_Sensor, but also between Temperature_Sensor and Provider_1_Temperature_Sensor, Sensor and Provider_1_Temperature_Sensor, Temperature_Sensor and Provider_2_Temperature_Sensor, etc. And this must be done for all the services of the class. Depending on the verification strategy that is applied, this may be too large an effort, especially if the program does not have a dispatching call for a particular service.

Local Substitution means that the only substitutions to be considered are ones that actually occur in the program, namely on the dispatching calls. This can be visualized as the following constructive procedure, phrased in general OO terms (an actual algorithm could be more efficient): For each class T and each dispatching operation Op for T:

  1. For each dispatching call X.Op, where X is a polymorphic variable of type T, calculate the possible set of actual types for objects that X can reference, based on the program logic.
  2. For each pair of types T1 and T2 in this set such that T2 is a direct or indirect descendant of T1, verify that LSP holds for T1 and T2 (i.e. the precondition for T2.Op is the same as or weaker than the precondition for T1.Op, and analogously the postcondition for T2.Op is the same as or stronger than the postcondition for T1.Op).
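The pair-selection step above can be modeled executably. Here is a hedged Python sketch in which each dispatching call site is summarized by the set of concrete types the program logic allows it to see (the hierarchy and the call-site set are hypothetical):

```python
# Hypothetical hierarchy, recorded as child -> parent.
parent = {
    "Temperature_Sensor": "Sensor",
    "Speed_Sensor": "Sensor",
    "Provider_1_Temperature_Sensor": "Temperature_Sensor",
}

def ancestors(t):
    # Yield every direct or indirect ancestor of type t.
    while t in parent:
        t = parent[t]
        yield t

def local_substitution_pairs(call_site_types):
    """Pairs (T1, T2) that must satisfy LSP for one dispatching call:
    T2 is a descendant of T1 and both can occur at the call site."""
    pairs = set()
    for t2 in call_site_types:
        for t1 in ancestors(t2):
            if t1 in call_site_types:
                pairs.add((t1, t2))
    return pairs

# Program logic says this call site only ever sees these two types:
site = {"Sensor", "Provider_1_Temperature_Sensor"}
print(local_substitution_pairs(site))
```

Only one pair comes out, so only one substitution has to be verified for this call, instead of every pair in the full hierarchy.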

If the program logic is sufficiently simple, then the set of possible object types that X can reference will be considerably smaller than the full set of types in T's class hierarchy. Further, if the types that violate LSP do not occur in this set, then the violation will not matter. To make this more concrete, let's return to our original example and suppose that there are three possible types in our hierarchy: Sensor, Temperature_Sensor and Provider_1_Temperature_Sensor. Temperature_Sensor will have an additional requirement on Calibrate's postcondition, for example the fact that a calibration signal has to be turned green upon successful calibration. This is OK; a child can strengthen a parent's post-condition. But suppose that Provider_1_Temperature_Sensor.Calibrate does not implement that behavior, and thus does not fulfill the entire contract of its parent. Is this hierarchy globally substitutable? Obviously not, because Calibrate in Provider_1_Temperature_Sensor has a postcondition that is less restrictive than Calibrate in Temperature_Sensor. However, Calibrate in Provider_1_Temperature_Sensor does fulfill the contract of its grandparent type Sensor.

Now suppose the only invocation of Calibrate in the program is a fragment like the calibration loop shown earlier, where the program logic guarantees that Sensor_List contains only Sensor and Provider_1_Temperature_Sensor objects:

 
for P : Sensor'Class of Sensor_List loop
   P.Calibrate;
end loop;

The only possible object types in the dispatching call P.Calibrate are Sensor and Provider_1_Temperature_Sensor, so LSP only has to be demonstrated for this pair. Since there is never a substitution between Temperature_Sensor and Provider_1_Temperature_Sensor, it's OK if this latter pair does not fulfill LSP.

The OO supplement to DO-178C allows verification of local substitutability to be performed either by testing or by formal proof. One way would be to actually test the substitution – that is, to write requirements-based tests that exercise the pre- and post-conditions of the parents and verify that those tests pass when child objects are used. The alternative is formal proof, demonstrating that the parent's pre-conditions imply the child's pre-conditions, and that the child's post-conditions imply the parent's post-conditions.
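As a sketch of the testing route, in Python for illustration (the class names and contracts are assumptions, not the supplement's prescribed procedure):

```python
# Run the parent's requirements-based contract test against child
# objects substituted for it.
class Sensor:
    def __init__(self):
        self.initialized = False

    def initialize(self):
        self.initialized = True

    def calibrate(self) -> bool:
        # Parent contract: calibration works once initialized.
        return self.initialized

class TemperatureSensor(Sensor):
    def calibrate(self) -> bool:
        return self.initialized       # fulfills the parent contract

def parent_contract_test(obj: Sensor) -> bool:
    """Set up the parent's pre-condition, then check the parent's
    post-condition -- regardless of the actual object type."""
    obj.initialize()                  # establish the pre-condition
    return obj.calibrate() is True    # check the post-condition

# Substitution test: the parent's test must pass for child objects too.
ok = all(parent_contract_test(o) for o in [Sensor(), TemperatureSensor()])
print(ok)
```

The same test body is reused for every type that can appear at the dispatching call, which is exactly what "testing the substitution" means.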

We should also point out that violating LSP for local substitutability does not automatically mean the software fails the DO-178C / OO supplement objectives. However, it will increase the effort needed to demonstrate that the verification objectives are met, and in particular will necessitate the pessimistic testing strategy described earlier.

Ada 2012 Class-Wide Contracts

In order to further illustrate these points, we're going to take the example of Ada 2012 which recently added some specific support for OOP verification.

Contracts require a notation in which they can be expressed. Natural language is often used, but suffers from ambiguity (for example, what does "A and B or C" mean?). Most programming languages lack the required expressiveness (for example for quantificational logic) and thus if they support contracts at all it is through a specialized syntax using some sort of annotations. However, being able to write contracts directly in the code using standard programming language features, versus a specialized annotation language, has great advantages. If contracts are written using the same notation as the code, then they are less likely to get out of synch as the code evolves. In particular, the compiler will be able to verify that the entities they refer to are consistent with those of the code. Contracts can also be used to develop dynamic tests, they can be exploited by static analysis tools to perform additional checks or to help demonstrate the absence of run-time exceptions, and even, under certain circumstances, be formally proven to be correct.

The DO-178C standard was developed roughly at the same time as the Ada 2012 standard, one of whose main application domains is the avionics industry, and the two efforts have benefited from some level of cross-fertilization. In particular, while the Liskov Substitution Principle was debated in the DO-178C committee, pre- and post-conditions were discussed in the Ada 2012 standardization group. The result is native support in Ada 2012 for subprogram pre- and post-conditions as well as other contract features such as type invariants. Here's an example using the Ada 2012 constructs Pre'Class and Post'Class (literally meaning pre- and post-conditions that apply to the class hierarchy):

 
type Sensor is tagged ... -- Ada syntax for root of class hierarchy

function Calibrate (This : Sensor) return Boolean
   with Pre'Class  => ... , --pre-conditions specific to Sensor
        Post'Class => ... ; --post-conditions specific to Sensor

type Temperature_Sensor is 
  new Sensor with ... --additional attributes

overriding
function Calibrate (This : Temperature_Sensor) return Boolean
   with Pre'Class =>
            ... , --pre-conditions specific to Temperature_Sensor
        Post'Class =>
            ... ; --post-conditions specific to Temperature_Sensor

In Ada 2012, the overriding subprogram inherits the parent's pre- and post-conditions in accordance with the Liskov Substitution Principle. In particular, the overridden subprogram must accept at least whatever the parent's pre-condition accepts, and may accept more. So the actual pre-condition of the subprogram overridden in Temperature_Sensor is ([pre-conditions specific to Sensor] or [pre-conditions specific to Temperature_Sensor]). On the other hand, this same overridden subprogram must guarantee at least as much as the parent's version, so its actual post-condition is ([post-conditions specific to Sensor] and [post-conditions specific to Temperature_Sensor]). The asymmetry in contract inheritance may be surprising at first sight, but it makes sense in the context of LSP. Ada allows these checks to be enabled at run time. Additional research efforts are under way to allow such contracts to be formally proven, which will probably be the subject of a follow-up article.
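The inheritance rule can be modeled in a few lines of Python (the predicates and state fields are hypothetical; Ada evaluates the real contracts natively):

```python
# Hedged model of Ada 2012 class-wide contract inheritance:
# the effective pre-condition is the OR of the parent and child
# specific pre-conditions; the effective post-condition is their AND.
def effective_pre(specific_pres, state):
    return any(p(state) for p in specific_pres)    # weaker: any one suffices

def effective_post(specific_posts, state):
    return all(p(state) for p in specific_posts)   # stronger: all required

# Hypothetical predicates for Sensor and Temperature_Sensor:
pre_sensor   = lambda s: s["initialized"]
pre_temp     = lambda s: s["powered"]              # child-specific alternative
post_sensor  = lambda s: s["calibrated"]
post_temp    = lambda s: s["signal_green"]         # child-specific addition

state = {"initialized": False, "powered": True,
         "calibrated": True, "signal_green": True}

pre_ok  = effective_pre([pre_sensor, pre_temp], state)    # one holds
post_ok = effective_post([post_sensor, post_temp], state) # both must hold
print(pre_ok, post_ok)
```

The OR/AND split mirrors the asymmetry discussed above: weakening on the way in, strengthening on the way out.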

Conclusion

With proper care, object orientation can be used in highly reliable embedded development just as much as other development methods. Recent standardization efforts have provided a framework combining good programming practice, reasonable expectations from the safety assessor point of view, and guidelines to tools providers. It's nice to see that over forty years after the birth of Smalltalk, OOP is finally available to the entire community of software developers!
