Electrical engineering used to be exclusively about relays and switches, state machines, and signals. With these came defined functionality. And, by the time I started my career in hardware-software integration for aerospace, we tried to limit what was done in software, preferring to use techniques that were stable, but no longer at the forefront of commercially available systems.
This is generally no longer possible, whether in aerospace or other industries. Companies today prefer state-of-the-art commercially available components, which tend to be small, smart, and driven by software. These components make our systems more flexible and powerful, but that flexibility comes at a price: the system can do many more things than we ever anticipated.
To make sure your systems work correctly, testing is the key. Let me walk you through several fundamental things to know about software testing.
Software state machines can exponentially increase the number of states.
Electrical system design often starts with a state machine and an understanding of the different operating modes for a particular product. You can typically map that state-machine functionality to logic very quickly and easily. However, as the state machine gets more complicated, it is often translated into software.
In principle, this gives us a lot more flexibility: we can change the machine at will to meet our business needs. However, the translation from a hardware state machine to a software state machine is problematic, because the number of states in software increases exponentially as additional logic is introduced.
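To see how quickly the state space grows, here's a minimal sketch in C of a hardware-style state machine ported to software (the mode names and inputs are invented for illustration). Adding a single boolean input doubles the reachable state space, because the transition logic must now be correct for every combination of mode and flag:

    #include <stdbool.h>

    enum mode { MODE_IDLE, MODE_RUN, MODE_STOP };

    /* One added boolean input (fault) doubles the state space: the
     * transitions must be right for every (mode, fault) combination. */
    enum mode next_mode(enum mode m, bool start, bool stop, bool fault)
    {
        if (fault) {
            return MODE_STOP;                 /* fault overrides everything */
        }
        switch (m) {
        case MODE_IDLE: return start ? MODE_RUN  : MODE_IDLE;
        case MODE_RUN:  return stop  ? MODE_STOP : MODE_RUN;
        case MODE_STOP: return MODE_STOP;     /* latched until reset */
        }
        return MODE_STOP;                     /* defensive default */
    }

Each new flag or conditional multiplies the combinations that testing must account for, which is exactly why coverage becomes so important.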
The key principle here is to understand all of the logic and make sure it corresponds to what we intended. Tracking requirements is key; I'll discuss that in more detail in one of the next items. Before we get there, though, let's think about how best to understand a state machine implemented in software. Beyond diagrams and system-level design, coverage is a good way to tell whether we correctly modeled the state machine and whether it follows our business logic.
Coverage includes statement-level coverage (making sure all statements are executed), branch/decision-level coverage (making sure every if-then-else condition takes both its true and false paths), and MC/DC coverage (making sure the subconditions independently affect the result). When we're analyzing a software system, as opposed to a discrete state machine, these metrics tell us whether execution enters the decision paths we expect. A logical state machine has a discrete number of states, and we can test it by stepping through those states. A software system, however, has a practically infinite number of states, and the best way to test it is to test the logic, especially the conditional and subcondition logic.
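To make those coverage levels concrete, consider this minimal C sketch (the interlock function and its inputs are hypothetical). A single test gives statement coverage; two tests give branch coverage, because the decision takes both outcomes; MC/DC requires a set in which each subcondition flips the outcome on its own:

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical interlock with two subconditions. */
    static bool door_may_open(bool on_ground, bool depressurized)
    {
        return on_ground && depressurized;
    }

    int main(void)
    {
        /* MC/DC test set for (A && B): each input flips the result alone. */
        assert(door_may_open(true,  true)  == true);   /* baseline            */
        assert(door_may_open(false, true)  == false);  /* on_ground flips it  */
        assert(door_may_open(true,  false) == false);  /* depressurized flips */
        return 0;
    }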
Beyond the coverage itself we need to look at what drives the coverage.
Good software comes from good requirements.
High-level requirements are essential to making sure the system functions correctly. Such requirements characterize the business logic and intended functionality, and enable us to evaluate whether our system does what it’s supposed to do. Best practices follow the flow of requirements from high-level requirements through analysis into coverage.
Using the state-machine model, requirements that characterize each state are examples of high-level requirements. Following the exercise of those requirements into code is a very good way to make sure that each requirement exercises the expected section of code.
Leading aviation, automotive, and industrial process standards, such as DO-178C, ISO 26262, and IEC 61508, extend this to a concept of requirement traceability. They often mandate that users exercise all of their code from high-level requirements and explain and test any uncovered cases with low-level testing. This practice of connecting requirements to code, and ensuring correctness, has been instrumental in achieving safety in these industries.
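As a simple illustration of what traceability looks like at the code level (the requirement IDs and tagging convention below are invented; real projects typically maintain these links in a requirements-management or traceability tool):

    #include <stdbool.h>

    /* Satisfies HLR-012: "The heater shall be disabled above 80 degC."
     * HLR-012 and LLR-034 are made-up identifiers for illustration.   */
    static bool heater_enabled(int temp_c)
    {
        return temp_c <= 80;    /* LLR-034: threshold comparison */
    }

    /* Test case TC-HLR-012-01 exercises HLR-012 at its boundary, so the
     * coverage of this function traces back to that requirement.       */

With links like these in place, any code that no requirement-derived test can reach stands out immediately.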
Test modules before you test the system.
Low-level testing and low-level requirements ensure functionality at the module level. As systems become more complicated, it's important to make sure modules work correctly on their own before integrating them into the system. To do this, we need to examine lower-level requirements more closely to make sure each function and set of functions does what it's supposed to do and properly connects to the interfaces of the system.
Unit testing is the typical way to accomplish this task. Unit testing often involves parameterizing inputs and outputs on the function and module level, performing a review to make sure that the connection between inputs and outputs is correct, and then following the logic with coverage. Using a tool that connects individual inputs and outputs to paths of execution is quite beneficial, as it enables us to follow the path visually and use that as part of the review process.
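A minimal table-driven unit test in C illustrates the idea; the module under test (a saturating add) and its values are invented for illustration:

    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Module under test (hypothetical): addition that saturates at a limit. */
    static int sat_add(int a, int b, int limit)
    {
        long sum = (long)a + (long)b;
        return (sum > limit) ? limit : (int)sum;
    }

    int main(void)
    {
        /* Each row pairs inputs with the expected output, making the
         * input/output relationship explicit and easy to review.     */
        struct { int a, b, limit, expected; } cases[] = {
            {   1,  2, 100,   3 },   /* nominal path    */
            {  60, 50, 100, 100 },   /* saturation path */
            { 100,  0, 100, 100 },   /* boundary        */
        };
        for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
            assert(sat_add(cases[i].a, cases[i].b, cases[i].limit)
                   == cases[i].expected);
        }
        printf("all unit tests passed\n");
        return 0;
    }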
Another key is to have an understanding of interfaces at both the functional and module level. Going back to the concept of tooling, projects should have tools that show these interfaces and connect the logic at different levels. This can often be done via static analysis.
Find your problems as early as possible.
As systems become more complicated, the more problem-finding you can push into the design phase, the more money you save at the integration phase.
Static analysis, which models the execution of your system without actually running it, is the least expensive method in this regard. Ideally, static analysis should help maximize the clarity, maintainability, and testability of the system early in the design process. It does this with code-complexity analysis, program-flow analysis, predictive runtime-error detection, and coding-standards adherence checking. When selecting tools to automate the testing process, check whether the static analysis you're reviewing includes:
• Code-complexity analysis: Understanding where your code is unnecessarily complicated so that you can perform appropriate mitigation activities.
• Program-flow analysis: Drawing design-review flow graphs of program execution to make sure that the program executes in the expected flow.
• Predictive runtime-error detection: Modeling code execution through as many executable paths as possible and looking for potential errors such as array-bounds overflows and divide-by-zeros (see the sketch after this list).
• Coding standards adherence: Making sure the code adheres to best programming practices. This is often industry-specific and includes standards such as MISRA (which began in the motor vehicle industry, but now applies broadly across many verticals), JSF-AV (aircraft avionics software), and CERT (network-facing software). For a particular product, the ideal standard is often a combination of the standards for its targeted industries.
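As a sketch of the kinds of defects these checks surface, consider this contrived C function; a static analyzer can flag both problems without ever executing the code:

    /* Contrived example for illustration only. */
    int average(const int samples[], int count)
    {
        int sum = 0;
        /* Off-by-one bound: reads samples[count], one past the end. */
        for (int i = 0; i <= count; i++) {
            sum += samples[i];      /* flagged: possible out-of-bounds read  */
        }
        return sum / count;         /* flagged: divide-by-zero if count == 0 */
    }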
While these activities force the programmer to do more work at the front end of the development process to make sure the program runs correctly, they reduce work overall. Since the cost of failure increases exponentially as the program moves from design to test and into the field, the more you can do in the design stage, the lower the overall cost of the design.
Don't forget risk!
All of these practices come at a cost and may not always be applicable. In software safety practices such as DO-178C, risk is the cost of failure multiplied by the probability of failure. Proper risk assessment needs to be applied to every system, subsystem, and component to make sure the appropriate mitigation activities are performed.
If you do the same activities on every component, you will over-invest in parts of your system where the risk of failure is low and not have enough time to mitigate failure in the parts of your system that have a high risk of failure. Software safety practice starts with understanding what will happen if the component or system fails and then tracks that potential failure into the appropriate activities to make sure it doesn’t happen.
A typical example is a system that controls the guidance of an airplane. If the system fails, the failure can be catastrophic. Therefore, mitigation activities must be performed from requirements all the way through subcondition coverage to ensure the code is correct.
At the same time, consider an in-flight entertainment system. If this system fails, the aircraft will not crash. But before dismissing such a system entirely, we need to find out whether it is also used to communicate between the cockpit and the passengers. If so, testing must address the most significant application of the system, where the consequence of failure can still be major, though not as extreme as for systems that can cause immediate loss of life.
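Putting rough numbers on those two examples (the costs and probabilities below are invented purely for illustration) shows how the risk calculation drives where mitigation effort goes:

    #include <stdio.h>

    /* Toy risk ranking: risk = cost of failure x probability of failure. */
    struct component {
        const char *name;
        double cost_of_failure;   /* relative severity    */
        double prob_of_failure;   /* estimated likelihood */
    };

    int main(void)
    {
        struct component parts[] = {
            { "guidance",      1000.0, 0.001 },
            { "entertainment",   10.0, 0.050 },
        };
        for (int i = 0; i < 2; i++) {
            printf("%-13s risk = %.2f\n", parts[i].name,
                   parts[i].cost_of_failure * parts[i].prob_of_failure);
        }
        return 0;
    }

Even with a much lower probability of failure, the guidance component's risk (1.00) outweighs the entertainment system's (0.50), so it warrants the deeper mitigation activities described above.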
Conclusion
Testing starts with understanding that the state of a software system is far less discrete than a hardware-centric view of state suggests. From there, you need to carefully examine requirements, module testing, static analysis, and risk mitigation. By zeroing in on these key areas, we can make sure our systems work the way we expect and be confident in the product's success.