Buffer overflows and integer overflows are two of the most common programming errors, especially when working with C. The latter has caused all sorts of notable problems, from the millennium date bug to a fault in a group of Hewlett-Packard (HP) enterprise SSDs. Without a firmware update, those drives stop running at the 32,768-hour mark.
Does that number sound familiar?
Anyone who has worked with signed 16-bit integers will recognize it: 32,768 is 2^15, one more than the largest value such an integer can hold (32,767). HP hasn’t provided details about the problem, which is fixed by a firmware upgrade, but it’s probably related to a 16-bit value in the code.
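To make the arithmetic concrete, here is a minimal C++ sketch of what happens at that boundary. It is purely illustrative: HP has not published the cause, and the names `fits_in_int16` and `wrap_to_int16` are hypothetical helpers, not anything from the drive firmware. The wraparound is modeled with explicit arithmetic rather than an actual overflowing conversion, so the result does not depend on the compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest value a signed 16-bit integer can hold: 32,767.
constexpr std::int32_t kInt16Max = std::numeric_limits<std::int16_t>::max();
constexpr std::int32_t kInt16Min = std::numeric_limits<std::int16_t>::min();

// True if an hour count can be stored in a signed 16-bit variable.
constexpr bool fits_in_int16(std::int32_t hours) {
    return hours >= kInt16Min && hours <= kInt16Max;
}

// Models two's-complement wraparound explicitly: returns the value a
// 16-bit signed variable would typically end up holding if `value`
// were stored in it without a range check.
inline std::int32_t wrap_to_int16(std::int32_t value) {
    const std::int64_t modulus = 65536;  // 2^16
    std::int64_t shifted = (static_cast<std::int64_t>(value) + 32768) % modulus;
    if (shifted < 0) {
        shifted += modulus;  // keep the remainder non-negative
    }
    const std::int32_t result = static_cast<std::int32_t>(shifted - 32768);
    assert(fits_in_int16(result));  // postcondition: always in range
    return result;
}
```

Under this model, `fits_in_int16(32767)` is true, `fits_in_int16(32768)` is false, and `wrap_to_int16(32768)` yields -32768. An hour counter that suddenly goes negative is the kind of value downstream code rarely expects. At one count per hour, 32,768 hours is roughly 3.7 years of continuous operation.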
This type of problem isn’t unique to this type of firmware. Any code can have issues if values hit their boundary conditions and overflow or underflow isn’t checked. In general, C code tends not to check for these errors. That’s also why developers prefer to use very large integers: with enough headroom, they generally needn’t worry about such problems. That’s not necessarily the best practice, but it’s sufficient for most applications.
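An explicit overflow check doesn’t have to be elaborate. The sketch below shows one common way to do it for 16-bit values: perform the addition in a wider type, then test the result against the limits before narrowing it. The function name `checked_add_i16` and the `demo_checked_add` helper are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Adds two signed 16-bit values. Returns false, and leaves `result`
// untouched, if the sum does not fit in 16 bits.
bool checked_add_i16(std::int16_t a, std::int16_t b, std::int16_t& result) {
    // Widen first: the sum of two 16-bit values always fits in 32 bits.
    const std::int32_t wide =
        static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
    if (wide < std::numeric_limits<std::int16_t>::min() ||
        wide > std::numeric_limits<std::int16_t>::max()) {
        return false;  // the caller decides how to handle the overflow
    }
    result = static_cast<std::int16_t>(wide);
    return true;
}

// Usage sketch: exercises both sides of the upper boundary.
void demo_checked_add() {
    const std::int16_t max = std::numeric_limits<std::int16_t>::max();
    std::int16_t hours = 0;
    assert(checked_add_i16(static_cast<std::int16_t>(max - 1), 1, hours));
    assert(hours == max);                      // 32,767 still fits
    assert(!checked_add_i16(max, 1, hours));   // 32,768 is rejected
}
```

Returning a status flag pushes the decision about what to do (saturate, log, halt) to the caller, which is usually where that knowledge lives.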
On the flip side, not checking whether limits are exceeded can result in difficult-to-find bugs. In this case, it takes the SSD years to hit the mark. Other bugs might only appear when certain uncommon criteria are met. Unfortunately, such criteria might be a child chasing a black ball onto a blacktop road where a self-driving car is rolling down the lane.
Range Limits
Using large integers isn’t always an option, particularly in microcontrollers where 8- and 16-bit integers are common. These values may also be tied to peripheral controllers, whose valid range can be narrower than the integer type that holds them. It’s therefore important that the appropriate range limits are enforced in the code and understood by developers and code reviewers.
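As a rough illustration of a peripheral-imposed limit, consider a hypothetical 12-bit DAC whose data register is 16 bits wide. The type accepts values up to 65,535, but only 0 through 4,095 are meaningful to the hardware. Everything in this sketch (`FakeDac`, `kDacMax`, `dac_write`) is invented for the example and stands in for a real memory-mapped register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 12-bit DAC: the register is 16 bits wide, but only
// codes 0..4095 are valid.
constexpr std::uint16_t kDacMax = 4095;  // 2^12 - 1

// Stand-in for a memory-mapped data register.
struct FakeDac {
    std::uint16_t data = 0;
};

// Writes `code` to the DAC only if it is within the peripheral's range.
// Returns false and leaves the register unchanged otherwise.
bool dac_write(FakeDac& dac, std::uint32_t code) {
    if (code > kDacMax) {
        return false;
    }
    dac.data = static_cast<std::uint16_t>(code);
    return true;
}

// Usage sketch: the last valid code is accepted, the next one is not.
void demo_dac_write() {
    FakeDac dac;
    assert(dac_write(dac, 4095));
    assert(!dac_write(dac, 4096));
    assert(dac.data == 4095);  // the rejected write did not alter the register
}
```

Putting the limit in one named constant also gives code reviewers a single place to compare against the datasheet.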
Only a few programming languages, such as Ada/SPARK, properly handle ranges for integers, even on microcontrollers. They can often incorporate compile-time and/or runtime checks, so developers don’t have to explicitly include these checks within the program.
Some languages, such as C++, can provide runtime checks by defining classes that implement the checking. The difference between this approach and a language-based approach comes down to standardization and portability. The language-based approach provides a standard way of managing range checking, while a class-based approach is specific to the class definition, and no standard exists for it. The C++ Standard Template Library (STL) provides range-related definitions for iterators and containers, but those describe ranges of elements rather than limits on numeric values.
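The class-based approach might look something like the following sketch. `RangedInt` is a hypothetical template written for this example, not part of any standard library, which is exactly the portability concern raised above: every project ends up with its own variant. Here, an out-of-range value raises `std::out_of_range` at runtime.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical range-checked integer: holds an int that must stay
// within [Lo, Hi]. Any attempt to leave the range throws.
template <int Lo, int Hi>
class RangedInt {
    static_assert(Lo <= Hi, "RangedInt: empty range");

public:
    explicit RangedInt(int value) : value_(check(value)) {}

    RangedInt& operator=(int value) {
        value_ = check(value);
        return *this;
    }

    RangedInt& operator+=(int delta) {
        // Add in a wider type so the addition itself cannot overflow.
        value_ = check(static_cast<long long>(value_) + delta);
        return *this;
    }

    int get() const { return value_; }

private:
    // Validates before anything is stored, so a failed update leaves
    // the previous value intact.
    static int check(long long value) {
        if (value < Lo || value > Hi) {
            throw std::out_of_range("RangedInt: value outside range");
        }
        return static_cast<int>(value);
    }

    int value_;
};

// Example type: an hour counter limited to the signed 16-bit range.
using PowerOnHours = RangedInt<0, 32767>;

// Usage sketch: the counter reaches its limit, then refuses to pass it.
void demo_ranged_int() {
    PowerOnHours hours(32766);
    hours += 1;
    assert(hours.get() == 32767);
    bool threw = false;
    try {
        hours += 1;  // would be 32,768
    } catch (const std::out_of_range&) {
        threw = true;
    }
    assert(threw);
    assert(hours.get() == 32767);  // value unchanged after the failed update
}
```

Exceptions are only one possible policy. On a small microcontroller where they are disabled, the same class could saturate or call an error handler instead.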
Range limits and range checking are just some of the issues that developers need to consider when writing applications, as well as when developing runtime tests and unit tests. Not all problems will be related to simple overflow or underflow issues. However, knowing what to look for and how values are related should be part of the toolkit when reviewing code and designs, especially for embedded applications.
So, if you have an HP enterprise SSD, you may want to check its firmware version and apply the update before you run into a state where the data on those disks is inaccessible. This problem can be an issue even if an SSD is part of a RAID system, because all of the drives in a particular system are likely to be from the same batch with the same firmware issue. They would likely fail at about the same time, and a multi-drive failure would be catastrophic even for a RAID system.