Cracking the Code of Undefined Behavior

April 2, 2025
Foremost in any strategy for the creation of safe and reliable C/C++ code is prevention of undefined behavior. But be prepared, because it’s not easy.

On June 4, 1996, the Ariane 5 rocket was launched on its maiden flight, carrying a constellation of four research satellites. About 40 seconds after liftoff, the rocket veered off course and exploded.

The subsequent investigation found the onboard computer system had attempted to convert a 64-bit floating-point number to a 16-bit signed integer. The software was designed for the predecessor rocket, Ariane 4. Because Ariane 5 had a much higher horizontal velocity, the converted value no longer fit in 16 bits, and the resulting integer overflow triggered a chain of events that led to the explosion.

Signed integer overflow is a common type of undefined behavior (UB). It’s one of many operations and constructs whose behavior isn’t defined in the standards for programming languages like C/C++.

If UB happens, the results are unpredictable. It may not cause a catastrophe, but it often has a significant impact, particularly in creating security vulnerabilities. In this article, we look at the most common types of UB, the problems they create, and how to eradicate them.

Common Types of Undefined C/C++ Software Behavior

There are many causes of UB. The figure shows a ranking of the common types, based on their frequency of occurrence.

Signed integer overflow: UB occurs if a calculation on signed integers produces a value above the maximum or below the minimum that the integer type can represent.
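One way to avoid this is to test the operands before doing the arithmetic, so the overflowing operation is never evaluated. The following is a minimal sketch; the helper name checked_add is hypothetical and not taken from any library.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b only if the result fits in an int.
   Returns true and stores the sum in *out on success; returns false and
   leaves *out untouched if the addition would overflow. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* a + b is not representable, so never compute it */
    }
    *out = a + b;       /* safe: the result is known to fit */
    return true;
}
```

The range test uses only subtractions that cannot themselves overflow, which is what makes the check reliable.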

Buffer overflow: UB occurs if more data is written to a buffer than it can hold, causing data to overwrite adjacent memory. It may lead to memory corruption, crashes, and security vulnerabilities.
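A common defense is to pass the buffer size along with the buffer and refuse any write that doesn't fit. The sketch below assumes a hypothetical helper, copy_string, that copies a C string only when the destination can hold it along with its terminator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if the whole string,
   including the terminating '\0', fits in dst_size bytes. */
bool copy_string(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len >= dst_size) {
        return false;             /* would write past the end of dst */
    }
    memcpy(dst, src, len + 1);    /* len + 1 copies the terminator too */
    return true;
}
```

Rejecting the copy outright, rather than silently truncating, leaves the decision about what to do next with the caller.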

Uninitialized variables: UB occurs if a local variable, pointer, array, or structure is used before initialization. Until it's assigned, the variable holds an indeterminate value, typically whatever data happens to be in that memory location, and this might be different each time the code runs.
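The simplest remedy is to give every object a value at the point of declaration. As an illustration, the sketch below uses a hypothetical sensor_reading structure and zero-initializes it with = {0} before setting the fields that are known.

```c
/* Hypothetical structure used only for illustration. */
struct sensor_reading {
    int    id;
    double value;
    int    valid;
};

/* Returns a reading whose members all have defined values:
   = {0} zero-initializes every member before id is assigned. */
struct sensor_reading make_reading(int id)
{
    struct sensor_reading r = {0};
    r.id = id;
    return r;
}
```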

Index out of bounds: UB occurs if an index is used to access an array outside the valid range for that array. With C/C++, array indexes start at 0 and there is no built-in bounds checking, making invalid index operations a common trap.
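Because the language won't check the index, the code has to. A minimal sketch, assuming a hypothetical accessor get_at that is given the array length explicitly:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accessor: reads arr[index] only if index is within
   [0, len). Using size_t for the index rules out negative values. */
bool get_at(const int *arr, size_t len, size_t index, int *out)
{
    if (arr == NULL || index >= len) {
        return false;    /* invalid access is refused, not attempted */
    }
    *out = arr[index];
    return true;
}
```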

(Logic) Memory access: UB occurs if a program makes an invalid attempt to access memory. C/C++ provides direct memory access, but it doesn’t automatically check that the access is valid. Examples include reading or writing data using a null or uninitialized pointer, or a pointer after the memory is freed.

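Casting float to integer: UB occurs if the floating-point value, once its fractional part is discarded, can't be represented in the destination integer type. This is easy to hit because floating-point types often represent much larger values than the limited range of integers. Depending on the system, the result may be truncated, wrapped around to a negative value, or cause a crash.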
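The conversion in the Ariane 5 case, from a 64-bit floating-point value to a 16-bit signed integer, can be guarded with a range check before the cast. The sketch below is an illustration only; double_to_int16 is a hypothetical helper, and it also rejects NaN and infinities.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: converts x to int16_t only if the truncated
   value is representable. NaN and infinities are rejected. */
bool double_to_int16(double x, int16_t *out)
{
    if (isnan(x) ||
        x <= (double)INT16_MIN - 1.0 ||
        x >= (double)INT16_MAX + 1.0) {
        return false;    /* the cast would be undefined behavior */
    }
    *out = (int16_t)x;   /* defined: truncates toward zero */
    return true;
}
```

The bounds are one beyond the integer limits because the cast discards the fraction: a value such as -32768.5 still truncates to a representable -32768.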

Pointer arithmetic: UB occurs if pointer arithmetic operations cause a pointer to access locations that are outside allocated memory blocks. Although pointer arithmetic is efficient, it must keep all operations within the same object and only access memory within the bounds of the array or memory block.

Invalid shift: UB occurs if the shift amount is negative, is greater than or equal to the number of bits in the integer, or would overflow the range of a signed integer.
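A shift can be made safe by validating the shift amount and working on an unsigned type, where bits shifted out are simply discarded. The following is a sketch under those assumptions; checked_shl is a hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: left-shifts value by amount only if amount is
   in [0, width), where width is the number of bits in unsigned int. */
bool checked_shl(unsigned int value, int amount, unsigned int *out)
{
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    if (amount < 0 || amount >= width) {
        return false;          /* negative or too-large shift amount */
    }
    *out = value << amount;    /* unsigned shift: high bits are discarded */
    return true;
}
```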

Is NaN (Not a Number) or infinite: UB occurs if these values are handled incorrectly. Issues arise when the values are used in conversions, such as casting to integers, in comparisons, or in memory operations, like array indexing. C/C++ doesn't have well-defined handling for these cases, often resulting in incorrect outcomes and crashes.

Index in address: UB occurs if an invalid index is used to access an array or pointer and causes out-of-bounds memory access. Indexed addressing is an efficient way to manipulate data structures, but the pointer or index must only access memory that’s inside the bounds of the structure.

Dangling content: UB occurs if a pointer is used to access content in memory that’s been deallocated. The pointer may still contain the address for the memory block, but, following memory deallocation, it’s not valid for the pointer to access the content in that memory block.
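One widely used habit is to reset a pointer to NULL at the moment its memory is freed, so a later misuse can be caught by a null check instead of silently touching deallocated memory. The sketch below assumes a hypothetical helper, free_and_clear; note that it only clears the pointer passed in, so any other copies of the address remain dangling.

```c
#include <stdlib.h>

/* Hypothetical helper: frees the block *pp points to and sets *pp to
   NULL so the caller's pointer no longer refers to freed memory. */
void free_and_clear(int **pp)
{
    if (pp != NULL) {
        free(*pp);     /* free(NULL) is a no-op, so repeated calls are safe */
        *pp = NULL;
    }
}
```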

Undefined Behavior is a Major Problem for Developers

Compilers are allowed to assume that UB never happens. This enables them to generate extremely well-optimized code—ideal for programs that need to be fast and efficient, such as those in embedded systems. They can optimize entire programs, not just code statements that might cause UB. They’re able to reorder or remove checks deemed unnecessary and make aggressive optimizations that wouldn’t be valid if the UB were to occur.

However, just because compilers assume that UB never happens doesn’t mean it’s never the case.
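A well-known illustration is an overflow check that itself relies on overflow. In the sketch below, both function names are hypothetical. The first test can only be true if a + 1 overflows, which the compiler is allowed to assume never happens, so an optimizer may reduce it to "always false" and remove the check. The second expresses the same intent without ever performing the overflowing addition.

```c
#include <limits.h>
#include <stdbool.h>

/* Unreliable: for a == INT_MAX the expression a + 1 is undefined
   behavior, so a compiler may treat this as always false. */
bool will_overflow_bad(int a)
{
    return a + 1 < a;
}

/* Reliable: compares against the limit instead of overflowing. */
bool will_overflow_good(int a)
{
    return a == INT_MAX;
}
```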

Finding these bugs is hard. According to a report presented at the 2023 USENIX Security Symposium, 39% of bugs require between two and 24 hours to resolve, and 8.7% of bugs require more than 24 hours of work to reach a resolution.

To make matters even more challenging, rearranged code can change the execution flow, leading to errors even before the UB occurs and making it difficult to trace the root cause. Debugging messages can appear in misleading places, give unexpected values, or appear to show everything was working fine even as the program crashed.

An added complication is that different compilers and platforms operate in different ways. A program with UB may function differently depending on the system it’s running on. It can even produce different results at different times on the same system.

Preventing Undefined Behavior in C/C++ is Essential, But It’s Not Easy

UB that causes software to function incorrectly, crash, or create security vulnerabilities can impose a heavy cost on companies, both financially and in damage to their reputation. Bugs in embedded systems may require costly and complex product recalls or truck rolls.

Manual code reviews and traditional testing methods often fail to identify all of the operations and constructs that could lead to UB. Conventional static-analysis tools suffer from limited context and path sensitivity, resulting in a high rate of false positives and missed bugs.

Safer programming languages can help. For example, Rust is growing in popularity as it offers higher levels of security and can prevent common C/C++ issues like buffer overflows and data races.

However, Rust is an immature language for embedded development, with fewer established libraries, tools, and resources. Even if Rust is adopted, there’s still a vast C/C++ code base that must be managed. And sometimes a lower-level language is the only option for resource-constrained environments or when real-time performance and low-level control are critical.

The ONCD report from the White House, “Back to the Building Blocks: A Path Toward Secure and Measurable Software,” published in February 2024, outlines several approaches to identify and mitigate the risks associated with undefined behavior. One such approach involves leveraging hardware-based security mechanisms, such as CHERI (Capability Hardware Enhanced RISC Instructions).

CHERI builds upon existing technologies like Memory Tagging on SPARC (SPARC ADI) and ARM (ARM MTE), as well as stack protection mechanisms like Intel CET. These hardware features help detect memory violations, such as buffer overflows and invalid memory accesses, thereby preventing a subset of undefined behaviors. However, as of today, CHERI is only available as part of the Morello research program, with a prototype system-on-chip (SoC), and has yet to be integrated into commercially available silicon.

Another key strategy recommended in the report is the use of formal methods, which are mathematical techniques for verifying software correctness. Among these, sound static analysis, model checking, and theorem proving are commonly used.

Sound static analysis particularly stands out for its scalability. Unlike heuristic-based or syntactic analysis tools, sound static analyzers build a formal mathematical model of the code, using the semantics of the code, not just the syntax. They apply exhaustive verification techniques to examine all possible execution paths of the code and simulate all possible inputs and states. In this way, they verify that the program will behave as expected.

This approach ensures that all instances of undefined behavior, such as buffer overflows, use-after-free errors, and uninitialized memory access, are detected with mathematical certainty, eliminating false positives and false negatives.

When considering such a technology, it’s crucial to evaluate its integration into modern development practices, particularly continuous integration workflows, to ensure consistent code quality when making changes. In addition, emulating the target hardware architecture is essential for accurate verification.

Today, commercially available sound, exhaustive static-analyzer tools enable developers to proactively eliminate undefined behavior. One such example is TrustInSoft Analyzer, which applies sound, exhaustive static analysis to verify C and C++ code.

As cyberthreats continue to rise and regulations such as the EU’s Cyber Resilience Act (CRA) introduce stricter security requirements, formal verification plays an increasingly vital role. It not only enhances software security and reliability, but also helps organizations demonstrate compliance with industry standards.

By adopting sound static analysis today as part of a memory safety roadmap of the kind CISA calls for, developers can systematically eradicate undefined behavior from system-level programming languages like C, C++, and unsafe Rust, paving the way for safer and more resilient software.

References

Back to the Building Blocks: A Path Toward Secure and Measurable Software (ONCD technical report, February 2024)

Department of Computer Science and Technology: Capability Hardware Enhanced RISC Instructions (CHERI)

MTE User Guide for Android OS

A Technical Look at Intel’s Control-flow Enforcement Technology

Product Security Bad Practices (CISA pdf)

The Case for Memory Safe Roadmaps | CISA (December 2023)

About the Author

Philippe Cau | Senior Product Manager, TrustInSoft

Philippe Cau is the Senior Product Manager for TrustInSoft, where he manages product development.
