Pointers in Rust: Whacking the Mole (Part 2)—The Rust Approach

What you’ll learn:

Why the Rust language needed a fresh approach to managing pointers.
Why there are no null values in Rust.

Pointers in the Rust language bring a fresh approach to mole whacking. Aimed at high-performance, high-assurance embedded systems, Rust supplies a pointer facility designed to support safety and efficiency together without sacrificing expressive power.

The key is a distinction between high-level “safe pointers” and low-level “raw pointers.” Both can be implemented efficiently; the former come with assurance guarantees (including across threads), while the latter lack those guarantees. Raw pointers are potentially unsafe and thus require extra verification effort to ensure safety.
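To make the distinction concrete, here is a minimal sketch that reads the same value through a safe reference and through a raw pointer. The function name read_via_raw is hypothetical and chosen only for illustration.

```rust
// Hypothetical helper: reads an i32 through a raw pointer.
fn read_via_raw(value: &i32) -> i32 {
    // Creating a raw pointer from a reference is allowed in safe code.
    let raw: *const i32 = value;

    // Dereferencing it is not: the compiler cannot prove that the pointer
    // is valid, so the programmer takes responsibility in an `unsafe` block.
    unsafe { *raw }
}

fn main() {
    let x = 42;
    let safe: &i32 = &x; // safe pointer: validity is checked at compile time
    println!("via safe reference: {}", *safe);
    println!("via raw pointer:    {}", read_via_raw(&x));
}
```

The unsafe block is sound here only because the raw pointer was derived from a live reference. In general, that is exactly the kind of argument the extra verification effort has to supply.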

Safe pointers are based on several principles:

The absence of null pointers.
An ownership concept that prohibits manual deallocation and enables automatic storage reclamation without a garbage collector.
A borrowing concept that allows for multiple references to share the same value but ensures exclusivity for writing to (“mutating”) a referenced value.

The rules for safe pointers make it possible to detect most pointer errors at compile time (dangling references, reads of uninitialized pointers, double frees, and corruption of shared data) while facilitating code optimization and helping to prevent heap storage leakage without the overhead of a garbage collector.
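As a small illustration of the borrowing principle, the sketch below shows several read-only references coexisting, followed by a single exclusive reference used for mutation. The function name borrow_demo is hypothetical, and the commented-out line shows what the compiler would reject.

```rust
// Hypothetical demo function; returns values so the behavior can be checked.
fn borrow_demo() -> (usize, i32) {
    let mut data = vec![1, 2, 3];

    // Any number of shared (read-only) references may coexist.
    let a = &data;
    let b = &data;
    let lengths_seen = a.len() + b.len(); // 3 + 3

    // A mutable reference must be exclusive. It is accepted here because
    // `a` and `b` are never used again after this point.
    let m = &mut data;
    m.push(4);

    // Uncommenting the next line keeps `a` alive across the mutable borrow,
    // which the compiler rejects with error[E0502].
    // println!("{}", a.len());

    (lengths_seen, data[3])
}

fn main() {
    let (lengths_seen, last) = borrow_demo();
    println!("lengths seen: {}, last element: {}", lengths_seen, last);
}
```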

Safe pointers come in two varieties:

Pointers that can only point to values allocated on the heap. Rust’s standard prelude supplies several types of heap-only pointers, including String, Box<T>, and Vec<T>. These are sometimes referred to as smart pointers. Representationally, a smart pointer isn’t just a pointer (address); it can contain supplementary data. For example, a variable of type Vec<T> is implemented as a struct comprising not simply the heap address for the start of the vector, but also length and capacity fields.
Pointers known as references that can point to values on the heap, on the stack, or in static memory (including ROM). The type for such a pointer has the form &T or &mut T where T is a type; &T permits reading but not writing to (mutating) the referenced value, whereas &mut T permits both reading and mutating.

In either case, a pointer p can be dereferenced via the syntax *p.
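The following sketch exercises both varieties. The names LIMIT, pointer_tour, and vec_fields are hypothetical. The size check relies on Vec<T> being a pointer, capacity, and length triplet, as described in the standard library documentation.

```rust
use std::mem::size_of;

// A value in static memory (hypothetical name).
static LIMIT: i32 = 30;

fn pointer_tour() -> (i32, i32, i32, i32) {
    let on_stack = 10;
    let boxed: Box<i32> = Box::new(20); // heap-only pointer that owns its value

    let r_stack: &i32 = &on_stack; // reference to a stack value
    let r_heap: &i32 = &*boxed;    // reference to the heap value owned by `boxed`
    let r_static: &i32 = &LIMIT;   // reference to a value in static memory

    let mut counter = 0;
    let m: &mut i32 = &mut counter; // &mut T allows reading and mutating
    *m += 1;

    (*r_stack, *r_heap, *r_static, counter)
}

fn vec_fields() -> (usize, bool, bool) {
    let mut v: Vec<u8> = Vec::with_capacity(8);
    v.push(1);
    v.push(2);
    // A Vec carries a length and a capacity alongside the heap address.
    let three_words = size_of::<Vec<u8>>() == 3 * size_of::<usize>();
    (v.len(), v.capacity() >= 8, three_words)
}

fn main() {
    println!("{:?}", pointer_tour());
    println!("{:?}", vec_fields());
}
```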

Let’s see how all of this works.

No Null Values in Rust

Rust avoids Prof. Hoare’s “billion-dollar mistake” and uses flow-analysis-based compile-time checks to ensure that safe pointers are initialized before being used. If the programmer needs an explicit way to simulate a null value, the predefined generic enum Option<T> does the job. Option<T> works for any type T; the case of interest here is where T is a safe pointer type:

If the tag of an Option<T> value is Some, then a pointer of type T is present.
If the tag is None, then there’s no associated pointer value.

An Option<T> value can’t be used directly as a value of type T. Instead, it must be queried, typically in a match expression, with code that only accesses the value when the tag is Some. Misuses are caught at compile time. Below is an example:

let v: [Option<Box<i32>>; 2] = [None, Some(Box::new(100))];

// This code is correct:

for item in &v {
    match item {
        None => println!("Nothing here"),
        Some(p) => println!("Boxed value: {}", *p),
    }
}

// The let statement below is illegal:
// No implicit cast from Option<T> to T

let ptr: Box<i32> = v[1]; // Illegal: type mismatch

Here, v is an array of two Option<Box<i32>> values: None, and a Some variant that contains a Box<i32> pointer to a heap value set to 100. Processing an Option involves interrogating the “tag,” as is done in the match expression. It’s a compile-time error to attempt to use an Option as a value of the type of its Some variant.
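Two related points can be sketched in a few lines: the flow-analysis check on initialization, and the fact that wrapping a Box in an Option costs no extra space, because the standard library documents that None is represented by the otherwise unused null address. The function names below are hypothetical.

```rust
use std::mem::size_of;

// Hypothetical helper: `if let` is shorthand for a match on one variant.
fn describe(slot: &Option<Box<i32>>) -> String {
    if let Some(p) = slot {
        format!("Boxed value: {}", **p)
    } else {
        String::from("Nothing here")
    }
}

// A pointer may be declared first and initialized later, as long as every
// path initializes it before it is used.
fn deferred_init(flag: bool) -> i32 {
    let p: Box<i32>;
    // println!("{}", *p); // rejected with error[E0381]: `p` is not initialized
    if flag {
        p = Box::new(1);
    } else {
        p = Box::new(2);
    }
    *p
}

// Option<Box<T>> occupies the same space as Box<T>.
fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<i32>>>() == size_of::<Box<i32>>()
}

fn main() {
    let v: [Option<Box<i32>>; 2] = [None, Some(Box::new(100))];
    for slot in &v {
        println!("{}", describe(slot));
    }
    println!("{}", deferred_init(true));
    println!("{}", option_box_is_pointer_sized());
}
```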

Ownership, Dynamic Allocation, and Dropping in Rust

As noted above, Rust supplies standard pointer types like Box<T> that can only be used for dynamic allocation. A pointer from one of these types, unless uninitialized, owns the value that it points to, and, aside from reference-counting types that will be described below, the owning pointer is unique: No other pointers can share ownership. Assignment, including in implicit contexts such as parameter passing and field initialization, transfers ownership of the value from the source to the target.

In Rust parlance, the pointer is moved from the source to the target. Except for the special “no-op” case of self-assignment where the source and target pointers are the same, the move for single-ownership pointers involves several steps:

If the target is a valid (initialized) pointer, compiler-generated code automatically reclaims (“drops”) the heap value referenced by the target (and recursively if the value itself contains single-ownership pointers to heap values).
The source pointer is copied to the target.
The source is treated as uninitialized; subsequent attempts to dereference it before it’s reinitialized will be flagged as compile-time errors.
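The three steps can be observed with a type that records when it is dropped. This is a minimal sketch: the Tracked type, its log field, and move_demo are hypothetical names introduced only to make the drops visible.

```rust
use std::cell::RefCell;

// Hypothetical type that logs its own destruction.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn move_demo() -> Vec<String> {
    let log: RefCell<Vec<String>> = RefCell::new(Vec::new());
    {
        let source = Box::new(Tracked { name: "a", log: &log });
        let mut target = Box::new(Tracked { name: "b", log: &log });
        log.borrow_mut().push(format!("target owns {}", target.name));

        // Step 1: the value "b" owned by `target` is dropped.
        // Step 2: the pointer in `source` is copied to `target`.
        target = source;
        log.borrow_mut().push(format!("target owns {}", target.name));

        // Step 3: `source` is now treated as uninitialized.
        // println!("{}", source.name); // error[E0382]: borrow of moved value
    } // `target` goes out of scope here, so "a" is dropped

    log.into_inner()
}

fn main() {
    for line in move_demo() {
        println!("{}", line);
    }
}
```

Note that "a" is dropped exactly once, even though two variables held its address at different times.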

At the end of the scope containing the declaration of an initialized single-ownership pointer variable, compiler-generated code automatically drops the referenced heap value (and recursively if the value itself contains pointers owning heap values). The pointer variable is owned by its containing scope and dropped implicitly when that scope is exited. Note that the term “pointer variable” also refers to pointers that occur as formal parameters, struct fields, array and vector elements, etc.
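A sketch of scope-based dropping follows, covering a local variable, a struct field that owns a further heap value, and a formal parameter. The Node type, consume, and scope_demo are hypothetical names used only for illustration.

```rust
use std::cell::RefCell;

// Hypothetical type: a node that may own a child node on the heap.
struct Node<'a> {
    name: &'static str,
    child: Option<Box<Node<'a>>>,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Node<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// The parameter owns the value it receives and drops it on return.
fn consume(node: Box<Node<'_>>) {
    node.log.borrow_mut().push(format!("consume {}", node.name));
}

fn scope_demo() -> Vec<String> {
    let log: RefCell<Vec<String>> = RefCell::new(Vec::new());
    {
        let leaf = Box::new(Node { name: "leaf", child: None, log: &log });
        // Ownership of `leaf` moves into the `child` field of the root node.
        let _root = Box::new(Node { name: "root", child: Some(leaf), log: &log });

        let passed = Box::new(Node { name: "passed", child: None, log: &log });
        consume(passed);

        log.borrow_mut().push(String::from("end of scope"));
    } // `_root` is dropped here, and then, recursively, the leaf it owns

    log.into_inner()
}

fn main() {
    for line in scope_demo() {
        println!("{}", line);
    }
}
```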

Rust’s single ownership approach reclaims inaccessible storage without needing a garbage collector, prevents dangling references, and avoids data corruption of heap values (if a heap object can only have one owner, it can’t be owned by a pointer in another thread). It facilitates the implementation of types like Vec<T>, whose values require reallocation when their capacity is exceeded.
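Both points can be sketched briefly. In the first function, a Vec<T> is moved into a worker thread, after which the spawning thread can no longer touch it. In the second, a vector is pushed past its requested capacity. The function names are hypothetical, and the exact capacities chosen by the allocator are not assumed.

```rust
use std::thread;

// Ownership of `numbers` moves into the worker thread's closure.
fn sum_in_worker(numbers: Vec<i32>) -> i32 {
    let handle = thread::spawn(move || numbers.iter().sum::<i32>());

    // The spawning thread no longer owns the vector:
    // println!("{:?}", numbers); // error[E0382]: borrow of moved value

    handle.join().expect("worker thread panicked")
}

// Returns the capacity before and after pushing past the requested capacity.
fn grow_past_capacity() -> (usize, usize) {
    let mut v: Vec<i32> = Vec::with_capacity(2);
    v.push(1);
    v.push(2);
    let before = v.capacity();
    v.push(3); // if the buffer was full, the Vec reallocates it here
    (before, v.capacity())
}

fn main() {
    println!("sum computed by worker: {}", sum_in_worker(vec![1, 2, 3, 4]));
    let (before, after) = grow_past_capacity();
    println!("capacity before: {}, after: {}", before, after);
}
```

Because the vector has a single owner, any reallocation leaves no other pointer holding the old buffer's address.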

However, by itself, the single ownership model is too restrictive:

Transferring ownership each time a pointer is passed to a function leads to an awkward style if the pointer needs to be used after the function returns.
The single-ownership restriction inhibits the implementation of some common data structures and doesn’t support the use of pointers for indirection (i.e., pointers to declared variables rather than to dynamically allocated values).
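The first limitation is easy to see in code. In this sketch, which uses only ownership transfer, a function has to hand the vector back to its caller so that it remains usable afterward. The function name total is hypothetical.

```rust
// Takes ownership of the vector and returns it along with the result.
fn total(values: Vec<i32>) -> (Vec<i32>, i32) {
    let sum: i32 = values.iter().sum();
    (values, sum)
}

fn main() {
    let data = vec![1, 2, 3];

    // `data` is moved into `total`; to keep using it, the caller must
    // rebind whatever the function hands back.
    let (data, sum) = total(data);

    println!("sum of {:?} is {}", data, sum);
}
```

Every function that merely inspects the vector would need the same round trip.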

Part 3 discusses the elements of Rust’s safe pointer facility that address these limitations.