Pointers in Rust: Whacking the Mole (Part 4)—Handling Weak References
What you’ll learn:
- What are weak references?
- How Rust deals with raw pointers.
Complex data structures may have cycles, with separately allocated nodes that reference each other either directly or indirectly. Cycles present a challenge for reference counting, since the interdependence between nodes prevents the nodes’ reference counts from reaching zero.
To support the definition of cyclic data structures, Rust differentiates between two categories of referencing-counting pointers:
- A “strong” reference determines storage reclamation and is the default for Rc or Arc pointers. A heap value referenced by an Rc or Arc pointer is reclaimed when its strong reference count is decremented to zero.
- A “weak” reference doesn’t affect storage reclamation. Its referent can be reclaimed even when its count of weak references is non-zero, provided that the count of strong references is 0.
A strong reference can be downgraded to a weak reference, and in the other direction, a weak reference can be upgraded to strong.
Weak references are useful when the nodes in a data structure exhibit a natural “parent-child” relationship. The parent has a strong reference to a child, and the child has a weak reference to a parent. One example is a doubly linked list where each node contains a data field (say an i32) and two pointers (actually Option values):
- A next pointer that is None for the tail node and otherwise a Some variant that references the successor node, and
- A prev pointer that is None for the head node and otherwise a Some variant that references the predecessor node.
Although cycles exist through the combination of prev and next links, we can ensure proper storage reclamation by defining the next links as strong references and the prev links as weak. The implementation of the methods provided for the data structure ensure that the next links themselves don’t create cycles of strong references.
Here’s a Rust definition for the relevant data structures and a sample implementation of several methods:
An important (implicit) property of the doubly linked list is the absence of cycles of strong references. If application code creates a doubly linked list where some node contains a next reference to itself or to a previous node, then the strong reference count will never get to zero. To prevent storage leakage, two approaches are available:
- Explicitly break the cycle by setting to None the next link of one of the nodes in the cycle; this will allow automatic reclamation.
- Define the data structure with raw pointers (see below) and use manual deallocation.
Dealing with Raw Pointers
For low-level programming, and as the implementation basis for safe pointers and custom memory management, Rust provides a facility known as raw pointers. These provide C-like functionality, but also C-like lack of checking. Raw pointers come in two varieties:
- *const T (immutable), which allows for reading from, but not assigning to, the dereferenced T value.
- *mut T (mutable), which allows both reading from and writing to the dereferenced T value.
Raw pointers can be created either through normal references (using the “&” operator) or dynamic allocation, and they can be converted (cast) to and from safe pointers. However, a raw pointer can only be dereferenced in an unsafe block. If the alloc() function in the std::alloc module is used in a function or block to allocate storage for a raw pointer, then the program needs to deallocate the storage by calling the dealloc() function explicitly.
Here's an example:
Raw pointers are efficient and aren’t necessarily unsafe. As shown in the example (commented lines (1) and (2)), Rust provides some compile-time checks that prevent unsafe practices. However, in the absence of the guarantees that come with safe pointers, the developer will need to work harder to verify the code; raw pointers enable all of the traditional errors (dangling references, storage leakage, double freeing, null dereferencing, etc.).
An important application of raw pointers is as a toolkit for defining memory pools and custom allocation and deallocation mechanisms. These are especially useful in embedded applications. Examples are the typed-arena, slab, and mempool crates in the Rust community’s crates.io registry.
Raw pointers should not be (mis)used as a workaround for avoiding language-enforced checks. They serve a useful purpose as a means to interface with foreign (non-Rust) code and implement low-level functionality.
Revisiting the Pointer Mole
Has Rust vanquished the pointer mole? Can Rust programmers write safe and efficient code while comfortably defining the underlying data structures? The answer: a slightly qualified yes.
On the positive side, Rust detects many pointer errors at compile time, it reclaims storage automatically without the overhead of a general garbage collector, and its type definition facility offers general-purpose functionality. However, and not surprisingly in light of the tensions and tradeoffs intrinsic to any pointer facility, Rust’s approach does have some drawbacks.
Arguably the most significant issue is the learning curve that most Rust programmers will need to navigate to become conversant with the language’s pointer semantics. The Borrow Checker is an algorithm based on source-code path analysis. Although its essence can be distilled into a memorable mantra—values can be borrowed mutably and exclusively or else immutably and shared—specific applications of the rules may cause surprises. Thus, as can be seen in the doubly linked list example above, traditional data-structuring idioms sometimes require a non-traditional style.
What follows are two examples that are methodologically equivalent. Each mutates a variable through a borrowed reference. But one is legal, and the other not. First the illegal code:
At statement (1), p borrows n mutably; the lifetime of p (and the associated extent of p’s mutual borrow of n goes through line (3). However, the use of n in the println!() macro at line (2) is an implicit immutable borrow of n. Since this occurs during the extent of the mutable borrow, the code is illegal. On the other hand, interchanging the last two lines makes the code legal:
The difference is that now the lifetime of p ends at line (2), so the immutable borrow of n at line (3) doesn’t occur during the extent of the mutable borrow.
The distinction between the two examples might not be immediately apparent. Aliasing in which a variable is modified indirectly and referenced directly may or may not be legal. Further, when interior mutability is required and the borrow checking rules are enforced at run-time, complex analysis may be needed to guarantee the absence of borrow-related panic.
Another potential issue is the cost of reclamation when complex data structures are dropped. Despite the dangers of explicit deallocation in languages like C, C++, and Ada, the deallocation is apparent in the source code and, especially if memory pools with fixed-size blocks are used, its run-time cost can be predicted.
In Rust, the deallocation is implicit, and careful analysis is needed to compute the run-time cost. Rust’s custom storage pool mechanism can mitigate this problem, but it brings manual deallocation and its associated risks.
Notwithstanding these issues, Rust provides an expressive pointer facility that has advanced the state of the art and practice in helping programmers write safe and efficient code. Rust requires more ramp-up time and up-front effort than other languages. However, by detecting most pointer errors at compile time, it facilitates memory safety, avoids Prof. Hoare’s billion-dollar mistake, and reduces verification costs. The frustrating pointer mole hasn’t been completely whacked, but Rust has largely stunned it into submission.