Pointers in Rust: Whacking the Mole (Part 3)—Indirection, References, and the Borrow Checker
What you’ll learn:
- What the Rust Borrow Checker is and why it matters.
- How mutability and reference-counted pointers are related.
Arguably the most novel aspect of Rust, and the feature that’s the most challenging for new users, is the treatment of pointers that can be used for indirection. Known as references, these pointers have a C-like syntax but with a few new wrinkles:
- &x is an immutable borrow of x. It creates a reference to a mutable or immutable value x, through which x can be read but not mutated. Ownership of x isn’t affected. The “&” operator can be applied to any construct that has a run-time presence, including declared variables, heap values, literals, and function names.
- &mut x is a mutable borrow of x. It is analogous to &x but allows mutating x and can’t be applied to immutable values.
- &T is the type for &x values where x is of type T; i.e., for pointers that can reference values of type T but do not mutate them.
- &mut T is an analogous type for &mut x values, which also allows mutating the referenced values.
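The four forms listed above can be seen together in a short sketch. The helper functions read and increment and the variable names are hypothetical, chosen only for illustration.

```rust
// Hypothetical helper: reads through a shared reference (&T).
fn read(r: &i32) -> i32 {
    *r
}

// Hypothetical helper: mutates through a mutable reference (&mut T).
fn increment(r: &mut i32) {
    *r += 1;
}

fn reference_forms() -> (i32, i32) {
    let a = 10; // immutable variable
    let mut b = 20; // mutable variable

    let ra: &i32 = &a; // immutable borrow; `&mut a` would be rejected
    let first = read(ra);

    let rb: &mut i32 = &mut b; // mutable borrow of a mutable variable
    increment(rb);

    let lit: &i32 = &5; // `&` applied to a literal
    (first + read(lit), b)
}

fn main() {
    let (sum, b) = reference_forms();
    println!("sum = {sum}, b = {b}");
}
```

Ownership of a and b stays with the enclosing function throughout; the references only borrow them.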
In less safe languages, such a facility would risk dangling references or data corruption, while also inhibiting some useful optimizations. Through a compile-time analysis known as the Borrow Checker, Rust enforces restrictions that avoid these drawbacks.
Here’s an example that shows how Rust prevents dangling references:
Line (1) declares ptr as an immutable reference to an i32 value, and line (2) initializes ptr with a reference to n (an immutable borrow; the owner of n is its enclosing scope). This is a potential dangling reference but it’s not dangerous yet, and indeed the use of *ptr at (3) is legal. However, an attempt to use *ptr at line (5) would be an actual dangling reference, since n would have been dropped at the end of the inner scope (4). The Borrow Checker detects this error through lifetime analysis and rejects the program.
In the absence of line (5), the lifetime of ptr would only extend from line (1) through its last use (line (3)), even though it doesn’t get dropped until exit from the scope at (6). However, in the presence of line (5), the lifetime of ptr extends through (4), which is longer than the lifetime of its referent n, and thus the program is illegal. The fact that a variable’s lifetime may be shorter than its lexical scope (a property known as “non-lexical lifetimes”) provides flexibility without compromising safety.
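The listing that this discussion refers to is not reproduced here. The following is a minimal reconstruction, assuming the markers (1) through (6) fall where the text describes them; the value 100 and the variable seen are assumptions. Line (5) is commented out so that the sketch compiles.

```rust
fn dangling_demo() -> i32 {
    let ptr: &i32; // (1) declared, not yet initialized
    let seen;
    {
        let n = 100;
        ptr = &n; // (2) immutable borrow of n
        seen = *ptr; // (3) legal: n is still alive
    } // (4) n is dropped here
    // println!("{}", *ptr); // (5) rejected if uncommented: n does not live long enough
    seen
} // (6) end of ptr's scope

fn main() {
    println!("{}", dangling_demo());
}
```

With line (5) commented out, the lifetime of ptr ends at its last use at (3), so the borrow of n never outlives n.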
An immutable variable can only be borrowed immutably. A mutable variable x can be borrowed either mutably via &mut x, or immutably via &x. In the latter case, x can be mutated but not through the &x pointer.
A linchpin of Rust pointer safety is exclusivity of mutable borrows. While a variable is borrowed mutably, all other borrows (whether mutable or immutable) are prohibited. On the other hand, simultaneous immutable borrows are permitted. Direct uses of variables are considered borrows.
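A small sketch of these rules, using hypothetical variable names. The commented-out lines are the ones the Borrow Checker would reject.

```rust
fn exclusivity_demo() -> i32 {
    let mut x = 1;

    // Any number of simultaneous immutable borrows is fine.
    let r1 = &x;
    let r2 = &x;
    let sum = *r1 + *r2; // last use of r1 and r2

    // A mutable borrow is exclusive for as long as it is live.
    let m = &mut x;
    // let r3 = &x; // rejected if uncommented: x is mutably borrowed and m is used below
    // x += 1;      // rejected as well: a direct use of x counts as a borrow
    *m += sum; // last use of m

    x // x can be used directly again once m is no longer live
}

fn main() {
    println!("{}", exclusivity_demo());
}
```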
In a multithreaded environment, this property is commonly known as “Concurrent Read, Exclusive Write,” or CREW. Rust enforces this principle, while also preventing dangling references, through a variety of features including read-write locks and mutexes. Direct use of shared data across threads is permitted—static data can be referenced in functions and closures that are passed to thread::spawn()—but is discouraged for mutable values.
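As one illustration of CREW with a read-write lock, the sketch below uses std::sync::RwLock together with scoped threads (std::thread::scope). The choice of these particular library types, and the thread counts, are assumptions made for the example; the article does not prescribe them.

```rust
use std::sync::RwLock;
use std::thread;

fn crew_total() -> u32 {
    let shared = RwLock::new(0u32);

    thread::scope(|s| {
        // Writers: each takes the lock exclusively before mutating.
        for _ in 0..4 {
            s.spawn(|| {
                let mut guard = shared.write().unwrap();
                *guard += 1;
            });
        }
        // Readers: several may hold the read lock at the same time.
        for _ in 0..2 {
            s.spawn(|| {
                let guard = shared.read().unwrap();
                assert!(*guard <= 4);
            });
        }
    }); // all scoped threads have finished here

    shared.into_inner().unwrap()
}

fn main() {
    println!("{}", crew_total());
}
```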
There are no language-provided checks that verify exclusivity of mutable borrows of static variables. To make this clear in the source program, all uses of mutable static variables must be within special syntax (“unsafe” blocks). The programmer is responsible for ensuring the absence of interference. Immutable static variables may be safely shared, though.
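The difference between the two kinds of static variables might look like this in practice. LIMIT, COUNTER, and the two functions are hypothetical names. The sketch assumes that bump() is only ever called from one thread at a time, which is exactly the kind of guarantee the compiler leaves to the programmer.

```rust
use std::thread;

static LIMIT: u32 = 10; // immutable static: safe to share across threads
static mut COUNTER: u32 = 0; // mutable static: every use requires `unsafe`

fn read_limit_in_thread() -> u32 {
    // No `unsafe` needed: LIMIT can never change.
    thread::spawn(|| LIMIT * 2).join().unwrap()
}

fn bump() -> u32 {
    // SAFETY: assumed to be called from a single thread at a time.
    // Nothing in the language checks this assumption.
    unsafe {
        COUNTER += 1;
        COUNTER
    }
}

fn main() {
    println!("{}", read_limit_in_thread());
    println!("{}", bump());
}
```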
Aside from mutable static variables, exclusivity for mutable borrows is enforced within a single thread. This prevents some error-prone constructs while enabling useful optimizations. Here’s an example; the cache function is from Jon Gjengset’s book Rust for Rustaceans:
The Borrow Checker allows the invocation of (1), since m and n are distinct variables, but rejects the code at (2), which attempts to borrow n both immutably and mutably. The effect is that the compiler can safely cache the value of *input on entry to the function, since input and sum can’t reference the same variable. Compare this to the C function shown earlier in this article, where the optimization would have changed the effect of the program.
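The listing discussed above is not reproduced here. The sketch below gives the cache function as best recalled from the book, so details may differ from the published text; the calling code and the values of m and n are assumptions. Call (2) is commented out so that the sketch compiles.

```rust
// As recalled from "Rust for Rustaceans"; the published listing may differ.
fn cache(input: &i32, sum: &mut i32) {
    *sum = *input + *input;
    // Holds for every caller, because input and sum cannot alias.
    assert_eq!(*sum, 2 * *input);
}

fn cache_demo() -> i32 {
    let m = 21;
    let mut n = 0;
    cache(&m, &mut n); // (1) accepted: m and n are distinct variables
    // cache(&n, &mut n); // (2) rejected if uncommented: n borrowed immutably and mutably
    n
}

fn main() {
    println!("{}", cache_demo());
}
```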
As an aside, the ownership and borrowing concepts that underlie pointer safety guarantees aren’t unique to Rust. The SPARK language has somewhat similar notions, but with restrictions on pointers (known as “access values” in SPARK) that allow for formal verification of program properties.
Reference-Counting Pointers
Although single ownership simplifies storage management, the limit of one owner per value can be overly constraining. For example, in a directed acyclic graph, several nodes may point to the same node, but there’s no clear owner. The referenced node should be reclaimed when and only when its last extant owner is dropped.
This scenario may sound familiar: Reclamation can be managed by the classical reference count technique and is realized in Rust through the generic smart pointer types Rc<T> and Arc<T> (“A” is for “Atomic”). These are analogous to Box<T> in that they support dynamically allocating values of type T on the heap. However, for Rc<T> and Arc<T>, the heap storage for the T value includes a count of the number of owners.
The reference count management for Rc<T> isn’t thread safe, so an Rc<T> can’t be sent to or shared with another thread; the compiler rejects any attempt to do so. Arc<T> updates its count atomically, at some cost in performance, and is safe in multi-threaded code.
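A minimal sketch of sharing one heap value among several threads through Arc<T>. The function name and data are hypothetical. Replacing Arc with Rc in this code would be rejected at compile time, since Rc<T> cannot be moved into a spawned thread.

```rust
use std::sync::Arc;
use std::thread;

fn parallel_sum() -> i32 {
    let data = Arc::new(vec![1, 2, 3]);

    let handles: Vec<_> = (0..3)
        .map(|i| {
            // Each thread gets its own pointer to the same vector.
            let shared = Arc::clone(&data);
            thread::spawn(move || shared[i] * 10)
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("{}", parallel_sum());
}
```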
Assignment of reference-counting pointers has standard Rust “move” semantics, with ownership transferred from source to target. The only new wrinkle concerns the value that the target pointer previously referenced: Rather than unconditionally reclaiming its heap storage, the Rust implementation subtracts 1 from its reference count and reclaims the storage only when the count reaches 0.
To share ownership of a value that’s referenced by a reference-counting pointer, use the clone() function from Rc<T> or Arc<T>. Despite its name, clone() doesn’t allocate a new copy of the heap value; it copies the pointer and increments the reference count. Consider the assignment q = Rc::clone(&p), where p and q are existing Rc<T> pointers.
The assignment statement comprises three actions:
- Rc::clone(&p) returns a new pointer to the same heap value as p and increments the reference count for *p by 1.
- The reference count for the value that q previously referenced is decremented by 1; if q was its sole owner, the count is now 0 and the storage for that value is reclaimed.
- The Rc::clone() result is moved into q, resulting in shared ownership of *p by p and q.
The reference count (more strictly, the count of strong references) for a value referenced by an Rc<T> or Arc<T> pointer p can be interrogated by the function Rc::strong_count(&p) or Arc::strong_count(&p), respectively.
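The assignment and its effect on the counts can be sketched as follows. The string values are assumptions. The sketch also uses a weak reference (Rc::downgrade), which the article does not cover, purely as a way to observe that the value q originally owned has been dropped.

```rust
use std::rc::Rc;

fn share() -> (usize, usize, bool, bool) {
    let p = Rc::new(String::from("first"));
    let mut q = Rc::new(String::from("second"));

    // Weak observer of q's original value; it is not an owner.
    let old_q = Rc::downgrade(&q);

    let before = Rc::strong_count(&p);
    q = Rc::clone(&p); // q's old value loses its only owner
    let after = Rc::strong_count(&p);

    let same_value = Rc::ptr_eq(&p, &q);
    let old_value_dropped = old_q.upgrade().is_none();
    (before, after, same_value, old_value_dropped)
}

fn main() {
    println!("{:?}", share());
}
```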
To preserve the Rust principle that a value can’t be mutated through one pointer while being read or mutated by another, the referents of both Rc<T> and Arc<T> pointers must be immutable. This is a significant restriction, but, as will be shown below, Rust provides an escape.
Here’s an example of multiple ownership through reference-counting pointers. The illegal lines are commented out.
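The original listing is not reproduced here. As a stand-in, the following sketch builds a small directed acyclic graph in which two nodes share a third; the Node type and all values are hypothetical.

```rust
use std::rc::Rc;

// Hypothetical node type for a small directed acyclic graph.
struct Node {
    value: i32,
    children: Vec<Rc<Node>>,
}

fn dag_demo() -> (usize, i32, usize) {
    let leaf = Rc::new(Node { value: 3, children: vec![] });
    let left = Node { value: 1, children: vec![Rc::clone(&leaf)] };
    let right = Node { value: 2, children: vec![Rc::clone(&leaf)] };

    // leaf.value = 30;             // illegal: cannot assign through an Rc
    // left.children[0].value = 30; // illegal for the same reason

    let owners = Rc::strong_count(&leaf); // leaf, left, and right
    let total =
        left.value + right.value + left.children[0].value + right.children[0].value;

    drop(left); // one owner fewer; the shared node stays alive
    (owners, total, Rc::strong_count(&leaf))
}

fn main() {
    println!("{:?}", dag_demo());
}
```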
Interior Mutability and Reference-Counted Pointers
The immutability requirement for reference-counted values is well motivated but overly restrictive, and Rust provides an explicit mechanism for arranging mutability within a value that’s otherwise immutable. Safety is preserved, since a check enforces exclusivity of mutable borrows, but it’s performed at run time—a failed check results in a panic that terminates the affected thread.
The feature that allows mutability within an otherwise immutable value is known as interior mutability and is realized through the Cell<T> and RefCell<T> types. Cell<T> has get() and set() methods; because get() returns a copy of the contained value, Cell<T> is appropriate if T has copyable (versus movable) values, as with scalar types such as i32. Otherwise RefCell<T> should be used; its borrow() and borrow_mut() methods return references whose exclusivity is checked at run time.
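A brief sketch of both types, with hypothetical names and values. Neither variable is declared mut, yet both contents change.

```rust
use std::cell::{Cell, RefCell};

fn interior_demo() -> (i32, usize) {
    // Cell<T>: copy the value out with get(), replace it with set().
    let counter = Cell::new(1);
    counter.set(counter.get() + 1);

    // RefCell<T>: borrow the contents; exclusivity is checked at run time.
    let name = RefCell::new(String::from("ref"));
    name.borrow_mut().push_str("cell"); // this mutable borrow ends with the statement
    let len = name.borrow().len();

    (counter.get(), len)
}

fn main() {
    println!("{:?}", interior_demo());
}
```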
The interior mutability property applies more generally, such as to define a struct type where some fields are immutable and others are mutable. However, it’s especially useful for reference-counted values.
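For instance, a struct might keep one field fixed and allow another to change even when the struct is shared through Rc<T>. The Page type and the visit function below are hypothetical.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical type: `id` is fixed; `hits` can change through a shared reference.
struct Page {
    id: u32,
    hits: Cell<u32>,
}

fn visit(page: &Page) {
    page.hits.set(page.hits.get() + 1);
}

fn page_demo() -> (u32, u32) {
    let a = Rc::new(Page { id: 7, hits: Cell::new(0) });
    let b = Rc::clone(&a);

    visit(&a);
    visit(&b); // same Page, reached through the other owner
    // a.id = 8; // rejected if uncommented: id has no interior mutability

    (a.id, b.hits.get())
}

fn main() {
    println!("{:?}", page_demo());
}
```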
Here’s an example with a borrowing violation that’s detected at run time:
At line (1), ptr1 and ptr2 point to and share ownership of the RefCell value 100, which has been allocated on the heap and has a reference count of 2. At line (2), this value is borrowed mutably. The statement at line (3) attempts to borrow this same value while the mutable borrow is still in effect; this is a run-time error (panic) causing program termination. The correction is to ensure that the mutable borrow (borrow_mut) is dropped before attempting the second borrow.
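Before turning to the correction, the failing scenario can be sketched as follows. The original listing is not reproduced here, so the statements are assumptions based on the description. The panicking borrow at (3) is commented out, and try_borrow() (not mentioned in the article) is used as a non-panicking probe of the same conflict so that the sketch runs to completion.

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn violation_demo() -> bool {
    let ptr1 = Rc::new(RefCell::new(100));
    let ptr2 = Rc::clone(&ptr1); // (1) shared ownership; the count is 2
    let mut borrow_mut = ptr1.borrow_mut(); // (2) mutable borrow begins
    *borrow_mut += 1;

    // let value = *ptr2.borrow(); // (3) panics at run time if uncommented

    // Same conflict, reported as an Err instead of a panic.
    let conflict = ptr2.try_borrow().is_err();

    *borrow_mut += 1; // the mutable borrow is still in effect here
    conflict
}

fn main() {
    println!("conflict detected: {}", violation_demo());
}
```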
This program starts the same way as the previous version, sharing ownership of the RefCell at line (1) and mutably borrowing the value at line (2). But the scope of the mutable borrow (borrow_mut) ends at (3), and thus the subsequent borrow at line (4) is permitted. The program executes with no run-time errors.
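A reconstruction of the corrected program, assuming the markers (1) through (4) fall where the text places them; the increment and the returned values are assumptions.

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn corrected_demo() -> (i32, usize) {
    let ptr1 = Rc::new(RefCell::new(100));
    let ptr2 = Rc::clone(&ptr1); // (1) shared ownership
    {
        let mut borrow_mut = ptr1.borrow_mut(); // (2) mutable borrow
        *borrow_mut += 1;
    } // (3) borrow_mut is dropped here
    let value = *ptr2.borrow(); // (4) permitted: no mutable borrow is outstanding
    (value, Rc::strong_count(&ptr1))
}

fn main() {
    println!("{:?}", corrected_demo());
}
```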