This article is part of a five-part series on race conditions. The complete series is:
- Part 1: locations and locks
- Part 2: files and databases
- Part 3: web applications
- Part 4: user workflows
- Part 5: concurrency and language design
---
Over the course of this series we’ve explored the idea of a Location
type,
which characterises a resource that allows arbitrary reads and writes with no
co-ordination or control mechanism:
Location<T> = {
read() -> T;
write(T);
}
We’ve seen many instances of this type, from a simple in-memory counter variable with asynchronous or multi-threaded writers in part 1, through a similar structure stored in a database in part 2, to complex objects being accessed by multiple users in a web application in part 3 and part 4. In this final instalment of the series, we’re going to loop back to looking at in-memory data in programs, and how some languages have rethought concurrent programming to make these access patterns either safe or non-existent.
In part 1 we looked at how object-oriented programming and mutexes are both ways to control access to mutable data, letting us limit the ways that external callers can change the object’s state and thus ensure it remains consistent. A closely related concept is that of the actor model, a framework for concurrent programming. An actor is a process, which has some local state, and can communicate with other actors exclusively by sending them messages. This resembles the OOP idea of sending an object commands by invoking its methods, rather than extracting state from inside it and manipulating that state directly.
We can build a toy version of the actor model in Ruby as follows. The Actor
class is constructed with some other object that it retains a reference to. It
has a mailbox
, modelled as a Queue
, a thread-safe structure that supports
push
and shift
operations to add and remove values from a first-in first-out
queue. The actor starts a thread, which continuously loops over the messages in
the queue, sending each one to the underlying object.
class Actor
def initialize(object)
@object = object
@mailbox = Queue.new
@thread = Thread.new do
loop do
message = @mailbox.shift
@object.__send__(*message)
end
end
end
# ...
Queue#shift
blocks if the queue is empty, returning the next pushed message as
soon as one exists. __send__
is Ruby’s way of dynamically invoking a method by
name; message
here should be an array containing the name of the method to
invoke, and any arguments to call it with.
Actor
supports two further methods: send
lets an external caller push
messages into the mailbox – the only way actors are allowed to communicate.
We’ll also add a wait
method which joins the actor’s thread, so we can make
our program’s main thread block on one of the actors we create.
# ...
def send(*message)
@mailbox.push(message)
end
def wait
@thread.join
end
end
In effect this Actor
class places a mutex around an entire object. Its send
method lets other actors send it commands, just like invoking the methods on a
normal object, but rather than executing the command immediately in the caller’s
thread, the command goes in a queue which is executed sequentially in a thread
dedicated to the actor. It means all the messages sent to an actor are processed
one after another, rather than overlapping in time as usually happens in
multi-threaded programs.
To work with this class, we’ll take our original Counter
class and remove the
attr_reader :value
declaration that let us read the @value
field out of the
object. Instead, we’ll add a get
method, whose sender
argument is a
reference to an Actor
, to which we send the message [:puts, @value]
to make
it print the value. In the actor model, we can’t just extract state from
objects, we have to send the actor a message and have it send us one back with
the information we want. The message :puts
will invoke the sender’s puts
method, which prints whatever it receives.
class Counter
def initialize
@value = 0
end
def inc
@value += 1
end
def get(sender)
sender.send(:puts, @value)
end
end
Let’s try out this implementation with a short example program. We’ll create two
actors, one wrapping a Counter
instance and one wrapping the main object –
the top-level environment in Ruby’s runtime. We’ll send the message :inc
to
the counter a thousand times, and then send it the message [:get, main]
to
make it send its final value to the main object which will print it.
counter = Actor.new(Counter.new)
main = Actor.new(self)
1000.times { counter.send(:inc) }
counter.send(:get, main)
main.wait
This program prints 1000, as we’d expect, since all our send(:inc) calls are
sequential. Let’s now try the same thing using a set of threads:
counter = Actor.new(Counter.new)
main = Actor.new(self)
threads = (1..4).map do
Thread.new do
1000.times { counter.send(:inc) }
end
end
threads.each(&:join)
counter.send(:get, main)
main.wait
This works correctly, printing 4000
. We didn’t have to use a mutex to make
sure this concurrent access to the counter
object is valid. The Actor
class
ensures the counter does not lose updates by enqueueing all the messages it
receives and handing them off to the underlying Counter
sequentially.
The actor model both extends and restricts object-oriented programming, making us think of objects as processes. An object encapsulates some state that must be kept consistent, and updates it according to a sequence of messages it receives. Effectively, we can only write setter methods that change an object’s state, rather than getter methods that extract that state. If we want to get a value out of an actor, we can ask it to send that value to us, but this model instead encourages flowing updates in one direction through a system by propagating events through a network of actors.
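To sketch that one-directional flow, here’s a small variant of my own (not from the earlier parts): a counter that pushes each new value downstream to a listener actor, rather than waiting to be asked for its state. The Actor class is repeated so the sketch runs standalone; PushCounter and Display are hypothetical names.

```ruby
# The same Actor class as above, repeated so this sketch is self-contained.
class Actor
  def initialize(object)
    @object  = object
    @mailbox = Queue.new
    @thread  = Thread.new do
      loop do
        message = @mailbox.shift
        @object.__send__(*message)
      end
    end
  end

  def send(*message)
    @mailbox.push(message)
  end
end

# A counter that pushes each new value downstream instead of being asked.
class PushCounter
  def initialize(listener)
    @value    = 0
    @listener = listener
  end

  def inc
    @value += 1
    @listener.send(:update, @value)
  end
end

# A hypothetical listener that records and prints the values it receives.
class Display
  attr_reader :values

  def initialize
    @values = []
  end

  def update(value)
    @values << value
    puts "counter is now #{value}"
  end
end

display = Display.new
counter = Actor.new(PushCounter.new(Actor.new(display)))

3.times { counter.send(:inc) }
sleep 0.2  # crude: give the actor threads time to drain their mailboxes
```

State changes flow strictly from the counter to the display; nothing ever reads state back out of an actor.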
Even if there is a cycle in that network, there is still a clear thread of
causality: messages to the same object cannot be interleaved in the way normal OOP allows.
Consider this JavaScript program, where a call to Joe.hello()
causes an
invocation of Mike.hello()
, which itself calls Joe.interrupt()
:
let Joe = {
hello() {
console.log('starting Joe.hello');
Mike.hello();
console.log('finishing Joe.hello');
},
interrupt() {
console.log('INTERRUPTING JOE');
}
};
let Mike = {
hello() {
Joe.interrupt();
}
};
By calling Mike.hello()
, Joe
might indirectly be invoking its own methods,
so it needs to make sure that whatever changes Joe.hello()
makes are safe if
Joe.interrupt()
happens in their midst.
$ node joe_mike.js
starting Joe.hello
INTERRUPTING JOE
finishing Joe.hello
In the actor model, an object’s own methods can only be invoked immediately by
that object. If it sends a message to another object that results in another
message looping back to it, that new message will go in the mailbox and won’t be
processed until the current message is done processing. Let’s write the above
using the Actor
class we developed earlier:
class Joe
def hello(joe, mike)
puts "starting Joe.hello"
mike.send(:hello, joe)
puts "finishing Joe.hello"
end
def interrupt
puts "INTERRUPTING JOE"
end
end
class Mike
def hello(joe)
joe.send(:interrupt)
end
end
joe = Actor.new(Joe.new)
mike = Actor.new(Mike.new)
joe.send(:hello, joe, mike)
joe.wait
Each object’s methods need to take arguments that let them refer to the actors
in the system; otherwise we’d need to use global variables. When we run this
program, the call mike.send(:hello, joe)
causes an invocation of
joe.send(:interrupt)
, but that message will be processed after Joe#hello
has
finished executing – the Actor
wrapper makes sure of this.
$ ruby joe_mike.rb
starting Joe.hello
finishing Joe.hello
INTERRUPTING JOE
This is useful because it limits the extent to which an object’s own methods can be interleaved, reducing the number of possible code paths that can change the object’s state and making the system easier to reason about.
One language that puts the actor model at the heart of its mechanics is Erlang. In Erlang, actors are lightweight concurrent processes and a single runtime can span multiple machines, with actors communicating over the network. Rather than using objects with mutable state, Erlang processes wrap pure functions. State is handled by passing the starting value into a function, and then calling the function with a new value in response to a message the process receives.
For example, here is how we’d write a counter in Erlang. It responds to two messages:
- { inc }, a tuple containing the single word inc; this causes counter to call itself again with a new input: Value + 1. (In Erlang, variables begin with uppercase letters.)
- { get, From }, a tuple of two words, the second of which is a variable identifying the process that sent the message; this sends Value to the process identified by From and then calls counter with the current Value.
counter(Value) ->
receive
{ inc } -> counter(Value + 1);
{ get, From } -> From ! Value, counter(Value)
end.
The recursive calls to counter
serve to establish an infinite loop in which
the process runs continuously, waiting for new messages and updating its state
– the Value
variable here. We create a running process from this function
using the spawn
function in the Erlang shell:
> Ctr = spawn(fun() -> counter(0) end).
We can then send it messages using the !
operator; here we send { inc }
and
then { get, self() }
, where self()
returns the current actor’s process ID
or PID. This causes counter
to send the current Value
to the shell’s main
process. Calling flush()
makes the shell print any messages it’s been sent.
> Ctr ! { inc }.
> Ctr ! { get, self() }.
> flush().
Shell got 1
ok
We can send more of these messages and see that the counter updates as expected.
> Ctr ! { inc }.
> Ctr ! { inc }.
> Ctr ! { get, self() }.
> flush().
Shell got 3
ok
In the following snippet, we start a new counter process, and then spawn a
thousand other processes, each of which increments the counter. The syntax is
probably unfamiliar; this essentially tells Erlang to spawn
a process that
runs Ctr ! { inc }
for every value from 1 to 1000.
> Ctr = spawn(fun() -> counter(0) end).
> [spawn(fun() -> Ctr ! { inc } end) || _ <- lists:seq(1, 1000)]
These processes all run concurrently and independently; they do not co-ordinate
with each other over sending messages to the Ctr
process. However, when all of
them are complete we see that no updates have been lost and the counter has the
correct value.
> Ctr ! { get, self() }.
> flush().
Shell got 1000
ok
Erlang solves the concurrency problem by removing mutation entirely. We do not update process state by reading copies of values, modifying our copy and then writing it back. Instead, processes listen for messages and restart themselves with new state in response to what they receive. A process’s state is a pure function of its inputs and the sequence of messages it’s been sent, and we cannot access process state from outside as is normal in procedural languages.
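The same recursive pattern can be sketched outside Erlang. Here is a rough Ruby translation of my own, using one Queue as the mailbox and another for the reply; note that, unlike Erlang, Ruby does not optimise tail calls, so the stack grows with every message – fine for a sketch, not for a long-lived process.

```ruby
# A process-like loop in the Erlang style: state lives only in the
# `value` argument, which is passed back in on every iteration rather
# than being mutated in place.
def counter(mailbox, value)
  message = mailbox.shift
  if message == [:inc]
    counter(mailbox, value + 1)
  elsif message[0] == :get
    message[1].push(value)  # the equivalent of "From ! Value"
    counter(mailbox, value)
  end
end

mailbox = Queue.new
reply   = Queue.new
Thread.new { counter(mailbox, 0) }

1000.times { mailbox.push([:inc]) }
mailbox.push([:get, reply])

result = reply.shift  # blocks until the counter replies
puts result
```

Because a single thread drains the mailbox in order, all thousand [:inc] messages are applied before the [:get, reply] message is handled.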
We also have the property that messages are not interleaved, so there is no
ambiguity in what the result of some sequence of messages is. Let’s recreate our
conversation between Joe and Mike in Erlang: joe()
and mike()
are two
functions that loop infinitely by handling a message and then calling
themselves.
joe() ->
receive
{ hello, Mike } ->
io:format("starting Joe.hello~n"),
Mike ! { hello, self() },
io:format("finishing Joe.hello~n");
{ interrupt } ->
io:format("INTERRUPTING JOE~n")
end,
joe().
mike() ->
receive
{ hello, Joe } ->
Joe ! { interrupt }
end,
mike().
If we spawn two processes wrapping these functions, and send Joe ! { hello,
Mike }
, then we see the complete output from the { hello, Mike }
message
before we see the output from the { interrupt }
one.
> Joe = spawn(fun() -> joe() end).
> Mike = spawn(fun() -> mike() end).
> Joe ! { hello, Mike }.
starting Joe.hello
finishing Joe.hello
INTERRUPTING JOE
This pattern of using pure functions that process a message stream to restart themselves with new values is how Erlang models servers, parsers and all sorts of other state machines. It’s also how the Haskell-inspired Elm user interface environment works: the UI is a pure function of the model state, user interaction with that UI generates messages, those messages are combined with the current model to generate a new one, and the loop continues.
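That loop can be sketched in a few lines of JavaScript – a toy reduction of the idea, not Elm’s actual API:

```javascript
// A toy Elm-style loop: the view is a pure function of the model, and
// each message produces a new model rather than mutating the old one.
const update = (model, message) =>
  message.type === 'inc' ? { count: model.count + 1 } : model;

const view = model => `count: ${model.count}`;

let model = { count: 0 };
const messages = [{ type: 'inc' }, { type: 'inc' }, { type: 'inc' }];

for (const message of messages) {
  model = update(model, message);  // fold each message into a new model
}

console.log(view(model));
```

The current state is just the fold of all messages so far over the initial model, exactly like the Erlang counter’s recursive calls.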
This is also how people typically program with React, following the Flux architecture that emphasizes state flowing in a single direction through your code, rather than the two-way data binding that other frameworks put between the model and the UI. That cyclic triggering of events leads to the interleaving problem we saw above, and reproduces many of the bugs we’ve explored in this series. Programming with bare events is equivalent to performing uncontrolled reads and writes of all the components and model state in a UI, and is incredibly hard to do correctly without causing race conditions. I like how Manish Goregaokar puts it:
Aliasing with mutability in a sufficiently complex, single-threaded program is effectively the same thing as accessing data shared across multiple threads without a lock
Aliasing here refers to having many variables or data structures holding references to the same value, and all of those references being allowed to perform mutations to the referenced object. Which brings us nicely to the other major example I want to look at in relation to language design: Rust’s ownership system.
In Rust, every value has a single owner: a variable, function parameter,
location in a data structure, or other binding that contains the value. If we
assign the value to another binding, the value is moved out of its previous
owner and the old binding becomes unusable. For example, this program creates a
Vec
(short for vector, a dynamically-sized array) and binds it to the
variable list
– the list
variable now owns the Vec
value. We then
perform the assignment let other = list
, which transfers ownership of the
Vec
from list
to other
, making list
unusable. The use of list
in the
final println!
statement is thus illegal.
fn main() {
let list = vec![1, 2, 3];
let other = list;
println!("list: {:?}", list);
}
This program will not compile; Rust spots the use of the invalidated binding
list
and gives us an error:
error[E0382]: borrow of moved value: `list`
--> ownership.rs:6:28
|
4 | let list = vec![1, 2, 3];
| ---- move occurs because `list` has type `std::vec::Vec<i32>`,
| which does not implement the `Copy` trait
5 | let other = list;
| ---- value moved here
6 | println!("list: {:?}", list);
| ^^^^ value borrowed here after move
You don’t need to know what borrow means just yet. What this error means is
that the binding list
is used after the Vec
value has been moved out of
it, into other
. The error shows us where the move happens, where list
is
used after its value is moved, and also why it’s moved: the type Vec<i32>
(a
vector of 32-bit integers) does not implement Copy
, a trait that causes values
to be copied in situations where the default behaviour is to move them. Only
simple values like numbers and booleans are Copy
, most complex structures like
Vec
are not.
Values are also moved when they’re passed to other functions. Here we pass
list
to the function do_nothing
, which takes Vec<i32>
as input.
do_nothing
does not use its input in any way – just passing list
to it is
sufficient to move the Vec
from main
into do_nothing
, so it can no longer
be used in main
.
fn main() {
let list = vec![1, 2, 3];
do_nothing(list);
println!("list: {:?}", list);
}
fn do_nothing(list: Vec<i32>) {}
We get a similar error from the compiler for this program, showing that the move
happens where list
is passed as an argument to do_nothing
.
error[E0382]: borrow of moved value: `list`
--> ownership.rs:6:28
|
4 | let list = vec![1, 2, 3];
| ---- move occurs because `list` has type `std::vec::Vec<i32>`,
| which does not implement the `Copy` trait
5 | do_nothing(list);
| ---- value moved here
6 | println!("list: {:?}", list);
| ^^^^ value borrowed here after move
This suggests that all values are single-use: once you pass them to a function
in order to do anything with them, they become unusable. This is true when we
pass values to functions, but not when we pass references. A function taking
Vec<i32>
as input means it takes that type by value: the Vec
is moved into
the called function. By changing its input type to &Vec<i32>
, that means the
function takes a reference (&
) to a Vec<i32>
, which it can use to read the
referenced value without taking ownership of it. For example, to print a value
we only need to be able to read from it:
fn show(list: &Vec<i32>) {
println!("list: {:?}", list);
}
A &
reference is an immutable reference; it only allows the holder to read
the referenced value, not to change it. A &mut
is a mutable reference, and
taking this kind of reference allows a function to mutate the referenced value,
again without taking ownership of it. For example, this generic function takes
&mut Vec<T>
, a mutable reference to a Vec
of any type T
, and an item of
type T
, and adds the item to the list:
fn add<T>(list: &mut Vec<T>, item: T) {
list.push(item);
}
The method call list.push(item)
is syntactic sugar for Vec::push(list,
item)
, and Vec<T>::push
is just another function that’s defined to take a
&mut Vec<T>
. If it took a read-only reference &Vec<T>
, the compiler would
not let it mutate the Vec
struct on which it operates – to change a structure
you must have a mutable reference to it or you must own it; a read-only
reference is not enough. The show()
function above would not be allowed to
call list.push()
, because it takes a read-only &Vec<i32>
.
Use of &
and &mut
references is known as borrowing, creating a pointer to
a value instead of moving it into the called function. We saw above the
compiler telling us about a “borrow” of the list
binding when we used
println!
, and that’s because println!
is a macro for printing the given
values, and it generates code that only needs to read the arguments, not change
or take ownership of them.
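Indeed, printing a value does not consume it – this compiles, because println! only borrows its arguments (print_twice is a hypothetical helper):

```rust
// println! generates code that borrows its arguments, so `list` is
// never moved and remains owned by this function throughout.
fn print_twice() -> usize {
    let list = vec![1, 2, 3];
    println!("{:?}", list); // borrows `list`...
    println!("{:?}", list); // ...so we can use it again
    list.len()              // and we still own it here
}

fn main() {
    print_twice();
}
```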
Borrowing lets us use values without destroying them by moving them, but it
comes with an important restriction. As well as having a single owner, a value
may have either a single &mut
reference or any number of &
references to it
in existence at any point in a program. That is, &
and &mut
references to
the same value cannot coexist, and we cannot have two &mut
references to the
same value at the same time. These restrictions limit how values can be changed,
and are designed to prevent data races of the kinds we’ve examined in this
series.
For example, the following function is legal; it creates a &
reference to
list
in order to call show()
, and when that function call ends the &
reference is immediately dropped. Therefore we can then create a &mut
reference to call add()
, which again the &mut
reference is dropped
immediately after this call, allowing us to call show()
again.
fn main() {
let mut list = vec![1, 2, 3];
show(&list);
add(&mut list, 4);
show(&list);
}
This program compiles and prints the state of the Vec
before and after adding
the new element:
list: [1, 2, 3]
list: [1, 2, 3, 4]
What we’re not allowed to do is create a &
reference, and then create a &mut
to the same value while the &
reference still exists. This program stores a
&Vec<i32>
reference to list
in the variable list_ref
, which is later used
by being passed to show()
. Between creating list_ref
and using it, we try to
call add()
, which requires a &mut
to the Vec
.
fn main() {
let mut list = vec![1, 2, 3];
let list_ref = &list;
add(&mut list, 4);
show(list_ref);
}
This will not compile: the expression &mut list
cannot exist inside a region
of the program where &list
exists. The compiler tells us as much by showing us
where the conflicting references are created and used:
error[E0502]: cannot borrow `list` as mutable because it is also borrowed as
immutable
--> ownership.rs:8:9
|
6 | let list_ref = &list;
| ----- immutable borrow occurs here
7 | add(&mut list, 4);
| ^^^^^^^^^ mutable borrow occurs here
8 | show(list_ref);
| -------- immutable borrow later used here
The same thing happens if we try to create a &
reference at a point where a
conflicting &mut
reference exists:
fn main() {
let mut list = vec![1, 2, 3];
let list_mut = &mut list;
show(&list);
add(list_mut, 4);
}
This program fails to compile, with an error similar to the one above:
error[E0502]: cannot borrow `list` as immutable because it is also borrowed as
mutable
--> ownership.rs:8:10
|
6 | let list_mut = &mut list;
| --------- mutable borrow occurs here
7 | show(&list);
| ^^^^^ immutable borrow occurs here
8 | add(list_mut, 4);
| -------- mutable borrow later used here
The conflicting references do not need to point to exactly the same location; a
reference covers the value it points at and any values contained within it, so
&
and &mut
references conflict if the regions of memory they cover have any
overlap. For example, here we say &mut list[1]
to get a mutable reference to
the second slot in the array, and then try to print the list before using
list_mut
to replace the second element of the array.
fn main() {
let mut list = vec![1, 2, 3];
let list_mut = &mut list[1];
show(&list);
*list_mut = 4;
}
This is not legal: we can’t take a reference to the whole list while holding a
mutable reference to something inside it. Reversing these statements so that we
print the list after we’ve finished using &mut list[1]
is fine:
fn main() {
let mut list = vec![1, 2, 3];
let list_mut = &mut list[1];
*list_mut = 4;
show(&list);
}
This compiles and prints the Vec
after its second element has been replaced.
[1, 4, 3]
Note that in higher-level languages like Ruby or JavaScript, the expression
list[1]
gives a reference to the object that is currently second in the list.
That is, if we have the following list of JS objects:
let list = [{ id: 1 }, { id: 2 }, { id: 3 }];
let second = list[1];
Then second
holds a reference to the object { id: 2 }
, and it continues
to point at this object if something is inserted at the front of the list:
// list = [{ id: 1 }, { id: 2 }, { id: 3 }]
// ---------
// ^
// |
// second
list.unshift({ id: 4 });
// list = [{ id: 4 }, { id: 1 }, { id: 2 }, { id: 3 }]
// ---------
// ^
// |
// second
So if we now said second.id = 5
, the contents of list
would be:
// list = [{ id: 4 }, { id: 1 }, { id: 5 }, { id: 3 }]
In Rust, references work more like pointers in C. The expressions &list[1]
and
&mut list[1]
are pointers to the second slot in the array, as a fixed address
in memory. So if we had second = &mut list[1]
with the initial list state:
// list = vec![{ id: 1 }, { id: 2 }, { id: 3 }]
// ^
// |
// second
If another item was inserted at the front of the list, then second
would
continue to point at the same location, not at the same object it was pointing
at before:
// list = vec![{ id: 4 }, { id: 1 }, { id: 2 }, { id: 3 }]
// ^
// |
// second
Adding an item to a Vec
can also cause the entire structure to be moved in
memory, as it may require more space to hold the new item. In that case, the
second
reference would end up pointing at memory that’s no longer in use.
These are some of the many problems that are prevented by disallowing
overlapping &mut
references to co-exist: modifying a structure may invalidate
any existing references into the structure, so you’re only allowed to create a
&mut
reference if there are no other references pointing into the same
structure. As such, &
references provide true immutability: the existence of a
&
reference to a structure prevents anything inside it, and sometimes the
structure containing the referenced value, from being changed.
So unlike languages that embrace pure functions, like Erlang, Rust still allows
mutation; it just restricts it with compile-time checks. You can let a function
read from a structure by giving it a &
reference, and be sure the referenced
value will not change while that &
reference is live. The borrow checker –
the part of the compiler that enforces the “one &mut
or many &
s” rule –
provides similar guarantees to a database letting you lock records or running
transactions with serializable isolation. It lets many functions read a value
at the same time but if a function wants to modify the value, it can only do
that while no other function is reading or writing it. In fact, when I first
heard about Rust, it sounded a lot like people had applied database
concurrency control theory to a programming language!
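The standard library even offers this rule as a runtime primitive: std::sync::RwLock hands out either many read guards or one write guard, enforcing the same “many readers or one writer” policy dynamically rather than at compile time. A small sketch (demo is a hypothetical helper):

```rust
use std::sync::RwLock;

// RwLock enforces at runtime what the borrow checker enforces at
// compile time: any number of readers, or exactly one writer.
fn demo() -> Vec<i32> {
    let value = RwLock::new(vec![1, 2, 3]);
    {
        let r1 = value.read().unwrap(); // two read locks coexist happily
        let r2 = value.read().unwrap();
        assert_eq!(r1.len(), r2.len());
    } // read guards released here
    {
        let mut w = value.write().unwrap(); // exclusive write lock
        w.push(4);
    } // write guard released here
    value.into_inner().unwrap()
}

fn main() {
    println!("{:?}", demo());
}
```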
We’ve already seen how Rust prevents various read/write phenomena in
single-threaded code, but it really shines when you want to write concurrent
code. A driving force in the design of this system was to prevent data races in
threaded code, by letting the compiler reject programs with concurrency bugs.
For example, let’s try to rewrite one of our early examples of a counter that is
incremented by a set of threads. We’ll start four threads, each of which tries
to increment counter
a thousand times.
use std::thread;
fn main() {
let mut counter = 0;
let mut threads = Vec::new();
for _ in 0..4 {
let th = thread::spawn(|| {
for _ in 0..1000 {
counter += 1;
}
});
threads.push(th);
}
for th in threads {
th.join().unwrap();
}
println!("counter: {}", counter);
}
As we saw in part 1, this program is not safe: allowing multiple threads to read and write a counter will result in lost updates. Rust knows this, and the compiler rejects this program. It gives us three distinct error messages, the first being:
error[E0499]: cannot borrow `counter` as mutable more than once at a time
--> counter.rs:8:32
|
8 | let th = thread::spawn(|| {
| - ^^ mutable borrow starts here in previous
| | iteration of loop
| __________________|
| |
9 | | for _ in 0..1000 {
10 | | counter += 1;
| | ------- borrows occur due to use of `counter` in closure
11 | | }
12 | | });
| |__________- argument requires that `counter` is borrowed for `'static`
This says we’re trying to create multiple &mut
references to counter
at the
same time, in the loop that starts the threads up. The &mut
reference is
needed in order to mutate counter
in the statement counter += 1
. The reason
the &mut
references end up coexisting is down to how closures work. The
structure passed to thread::spawn()
, that is the expression || { ... }
, is a
closure, which is a block of code that can be stored and executed at will, just
like anonymous functions in JavaScript, blocks in Ruby, or lambdas in Java or
Python. To do that, it needs to capture references to any variables it accesses
that are defined outside the closure; here counter
is declared outside the
closure and the closure includes the statement counter += 1
, so the closure
object, which is just a struct in memory like any other, needs to hold a &mut
reference to counter
.
What makes the &mut
references overlap in time is the lifetime of the
closure: the compiler tells us “argument requires that counter
is borrowed for
'static
”. 'static
is Rust jargon for “the duration of the whole program”, in
other words it’s saying the closure will exist indefinitely with respect to
the function that created it. Think about what thread::spawn()
does: it
doesn’t execute the closure immediately and wait for it to complete. It starts a
new thread of execution, so the closure will begin executing at some unknown
time, concurrently with other threads, possibly after the enclosing function
main()
has returned. That means the closures we create inside the outer loop
may exist at the same time, and therefore any references they capture must be
ones that are allowed to coexist.
In effect, this code is equivalent to the following, where we try to push
several &mut
references to the same value into a Vec
:
fn main() {
let mut counter = 0;
let mut mut_refs = Vec::new();
for _ in 0..4 {
mut_refs.push(&mut counter);
}
}
The thread code fails to compile for the same reason: it’s creating structures
that contain &mut counter
and those structures remain in existence
indefinitely, which is what “borrowed for 'static
” is getting at.
The second error we get is also related to the 'static
lifetime of the
closures passed to thread::spawn()
:
error[E0373]: closure may outlive the current function, but it borrows
`counter`, which is owned by the current function
--> counter.rs:8:32
|
8 | let th = thread::spawn(|| {
| ^^ may outlive borrowed value `counter`
9 | for _ in 0..1000 {
10 | counter += 1;
| ------- `counter` is borrowed here
|
note: function requires argument type to outlive `'static`
--> counter.rs:8:18
|
8 | let th = thread::spawn(|| {
| __________________^
9 | | for _ in 0..1000 {
10 | | counter += 1;
11 | | }
12 | | });
| |__________^
help: to force the closure to take ownership of `counter` (and any other
referenced variables), use the `move` keyword
|
8 | let th = thread::spawn(move || {
| ^^^^^^^
The note here is telling us that thread::spawn()
only accepts closures that
“outlive 'static
”, meaning they remain valid indefinitely rather than being
bound to the lifetime of another value or function call. The closure contains a
reference to counter
, which is a local variable in the main()
function, and
which will therefore cease to exist when main()
returns. But, the closure is
required to be able to outlive the current function, so it’s illegal for it to
hold references to values that will vanish when the function returns!
So, closures passed to thread::spawn()
are not allowed to refer to local
variables from the calling function. Instead, the closures must take ownership
of any values they use, which can be achieved by putting the word move
before
the pipes.
The final error requires less explanation; it is simply a &
vs &mut
conflict
that we’ve already seen in single-threaded code. We’re creating a set of &mut
references to counter
that will live for an indefinite amount of time
(“borrowed for 'static
”), and so it’s illegal to take a &
reference to
counter
in order to print it at the end of main()
.
error[E0502]: cannot borrow `counter` as immutable because it is also borrowed
as mutable
--> counter.rs:20:29
|
8 | let th = thread::spawn(|| {
| - -- mutable borrow occurs here
| __________________|
| |
9 | | for _ in 0..1000 {
10 | | counter += 1;
| | ------- first borrow occurs due to use of `counter` in
| | closure
11 | | }
12 | | });
| |__________- argument requires that `counter` is borrowed for `'static`
...
20 | println!("counter: {}", counter);
| ^^^^^^^ immutable borrow occurs here
Let’s see if we can fix the errors relating to the closure by taking the
compiler’s advice and adding move
in front of the closure:
let th = thread::spawn(move || {
for _ in 0..1000 {
counter += 1;
}
});
This does compile, but running the resulting binary gives a surprising result:
counter: 0
We get a hint about why it doesn’t do what we expect from this compiler warning:
warning: unused variable: `counter`
--> counter.rs:10:17
|
10 | counter += 1;
| ^^^^^^^
|
= note: `#[warn(unused_variables)]` on by default
= help: did you mean to capture by reference instead?
The counter
variable we use inside each thread is unused, which means it’s
never read. That’s because by adding move
to the closure, we made it take
ownership of all the variables it uses. Instead of storing a reference to
counter
, the closure contains a copy of it. Integers implement
Copy
, so any situation that would normally result in a value being moved
results in integers being copied instead. So each thread starts up with a copy
of counter, increments its local copy, and leaves counter itself
untouched, and the value of counter at the end of the program is zero.
It’s looking like we can’t write the program we want. We can’t have the threads
holding &mut
references to counter
, so how can they all update it? The first
part of the answer is that Rust provides a Mutex
struct, which can be wrapped
around any value to control access to it. Here, we’ll use Mutex::new(0)
for
our counter, and then any time we want to access it we have to call
counter.lock().unwrap()
, which gives us a mutable reference to the wrapped
value, but only allowing one thread to have access at a time. This forces the
threads to access the counter sequentially while incrementing it.
use std::sync::Mutex;
use std::thread;

fn main() {
    let counter = Mutex::new(0);
    let mut threads = Vec::new();

    for _ in 0..4 {
        let th = thread::spawn(move || {
            for _ in 0..1000 {
                let mut ctr = counter.lock().unwrap();
                *ctr += 1;
            }
        });
        threads.push(th);
    }

    for th in threads {
        th.join().unwrap();
    }

    let ctr = counter.lock().unwrap();
    println!("counter: {}", *ctr);
}
Note that we never explicitly unlock the `Mutex`. The `ctr` reference that we get by calling `counter.lock()` goes out of scope at the end of the loop body, and the compiler automatically destroys it at that point. For `Mutex` locks, the destruction procedure releases the lock so another thread can acquire it. All types in Rust can add custom behaviour when their values go out of scope by implementing the `Drop` trait.
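Here’s a minimal sketch of that mechanism, using a toy `NoisyGuard` type invented for this example; the `MutexGuard` returned by `lock()` releases its lock in exactly this way, from its own `Drop` implementation.

```rust
// A toy guard that announces its own destruction.
struct NoisyGuard(&'static str);

impl Drop for NoisyGuard {
    // Called automatically when a NoisyGuard goes out of scope.
    fn drop(&mut self) {
        println!("released {}", self.0);
    }
}

fn main() {
    {
        let _guard = NoisyGuard("lock A");
        println!("holding lock A");
    } // _guard goes out of scope here, so drop() runs

    println!("after the block");
}
```

Running this prints “holding lock A”, then “released lock A”, then “after the block”: the cleanup happens deterministically at the closing brace, which is what makes scope-based unlocking reliable.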
Unfortunately this still is not enough to make the program compile:
error[E0382]: use of moved value: `counter`
--> counter.rs:9:32
|
5 | let counter = Mutex::new(0);
| ------- move occurs because `counter` has type
| `std::sync::Mutex<i32>`, which does not implement the
| `Copy` trait
...
9 | let th = thread::spawn(move || {
| ^^^^^^^ value moved into closure here, in
| previous iteration of loop
10 | for _ in 0..1000 {
11 | let mut ctr = counter.lock().unwrap();
| ------- use occurs due to use in closure
The type `Mutex<i32>` is not `Copy`, so values of that type are moved in any situation where the ownership system determines movement is required. Here, the `move` closure includes a use of `counter`, and so the first thread we create takes ownership of `counter` and none of the subsequent threads can use it. We need a way for all the threads to share a reference to the `Mutex` and ultimately access the integer it wraps.
This is where the final ingredient in the solution comes in: Rust’s `Arc` type. `Arc` stands for “atomically reference counted”, and this type provides a way for many threads to share a reference to a value. We modify the first section of our program like so:
use std::sync::Arc;

let counter = Arc::new(Mutex::new(0));
let mut threads = Vec::new();

for _ in 0..4 {
    let ctr_ref = counter.clone();
    let th = thread::spawn(move || {
        for _ in 0..1000 {
            let mut ctr = ctr_ref.lock().unwrap();
            *ctr += 1;
        }
    });
    threads.push(th);
}
There are two main changes here. First, `counter` is now initialised as `Arc::new(Mutex::new(0))`: an atomically reference-counted pointer to a mutex-wrapped 32-bit integer, or `Arc<Mutex<i32>>`. Second, instead of each thread taking ownership of the `Mutex`, it takes ownership of `ctr_ref`, a clone of the `Arc`. Cloning an `Arc` just copies a reference, not the wrapped value, in a way that’s safe to share the references across threads, so we can now give all the threads a way to access the shared counter and mutate it safely.
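The reference-counting behaviour is directly observable. This standalone sketch (not part of the counter program) uses `Arc::strong_count` and `Arc::ptr_eq` to show that cloning an `Arc` copies only the handle, not the data behind it:

```rust
use std::sync::Arc;

fn main() {
    let data = Arc::new(vec![1, 2, 3]);

    // clone() copies the smart pointer, not the Vec it points to.
    let other = data.clone();

    assert_eq!(Arc::strong_count(&data), 2); // two handles now exist...
    assert!(Arc::ptr_eq(&data, &other));     // ...to the same allocation

    drop(other);
    assert_eq!(Arc::strong_count(&data), 1); // back to one handle
}
```

When the last handle is dropped, the count hits zero and the wrapped value is freed, which is how the `Mutex` in our program outlives `main`’s original binding while the threads are still using it.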
The program now compiles without warnings and prints the expected result:
counter: 4000
There’s a fair bit to learn in order to make this work, and this is part of why Rust’s learning curve is renowned for being steep. However, it does mean that if your program compiles it’s almost guaranteed not to contain any data races. It is possible to subvert these guarantees; for example, I can break the above program by locking the `Mutex` separately for reading the counter value and for updating it:
let th = thread::spawn(move || {
    for _ in 0..1000 {
        let tmp = *ctr_ref.lock().unwrap();
        *ctr_ref.lock().unwrap() = tmp + 1;
    }
});
But this is more effort than writing the code without the bug, and it still requires fixing all the errors we saw along the way before the program is even allowed to compile. Rust puts up a lot of barriers to writing concurrency bugs, so much so that if you manage to get around them you’ve almost certainly done so deliberately.
This distinction reminds me of the difference in how PostgreSQL and MySQL handle failed `INSERT` queries that violate uniqueness constraints, as we saw in part 3. PostgreSQL makes the transaction unusable unless you use savepoints, whereas MySQL keeps on running your code in a way that might deadlock when you get to production. It’s an ergonomic change, marking the difference between you writing bugs deliberately or by accident. A system that raises errors on code that isn’t guaranteed to work is helping you avoid pushing bugs to production, while a system that is more lax might feel easier to use or to experiment with, but ultimately lets you ship subtle bugs that are harder to diagnose.
JavaScript has a similar problem. At first it feels easier to use than Rust – the learning curve is less steep, and it’s much less effort to get your programs to run at all. But writing production-quality concurrent JavaScript is extremely difficult, and the language itself gives you barely any help at all. In JS, you see weird race conditions happen in production and spend countless hours trying to find out why they’re happening. In Rust, the compiler won’t let you run a program that contains data races. Initially that feels incredibly restrictive and it can be a frustrating learning experience, but it will always tell you why your program doesn’t work, which is a substantial part of the work of debugging anything. Sometimes a system that feels easy at first is simply hiding complexity that you will need to figure out on your own later.
A similar distinction exists between the Puppet and Chef config management systems. Puppet has its own custom language for declaring what should be installed on your machines, and it’s highly restrictive – it has a limited syntax, and you have to explicitly declare all the dependencies between things you’re installing. Chef recipes are written in Ruby; they still use a declarative API but Chef executes them more like normal scripts, running a recipe from top-to-bottom without doing any static analysis first. Puppet feels restrictive but it can spot where your config makes no sense, for example because it contains conflicting or contradictory instructions, and it can do this because it has a custom language that only lets you write code that will have predictable results when executed. Chef can feel easier to get going with as it uses an already-established language, but it will also happily run configuration that makes no sense, for example you can declare that a package both should be and should not be installed, and the resulting machine state just depends on whatever order those instructions happen to run in.
All technical system designs seem to sit somewhere on this spectrum between feeling easy to get started with, and preventing you making mistakes. I would put JavaScript and Chef on the “easy to use” end, and Rust and Puppet on the “hard to break” end. Being hard to break usually means that you as a user have to explain more of your intent to the system, for example by using a static type system or by declaring all dependencies, and this effort pays off to the extent that the system catches bugs you wouldn’t have caught yourself. One of the first things I like to do when learning a new system is try to write programs that make no sense, to see if the system realises and tells me off, or executes them anyway. This sort of information is especially useful when maintaining systems, because I need to know how much the machine is going to spot bugs I’ve introduced and how much testing I need to do myself.
One of the interesting things about Rust is how much it’s pushed the ergonomics of language design, and tried to reconcile the needs of safety and ease-of-use, which often feel like they pull in opposing directions. Its designers are always looking for ways to make the language easier to use while keeping its guarantees about memory safety, augmenting the syntax to make certain tasks simpler and improving the error messages the compiler emits.
As programmers designing systems, UIs, APIs, we need to be asking how we can give users the functionality they need, in as simple an interface as possible, that also makes it easier to write correct code than to write bugs. Ultimately this comes down to the set of operations we expose and how we package and organise them, and there is a wealth of different perspectives we can draw on from programming languages, databases and other systems. Pay attention to how your system allows data to move around: everything else flows from there.