A little polymorphism, part 1: vectors, closures and coherence

Last month, I followed the Advent of Code series of puzzles that have been published each December for the last few years. I typically write my solutions in Ruby, but there were a few puzzles in this batch that I decided to solve in Rust. These problems ended up teaching me a lot about how one can use traits to make a Rust API feel like something you’d do in an untyped language, including expanding the functionality of built-in types in a safe, controlled way.

This year’s story was based around a virtual machine: a software simulation of a physical computer. This particular machine has a very small set of features. It works like a bytecode interpreter, where a program is represented as an array of integers – signed 64-bit integers, rather than just bytes. Once this program is loaded into the machine’s memory, it can modify any value within that memory region, but there are no other storage abstractions – no stack, no heap, no registers, and so on.

Here’s a little example program, which I’m feeding into a Machine struct that will execute it. I’ve broken the program into one instruction per line, and annotated each line with the address of its first value – its index in the array of values representing the program. The machine starts reading and executing the program from its first element. Though I’ve written the program in lines here, the machine just sees it as an array of numbers.

let mut machine = Machine::new(&[
    1, 9, 10, 12,  //  0
    2, 11, 12, 12, //  4
    99,            //  8
    2, 3, 4, 0,    //  9
]);

Each instruction begins with an opcode that identifies one of the machine’s built-in operations. The first instruction above has opcode 1, which identifies the add operation. This takes three arguments, which are the numbers following the opcode: 9, 10, 12. All these arguments identify addresses in the machine’s memory at which to read or write values. The value at address 9 is 2, and the value at address 10 is 3. The add operation sums these values and writes the result at the address given by the third argument: 12. So after this operation, the machine’s memory beginning at address 9 contains:

2   3   4   5

After running this instruction, the instruction pointer (IP) is updated to point at the next instruction, at address 4. This has opcode 2, which is the multiply operation. It works just like add, except that it multiplies its operands rather than adding them. So here it takes the values from addresses 11 and 12 (4 and 5), and stores their product at address 12, so that the machine’s memory now ends with:

2   3   4  20

The IP now points at address 8, containing the opcode 99, which tells the machine to halt. So the effect of this program is to compute (2 + 3) * 4 and store the result in address 12.

machine.run();
println!("{}", machine.memory[12]);
// -> 20

Here’s a more complicated program, which computes the value of 2³²:

let mut machine = Machine::new(&[
    2, 18, 19, 18, //  0
    1, 17, 20, 20, //  4
    6, 20, 16,     //  8
    6, 15, 15,     // 11
    99,            // 14
    0, 14, -1,     // 15
    1, 2, 32,      // 18
]);

machine.run();
println!("{:?}", &machine.memory[18..]);
// -> [4294967296, 2, 0]

In the following, I will use the notation #x to mean “the value at address x”, so for example #3 means the value at address 3 and #4 = #5 means we write the value from address 5 into address 4.

The first two instructions in this program are familiar from above. Opcode 6 is a jump operation: if the value at the address in its first argument is zero, then the IP is set to the value at the address in the second argument, otherwise it is incremented to the next instruction as normal. So, the third instruction sets the IP to #16 if #20 == 0. The fourth instruction sets IP to #15 if #15 == 0. Since #15 is zero, this works as an unconditional jump, taking the machine back to the start of the program.

Only locations 18 and 20 are modified by this program. The addresses 15 to 17 can be viewed as constants: data items the program needs to do its job, but that aren’t part of its instructions. The addresses after these are used to store variables that change while the program runs. A higher-level rendering of this program, using labels rather than instruction addresses and with all the constants resolved, could be:

loop:
    #18 = #18 * #19
    #20 = #20 - 1
    if #20 == 0 then goto exit
    goto loop
exit:
    halt
data:
    18: 1
    19: 2
    20: 32

We can see that the effect of this program is to multiply the number 1 by 2, a total of 32 times, so that we compute 2³². The result of this calculation ends up in #18.

This is a simplified version of the virtual machine from Advent of Code; the real one has instructions for comparing addresses for equality or checking whether one is less than another, and its opcodes can be modified so that arguments are treated as literal values rather than as addresses. Here’s an implementation of the machine described above; read() and write() take an argument’s offset from the IP and use the value stored in that slot as the address to read or write.

struct Machine {
    memory: Vec<i64>,
    ip: usize,
}

impl Machine {
    fn new(program: &[i64]) -> Self {
        let memory = Vec::from(program);
        Machine { memory, ip: 0 }
    }

    fn run(&mut self) {
        loop {
            match self.memory[self.ip] {
                1 => {
                    self.write(3, self.read(1) + self.read(2));
                    self.ip += 4;
                }
                2 => {
                    self.write(3, self.read(1) * self.read(2));
                    self.ip += 4;
                }
                6 => {
                    if self.read(1) == 0 {
                        self.ip = self.read(2) as usize;
                    } else {
                        self.ip += 3;
                    }
                }
                99 => break,
                opcode => panic!("unrecognised: {}", opcode),
            }
        }
    }

    fn read(&self, offset: usize) -> i64 {
        let address = self.memory[self.ip + offset];
        self.memory[address as usize]
    }

    fn write(&mut self, offset: usize, value: i64) {
        let address = self.memory[self.ip + offset];
        self.memory[address as usize] = value;
    }
}
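
The real machine’s comparison instructions aren’t needed for anything in this article, but to give a flavour of how they’d fit, here’s a sketch of two extra match arms for the run() loop. The opcode numbers (7 for less-than, 8 for equals) follow the real puzzle; the code itself is my own guess at how they’d look in this simplified machine.

    fn run(&mut self) {
        loop {
            match self.memory[self.ip] {
                7 => {
                    // less-than: write 1 if the first operand is below the second
                    self.write(3, (self.read(1) < self.read(2)) as i64);
                    self.ip += 4;
                }
                8 => {
                    // equals: write 1 if the two operands are equal
                    self.write(3, (self.read(1) == self.read(2)) as i64);
                    self.ip += 4;
                }
                // ... the arms for opcodes 1, 2, 6 and 99 as before
            }
        }
    }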

Now, a program that computes one fixed thing is not especially useful; programs become more powerful when they can accept input from outside their own code, and output results back to the system in a convenient way, without the caller needing to know what address the program stores its results in. Here’s a program that extends our exponentiating example above with the ability to take the two operands as inputs, and emit the result as output.

let mut machine = Machine::new(&[
    3, 25,         //  0
    3, 26,         //  2
    2, 24, 25, 24, //  4
    1, 23, 26, 26, //  8
    6, 26, 22,     // 12
    6, 21, 18,     // 15
    4, 24,         // 18
    99,            // 20
    0, 18, -1,     // 21
    1, 0, 0,       // 24
]);

Opcode 3 is the input operation; it takes a single value from the input channel and writes it to the given address. Opcode 4 is the output operation, which reads the value at the given address and sends it to the output channel. Written in our higher-level assembly-like language, it looks like this:

    #25 = input()
    #26 = input()
loop:
    #24 = #24 * #25
    #26 = #26 - 1
    if #26 == 0 then goto exit
    goto loop
exit:
    output(#24)
    halt
data:
    24: 1
    25: 0
    26: 0

The question is, how should we implement input and output? In the early puzzles, there is a single machine that takes a fixed number of inputs known ahead of time, and produces a single output. Later on, we need to generate input by observing the program’s output, for example where the program represents a robot we’re trying to guide through a maze. Later still, there are many machines running at once with their inputs and outputs chained together in sequence, or in a network where one machine can send a message to any other. That’s a lot of different situations to handle!

We can make a start by picking something that seems reasonable for the simple cases and then seeing where we get stuck. For simple problems, it would be okay to supply the inputs as a Vec, and have the outputs also be a Vec that the machine can push values into. This would let us interact with the machine like so:

// compute 3 to the power 7

machine.input.extend(&[3, 7]);
machine.run();
println!("{:?}", machine.output);
// -> [2187]

Let’s go ahead and implement this design; Machine gains input and output fields which are both Vecs:

struct Machine {
    memory: Vec<i64>,
    ip: usize,
    input: Vec<i64>,
    output: Vec<i64>,
}

impl Machine {
    fn new(program: &[i64]) -> Self {
        Machine {
            memory: Vec::from(program),
            ip: 0,
            input: Vec::new(),
            output: Vec::new(),
        }
    }

    // ...
}

And the run() loop gains a couple of branches for matching opcodes 3 and 4:

    fn run(&mut self) {
        loop {
            match self.memory[self.ip] {
                3 => {
                    let value = self.input.remove(0);
                    self.write(1, value);
                    self.ip += 2;
                }
                4 => {
                    let value = self.read(1);
                    self.output.push(value);
                    self.ip += 2;
                }
                // ...
            }
        }
    }

The program now runs successfully, and we can pass any two positive integers into the machine’s input to compute a power. Now, suppose we have an interactive program where we need to generate input by observing the output and responding to it. Our current design doesn’t let us make decisions like this, because the machine runs until it finds opcode 99, and we can’t inject code to monitor the output and use it to generate more input.

Whenever we want to insert arbitrary code into another process, closures seem like a good fit. What if, instead of Machine having two Vecs inside it, we gave it two closures – one of type FnMut() -> i64 that would produce inputs, and one of type FnMut(i64) to accept output? This would, for example, let us wrap an Iterator that returns the inputs, and a Vec to store the outputs, so we’re already accommodating different types for our input and output.

let program = [
    3, 25,         //  0
    3, 26,         //  2
    2, 24, 25, 24, //  4
    // ...
];

let mut input = [3, 7].iter().cloned();
let mut output = vec![];

let mut machine = Machine::new_with_io(
    &program,
    || input.next().unwrap(),
    |x| output.push(x)
);

machine.run();
println!("{:?}", output);

To make this work, we’d need to store the two closures inside Machine, so they’re available for Machine::run to use. The Machine::new function is modified to call a new Machine::new_with_io function, passing in two do-nothing closures for the existing cases where we don’t need I/O.

struct Machine<'a> {
    memory: Vec<i64>,
    ip: usize,
    input: Box<dyn 'a + FnMut() -> i64>,
    output: Box<dyn 'a + FnMut(i64)>,
}

impl<'a> Machine<'a> {
    fn new(program: &[i64]) -> Self {
        Self::new_with_io(program, || 0, |_| {})
    }

    fn new_with_io<I, O>(program: &[i64], input: I, output: O) -> Self
    where
        I: 'a + FnMut() -> i64,
        O: 'a + FnMut(i64),
    {
        Machine {
            memory: Vec::from(program),
            ip: 0,
            input: Box::new(input),
            output: Box::new(output),
        }
    }

    // ...
}

The Machine::run function could then use these closures when it encounters opcodes 3 and 4; rather than interacting with a Vec, it just calls into the closures the caller supplied.

    fn run(&mut self) {
        loop {
            match self.memory[self.ip] {
                3 => {
                    let value = (self.input)();
                    self.write(1, value);
                    self.ip += 2;
                }
                4 => {
                    let value = self.read(1);
                    (self.output)(value);
                    self.ip += 2;
                }
                // ...
            }
        }
    }

However, if we try this design, the borrow checker yells at us.

error[E0502]: cannot borrow `output` as immutable because it is
also borrowed as mutable
   |
   |         |x| output.push(x)
   |         --- ------ first borrow occurs due to use of `output` in closure
   |         |
   |         mutable borrow occurs here
...
   |     println!("{:?}", output);
   |                      ^^^^^^ immutable borrow occurs here
   | }
   | - mutable borrow might be used here, when `machine` is dropped
   |   and runs the destructor for type `Machine<'_>`

The problem is that, when we create a closure, it captures any references its body needs in order for the calls it makes to be valid. The output closure, |x| output.push(x), contains a call to Vec::push, which takes &mut self, and so the closure captures a &mut to the output variable. Because the closure becomes part of the Machine struct, it lives until machine goes out of scope at the end of the program. For as long as machine is in scope, it contains a &mut to output, so no other interaction with output is possible, and we cannot print it.

let mut input = [3, 7].iter().cloned();
let mut output = vec![];

let mut machine = Machine::new_with_io(
    &program,
    || input.next().unwrap(),
    |x| output.push(x)            // mutable borrow
);

machine.run();
println!("{:?}", output);         // immutable borrow

// machine is still live here

We could solve this by reducing the lifetime of machine, moving its use into a function or a local block, but that’s an ugly, short-term fix that won’t serve us for long. The better approach, which Rust constantly pushes us toward, is to make our structs smaller and our mutable references narrower and shorter-lived. In this case, that means passing the closures into the run method, where they’re needed, rather than storing them in the Machine struct. The Machine code now looks like this, with the struct and new() reverted to their original forms, and a new run_with_io() method introduced to take the closures.

struct Machine {
    memory: Vec<i64>,
    ip: usize,
}

impl Machine {
    fn new(program: &[i64]) -> Self {
        let memory = Vec::from(program);
        Machine { memory, ip: 0 }
    }

    fn run(&mut self) {
        self.run_with_io(|| 0, |_| {});
    }

    fn run_with_io<I, O>(&mut self, mut input: I, mut output: O)
    where
        I: FnMut() -> i64,
        O: FnMut(i64),
    {
        loop {
            match self.memory[self.ip] {
                3 => {
                    let value = input();
                    self.write(1, value);
                    self.ip += 2;
                }
                4 => {
                    let value = self.read(1);
                    output(value);
                    self.ip += 2;
                }
                // ...
            }
        }
    }

    // ...
}

This design means the &mut for output only lives as long as the call to run_with_io(), so once it’s complete we’re free to read output again and we can print the program’s results.

let program = [
    3, 25,         //  0
    3, 26,         //  2
    2, 24, 25, 24, //  4
    // ...
];

let mut machine = Machine::new(&program);
let mut input = [3, 7].iter().cloned();
let mut output = vec![];

machine.run_with_io(
    || input.next().unwrap(),
    |x| output.push(x)
);

println!("{:?}", output);
// -> [2187]

Now, this particular machine does not read any more input after emitting its sole output value, but it’s clear that by introducing FnMut closures, we’re able to insert code to handle the output, and this will be able to affect the choice of the next input value we generate.
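
For example, the two closures can share state through a std::cell::Cell, so that each input we produce depends on the most recent output we saw. This is only a sketch of the shape (the pow program above wouldn’t do anything interesting with it), but it shows the feedback loop that closures make possible:

use std::cell::Cell;

let last_output = Cell::new(0);
let mut machine = Machine::new(&program);

machine.run_with_io(
    || last_output.get() + 1,  // derive the next input from the last output we saw
    |x| last_output.set(x),    // record each output as it arrives
);

Both closures only need a shared borrow of last_output, which is why they can coexist; Cell lets us mutate the value through that shared reference.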

Already we’ve seen a guiding rule for managing read/write access in Rust: keep your structs small, and only scope references to the functions that need them. Making overly broad structs tends to expand both the scope and lifetime of mutable borrows, making it hard to interact with multiple parts of a structure at the same time. It really pushes towards a core principle of object-oriented design, forcing you to reduce structures to only those pieces of state that must live and be kept consistent together, and break big objects into smaller ones that express which pieces of state are truly related.

For our next challenge, let’s suppose we want to exponentiate a number multiple times, for example we’d like to compute ((2³)³)³. The laws of algebra tell us this is 2²⁷ and we could use our existing machine here, but humour me and imagine that we implement this by chaining several machines together.

We already have a machine that calculates pow(x, y). Repeated exponentiation just means applying the same process to the output of such a machine: ((2³)³)³ is given by pow(pow(pow(2, 3), 3), 3). We can take the output of one machine, and feed it in as input to another.

The pow() machine has very simple control flow: it reads two inputs, does some computation, and returns an output. So, we can run one instance of it to completion, then take its output and use that as input for another iteration. In the following program, we start with an output array containing the initial input: 2. Then we perform three iterations: the previous output becomes the new input and a fresh output array is assigned; we push a second value onto the input array to provide the second operand, the exponent 3; and then we run a fresh copy of the program to generate the next output.

let mut input: Vec<i64>;
let mut output = vec![2];

for _ in 0..3 {
    input = output;
    output = vec![];

    let mut machine = Machine::new(&program);
    input.push(3);
    machine.run_with_io(|| input.remove(0), |x| output.push(x));
}

println!("{:?}", output);
// -> [134217728]

As you can verify for yourself, 2²⁷ is indeed 134,217,728, so our chained machines appear to work. But we’re already seeing a downside of using closures for I/O: we’re often using the same underlying types for managing input/output values, but we have to repeatedly tell Machine::run_with_io how to use them – we have to say || input.remove(0) or |x| output.push(x) rather than just passing our Vecs in and having them work transparently.

With Rust’s trait system we can do just that, and much more besides. To start, we need to change Machine::run_with_io so that instead of taking two closures, it takes one argument that implements a new trait called Input, and another that implements Output. These traits are parameterised over the type of value they carry; a Machine deals in i64 values, so that’s what we’ll read and write. The Input trait will have a get() function for fetching input, while Output has a put() function, and we use those in place of the closure calls for opcodes 3 and 4.

    fn run_with_io<I, O>(&mut self, mut input: I, mut output: O)
    where
        I: Input<i64>,
        O: Output<i64>,
    {
        loop {
            match self.memory[self.ip] {
                3 => {
                    let value = input.get();
                    self.write(1, value);
                    self.ip += 2;
                }
                4 => {
                    let value = self.read(1);
                    output.put(value);
                    self.ip += 2;
                }
                // ...
            }
        }
    }

This is the only change needed to Machine itself and it opens up the ability to work with many more types – we just need implementations of Input and Output for all the types we want to use.

trait Input<T> {
    fn get(&mut self) -> T;
}

trait Output<T> {
    fn put(&mut self, value: T);
}

These traits take &mut self because in general, these functions will mutate the value they’re called on, for example by pushing a new value to a Vec or pulling the next value out of an Iterator inside a FnMut closure. Because Input::get and Output::put take &mut self, the input and output parameters to Machine::run_with_io need to be marked mut, but aside from that their types are completely opaque – any type can be passed here as long as it implements the right traits. If we’d instead said that output: &mut O where O: Output<i64> then we would only be able to pass &mut T types, rather than any type at all. For example, we’d need to prefix all closures we pass in with &mut, and some of the types we’ll use later can be mutated through a & reference rather than a &mut one. Therefore, we leave these parameter types as open-ended as possible.
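
For comparison, here’s a sketch of the alternative signature we’re not using; with this shape, every call site would have to spell out the &mut borrow, and we’d lose the ability to pass closures and other values in directly:

    // Hypothetical alternative: take the I/O handles by mutable reference
    fn run_with_io<I, O>(&mut self, input: &mut I, output: &mut O)
    where
        I: Input<i64>,
        O: Output<i64>,
    {
        // ...
    }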

To keep our existing examples working, we’ll need to implement Input<T> for closures of type FnMut() -> T, and Output<T> for closures of type FnMut(T).

impl<F, T> Input<T> for F
where
    F: FnMut() -> T,
{
    fn get(&mut self) -> T {
        self()
    }
}

impl<F, T> Output<T> for F
where
    F: FnMut(T),
{
    fn put(&mut self, value: T) {
        self(value);
    }
}

But now that we have these traits, we can start implementing them for other built-in types like Vec, thus allowing those types to be used directly with Machine::run_with_io. A lot of our examples would work if we could pass a mutable Vec reference in, so let’s write an implementation for &mut Vec<T>:

impl<T> Input<T> for &mut Vec<T> {
    fn get(&mut self) -> T {
        self.remove(0)
    }
}

impl<T> Output<T> for &mut Vec<T> {
    fn put(&mut self, value: T) {
        self.push(value);
    }
}

With these impls in place we can now pass &mut Vec<i64> in directly, without having to write closures that explain how to work with them:

let mut input: Vec<i64>;
let mut output = vec![2];

for _ in 0..3 {
    input = output;
    output = vec![];

    let mut machine = Machine::new(&program);
    input.push(3);
    machine.run_with_io(&mut input, &mut output);
}

println!("{:?}", output);
// -> [134217728]

We had to implement the traits for &mut Vec<T> rather than for Vec<T>, because of the signature of run_with_io:

    fn run_with_io<I, O>(&mut self, mut input: I, mut output: O)
    where
        I: Input<i64>,
        O: Output<i64>,

If it took &mut I and &mut O, then we could implement the traits for Vec<T> and passing in &mut Vec<T> would work. But since it takes just I and O, the traits must be implemented for the exact type we pass in, which in this case is &mut Vec<T>. Remember, T and &mut T are distinct types, and implementing a trait for one does not imply it’s implemented for the other. Functions that rely on traits will often take a &T or &mut T where T implements the trait, but the range of types we want to work with means I’ve gone with accepting any type T, rather than restricting to types matching &mut T.
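
To make the distinction concrete, here’s how it plays out at the call site, using the same pow program as before:

let mut machine = Machine::new(&program);
let mut input = vec![3, 7];
let mut output = vec![];

// Does not compile: our impls are for &mut Vec<i64>, and Vec<i64> itself
// has no Input<i64> or Output<i64> implementation.
// machine.run_with_io(input, output);

// Compiles: &mut input and &mut output are &mut Vec<i64> values, which is
// exactly the type the impls cover.
machine.run_with_io(&mut input, &mut output);

println!("{:?}", output);
// -> [2187]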

Here’s one more type these traits would be useful for. Often, when a process produces a single output, it would be nice to have that written directly to a single variable, rather than allocating a Vec for it. For example, our exponentiating machine could write its answer into a &mut i64:

let mut machine = Machine::new(&program);
let mut output = 0;

machine.run_with_io(&mut vec![3, 7], &mut output);

println!("{}", output);
// -> 2187

To make this work, we might be tempted to implement Output<T> for &mut T:

impl<T> Output<T> for &mut T {
    fn put(&mut self, value: T) {
        **self = value;
    }
}

However, this won’t compile as-is:

error[E0119]: conflicting implementations of trait `Output<_>` for type `&mut _`:
    |
    | / impl<F, T> Output<T> for F
    | | where
    | |     F: FnMut(T),
    | | {
...   |
    | |     }
    | | }
    | |_- first implementation here
...
    |   impl<T> Output<T> for &mut T {
    |   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ conflicting implementation for `&mut _`

Rust allows for some very dynamic use of its type system, but one constraint your code must obey is this: for every function or method call in your program, there must be a single fn that the call dispatches to, and it must be unambiguously decidable at compile time which fn the compiler should pick. If we implement the same trait multiple times for the same type, this rule is broken, because the compiler won’t know which impl of Output<T> to pick when it sees output.put().

The error above says that we can’t implement the trait for both &mut T and for F where F: FnMut(T), because there might be some type that matches both of these. Even if there isn’t one right now, there’s nothing to say one won’t be added to your program or to the Rust standard library in the future. And since we don’t want that to be a breaking change, you’re not allowed to write two impls that could match the same type.

Can we still do anything useful here? In this case we can, by being more specific: &mut i64 does not implement FnMut(i64), and furthermore the compiler guarantees that it never will; &mut T is a fundamental type, which means the compiler considers it a breaking change to implement any new traits for it.

impl Output<i64> for &mut i64 {
    fn put(&mut self, value: i64) {
        **self = value;
    }
}

We can implement Output<i64> for &mut i64 because Output<T> is defined in our crate. Rust itself will never implement FnMut(i64) for &mut i64 (as we saw, that would be a breaking change), so &mut i64 will never match F where F: FnMut(i64). And other crates can’t write conflicting implementations because of the orphan rules, which roughly say an impl is not allowed if neither the implemented trait nor the type it’s being applied to is defined in the current crate.
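
As a sketch of what the orphan rules forbid, imagine a downstream crate that depends on ours (the crate and its name here are hypothetical) and tries to add its own implementation:

// In a hypothetical downstream crate that depends on ours:
use machine::Output;

// Rejected by the orphan rules: neither the trait (Output, defined in our
// crate) nor the type (Vec<i64>, defined in std) is local to this crate.
impl Output<i64> for Vec<i64> {
    fn put(&mut self, value: i64) {
        self.push(value);
    }
}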

The Fn traits are also fundamental, which means the compiler considers it a breaking change to implement them for any more types. So the conflict between the implementations for &mut T and for F where F: FnMut(T) must be because there’s an existing type matching both of them. And indeed there is: &mut F where F: FnMut(T)! The standard library has a blanket implementation of FnMut<A> for &mut F where F: FnMut<A>, so &mut T and F where F: FnMut(A) can refer to the same type, whenever T: FnMut(A).
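
Here’s a small demonstration of that blanket implementation in action; call_twice is just a throwaway helper for this sketch:

fn call_twice<F: FnMut(i64)>(mut f: F) {
    f(1);
    f(2);
}

let mut output = vec![];
let mut f = |x| output.push(x);

call_twice(&mut f); // &mut F implements FnMut(i64), via the blanket impl
call_twice(f);      // just as F itself does

println!("{:?}", output);
// -> [1, 2, 1, 2]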

This notion that every function call and type reference must unambiguously resolve to a single item at compile time is known as coherence, and you are violating the coherence rules if the compiler tells you that you have conflicting implementations for a trait.

In part 2 of this article, we’ll look at some other useful types we can use here, when we need to deal with concurrency or multiple input/output sources. We’ll also see some patterns we can use to deal with coherence errors, and some new tricks that const generics will enable.