Generic returns in Rust

After casting around for a new platform to learn recently, I’ve decided to dive into Rust. Being mostly familiar with untyped languages like Ruby and JavaScript, it’s interesting to learn a statically typed language and see how it changes how one writes programs. There’s a common misconception amongst dynamic typing fans that static typing means you write the same programs, they’re just more verbose and come with more restrictions. And while there is certainly a cost to only being allowed to write type-safe programs, a good type system actually lets you write functions you cannot write in most dynamic languages. In Rust, generic return values are a good example of this.

Rust uses the Hindley-Milner type system most commonly associated with ML-family languages, most famously Haskell. It makes you write down the types of the parameters and return values of functions, but within function bodies it can infer the type of most things so you don’t need to write type annotations everywhere.

One exception to this is when you call a function with a generic return type. We’ll get to what this means in due course, but let’s start with an example. The first generic-returning function that the book introduces you to is Iterator::collect, which can convert an Iterator into any kind of collection. The following code creates an array of strings called words, and then maps each word to the pair (word, word.len()) containing the word and its length. The result of words.iter().map(...) is an Iterator whose items are pairs of strings and integers.

let words = ["losing", "my", "edge"];
let lengths = words.iter().map(|word| (word, word.len()));

An Iterator is a general construct for iterating over a sequence of values; it’s just an object with a next() function that returns the next element. Its collect() function lets you turn this sequence into any compatible data structure. For example, an iterator of pairs can be turned into a HashMap:

let map: HashMap<_, _> = lengths.collect();
// {"my": 2, "losing": 6, "edge": 4}

(The notation <_, _> means HashMap has two type parameters for its contents: the type of its keys and the type of its values. We can omit these and just write _ since Rust can infer them from the contents of the Iterator, but if you’re curious, the specific type is HashMap<&str, usize>.)

And, an iterator of any kind of value can be turned into a Vec, short for vector, which is a kind of dynamically-sized array:

let vec: Vec<_> = lengths.collect();
// [("losing", 6), ("my", 2), ("edge", 4)]

Coming from a dynamic languages background, this looks quite strange: we’re calling lengths.collect() but getting two different values out: the hashmap {"my": 2, "losing": 6, "edge": 4} or the vector [("losing", 6), ("my", 2), ("edge", 4)]. It seems as though lengths.collect() is behaving differently based on the type of the variable its result is assigned to.

How is this possible? You cannot write such a function in most untyped languages; each function’s behavior is determined by its inputs, which can include its arguments, the state of its host object in object-oriented languages, and possibly any global state either in the runtime or in external data sources. Functions do not have access to information about what the caller is expecting. Even in some typed languages this is not possible, for example collect() cannot be implemented in Java.

What makes this possible is that Rust’s type system uses the expected return type as part of the dispatch information, that is, the set of data that determines which version of collect() to invoke. In other languages that support polymorphism, you might be used to the idea that calling lengths.collect() examines the type of lengths, either at compile time or runtime, and invokes the implementation of collect() for that type. The difference in Rust is that the compiler also considers the requested return type, like HashMap<_, _> or Vec<_> above, when deciding which collect() implementation to invoke.

How is it actually implemented? How can we write our own collect()-like functions? Well, collect() itself is actually a little complicated, involving two other traits, FromIterator and IntoIterator, which have slightly complicated definitions. A better place to start is the much simpler trait, Into.

The Into trait also defines a function with a generic return, but it’s much simpler than Iterator. It defines an interface for converting a value from one type to another. Here is its definition:

trait Into<T> {
    fn into(self) -> T;
}

The <T> following Into means T is a type variable or type parameter, rather than a specific type. Traits and functions with type variables are said to be generic, and Into::into has a generic return type.

Usually, all function parameters need to be given a type. self is a special parameter that means the function can be called as a method, i.e. as value.into() rather than into(value). self automatically has the type Self, which refers to the type the function is defined in.

Here, the entire trait Into<T> is generic, rather than a specific function. A type that implements Into<T> must have a function called into() that takes self, and returns the type T. There are no constraints on this definition, so T can be any type at all. Importantly, a type can implement Into with multiple different types standing in for the variable T, for example a type might implement both Into<String> and Into<Vec<_>>.

This is where Rust differs from a language like Java: it is possible to define the Into interface in Java:

interface Into<T> {
    public T into();
}

However, each class is only allowed to implement Into once; it is a compile error to open a class with implements Into<String>, Into<List>. In Rust, the equivalent is not just legal but very common. For example let’s say we have a struct Person with two fields, name is a string and age is a 8-bit unsigned integer (u8):

#[derive(Clone)]
struct Person {
    name: String,
    age: u8,
}

The annotation #[derive(Clone)] means this struct gains a function called clone() that lets us copy instances of it, which is necessary in some examples below because of how memory management works in Rust. We might want to turn this struct into a string, and one way to support this is to implement Into<String> for the Person type.

impl Into<String> for Person {
    fn into(self) -> String {
        format!("{} ({})", self.name, self.age)
    }
}

Here, self implicitly has the type Person – the type we are defining into() for. We could also convert a Person into a pair of the person’s name and age, which we can do by implementing Into<(String, u8)>:

impl Into<(String, u8)> for Person {
    fn into(self) -> (String, u8) {
        (self.name, self.age)
    }
}

We now have two implementations of the function definition Into::into. To give them their fully-qualified names, which refer unambiguously to a single function implementation, they are <Person as Into<String>>::into, and <Person as Into<(String, u8)>>::into. In general, the notation <X as Y>::foo refers to the function foo defined inside the block impl Y for X, the implementation of trait Y for type X.

With these definitions, it’s now possible to call into() on Person values with different expected types, and get different return values:

fn main() {
    let me = Person { name: String::from("James"), age: 35 };
    let s: String = me.clone().into();
    let t: (_, _) = me.clone().into();

    println!("s = {:?}, t = {:?}", s, t);
    // -> s = "James (35)", t = ("James", 35)
}

These examples are contrived, but they demonstrate the basic idea: one way to implement a function with a generic return is to declare its type in a trait, and then provide multiple implementations of that trait for different types. The compiler uses type inference to determine which implementation you need for each call.

This also works when the result of a generic call is passed to another function – the return type is inferred from the latter function’s parameter type. For example, if print_person takes a String:

fn print_person(person: String) {
    println!("person = {}", person);
}

Then calling print_person(me.into()) will automatically invoke the version of into() that turns a Person into a String.

Very often, Rust can infer which type or function you want from context. Method calls are syntactic sugar for calling the named function with the receiver as its first argument. For example, me.into() is interpreted by the compiler by observing that me is a Person, we’re calling a function called into(), and we want this expression to return a String, and there is one function implementation that meets all these criteria: <Person as Into<String>>::into. So this method call desugars to:

let s = <Person as Into<String>>::into(me);

Depending on how much Rust can infer from context, you are allowed to omit parts of the fully-qualified name as long as Rust can still unambiguously decide which function you meant. For example, because it knows me is of type Person, the above can also be written:

let s = Into::<String>::into(me);

Note the :: in Into::<String> here. ::<> is known as the turbofish operator and it used to specify a variant of a generic function, type or trait, by supplying values for its type parameters. If the type of s is known, this operation can also be omitted:

let s: String = Into::into(me);

Conversely, if the type Person only has a single implementation of into() that returns the desired type, we can also write:

let s: String = Person::into(me);

But since me is known to be a Person this reduces to let s: String = me.into(), which is what we started with.

So now we’ve seen one technique for implementing generic returns. We can declare its type in a trait and provide multiple implementations for it. We know how to refer to each specific implementation, and we know we can omit certain parts of the full name if Rust can infer them. This is a lot for such a simple trait, but this will help you understand more complicated types.

However, this technique is not how Iterator::collect is implemented. Iterator is a trait, just like Into – it’s an interface that many types implement. Traits can have required functions, like Into::into above: the trait just declares the function’s type, and each implementation of the trait provides a different implementation of its functions. Iterator has one required function, next() and one required type, Item. The next() function returns the next value in the sequence, and Item declares what the type of these values is. So Iterator is defined as:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

Iterator::next returning Option<Self::Item> means that it returns an Option containing whatever type Item refers to for the implementing type. For example, if we have a Vec<u8> like vec![1, 2, 3], then the iterator for this value yields Option<u8> values:

let v = vec![1, 2, 3].into_iter();

v.next() // -> Some(1)
v.next() // -> Some(2)
v.next() // -> Some(3)
v.next() // -> None

As well as requiring functions, a trait can provide functions – it can implement those functions itself rather than just declaring their types. These functions can make use of the required functions, as any type implementing the trait will provide implementations for them. Iterator::collect is one such function: it’s implemented directly by the Iterator trait, rather than by each type that implements Iterator. Its declaration looks something like:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;

    // other provided functions elided

    fn collect<B>(self) -> B
    where
        B: FromIterator<Self::Item>
    {
        // implementation code
    }
}

For Into<T> we implemented the function into(self) -> T by writing multiple separate implementations of the function for each value of T. Here, Iterator::collect directly implements a function returning the generic type B – the <B> after the name collect means that B is a type variable just for this function. How can we possibly write a function body that generates a value when we’ve no idea what type it should have?

The clue is in the trait bound – the constraint following the word where, saying that B: FromIterator<Self::Item>. That means B can’t be absolutely any type, it can only be a type that implements the trait FromIterator<Self::Item>. To understand the significance of this, let’s go back to the Into trait.

We said earlier that Into<T> doesn’t put any constraints on T, so types can implement into() to return anything they want. Each specific impl of Into<T> provides a possible return type and Rust figures out which version to call through type inference. But typically, types don’t implement Into directly. Instead, they implement the related trait From, and that makes Into work for free.

Here’s the definition of From. Rather than having a generic return type on its from() function, it has a generic parameter. Its return type is Self, which means each type that implements From<T> should have a from() function that returns that type.

trait From<T> {
    fn from(T) -> Self;
}

Instead of implementing Into<T> for Person, we can implement From<Person> for T; rather than impl Into<String> for Person, we have impl From<Person> for String. Since we’re implementing From<Person>, the input type to the from() function is Person.

impl From<Person> for String {
    fn from(p: Person) -> String {
        format!("{} ({})", p.name, p.age)
    }
}

impl From<Person> for (String, u8) {
    fn from(p: Person) -> (String, u8) {
        (p.name, p.age)
    }
}

Given just these implementations, we can still call me.into() to convert our Person value into something else. This is because Into<T> has a blanket implementation for any type U, if T implements From<U>.

impl<T, U> Into<T> for U
where
    T: From<U>
{
    fn into(self) -> T {
        T::from(self)
    }
}

Let’s break this down. The <T, U> after impl means that T and U are type variables, not concrete types, in the following definition. This block implements Into<T> for any type U, on the condition that there’s an implementation of From<U> for T. That is, if we’ve implemented From<U> for T, then we have defined a way to turn a U into a T, and so the implementation of Into<T> for U is trivial: we just have to call T::from with our U value. Into::into and From::from are just different ways of getting the same effect.

Now notice the body of the from() function here:

    fn into(self) -> T {
        T::from(self)
    }

The parameter self always has type Self, which refers to the type this impl block is for, which in this case is U. So Rust can fill in that type information:

    fn into(self: U) -> T {
        T::from(self)
    }

The fully-qualified name of the from() function we want to call here is the implementation of From::<U>::from for the type T, which we saw above is written <T as From<U>>::from.

    fn into(self: U) -> T {
        <T as From<U>>::from(self)
    }

The as From<U> part is redundant because we know from the trait bounds on this impl block that T implements From<U>. In fact that’s all we know about type T at compile time here, so any functions we call on T must come from its implementation of From<U>. Therefore we can write T::from(self) and this unambiguously identifies a single from() function for each pair of types T and U.

So this is our second pattern for implementing a generic-returning function: we can implement it directly by requiring that the return type satisfies a trait constraint, and then just delegating to that trait. In the case of Into::into, we require that the return type in the function fn(U) -> T satisfies T: From<U>, and then delegate to T::from to produce the desired result.

Let’s examine the type of Iterator::collect again.

    fn collect<B>(self) -> B
    where
        B: FromIterator<Self::Item>

Here B is constrained to be a type that implements FromIterator<A> with A = Self::Item. Here is the definition of FromIterator:

trait FromIterator<A> {
    fn from_iter<T>(iter: T) -> Self
    where
        T: IntoIterator<Item = A>;
}

FromIterator<A> has a type parameter A, which means a single type can implement it multiple times with different values for A. It has one required function: from_iter() takes any type T that implements IntoIterator<Item = A>, and returns Self. So in Iterator::collect, here is what we know:

  • collect is defined in the Iterator trait, so Self implements Iterator
  • its return type B implements FromIterator<Self::Item>
  • the only required function in FromIterator<A> is from_iter()

Given what we know about B, the only possible thing we can call in the body of Iterator::collect is the single required function from the single trait that B implements: from_iter(). Just as in Into::into, we can write B::from_iter rather than the full name <B as FromIterator<Self::Item>>::from_iter.

    fn collect<B>(self) -> B
    where
        B: FromIterator<Self::Item>
    {
        B::from_iter(self)
    }

Does this type-check? Here, self has some type that implements Iterator, and from_iter() takes some type T that implements IntoIterator. IntoIterator is a trait with a single function into_iter(), which converts some type into an Iterator; defining this conversion lets you use the type as the input to a for-loop. The Iterator trait implements this trivially by returning itself:

impl<I> IntoIterator for I
where
    I: Iterator
{
    // some types elided

    fn into_iter(self) -> I {
        self
    }
}

This is a special case of the pattern we saw in impl<T, U> Into<T> for U where T: From<U>: IntoIterator has a blanket implementation for any Iterator type, but we don’t need to call anything to convert the input into an Iterator since it already is one. The blanket implementation means that any type I that implements Iterator also implements IntoIterator. Therefore, self in Iterator::collect satisfies the input requirement for FromIterator::from_iter, and since from_iter() returns Self, B::from_iter returns something of type B, as required by the signature of collect().

We’ve seen that Iterator::collect follows the pattern of the blanket implementation for Into::into: it places a trait bound on its return type, and calls that trait to perform the conversion, passing self as the input.

One final detail remains to be understood: the type parameters used in Iterator, FromIterator and IntoIterator. Here are all the relevant declarations again:

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;

    fn collect<B>(self) -> B
    where
        B: FromIterator<Self::Item>
    {
        B::from_iter(self)
    }
}

trait FromIterator<A> {
    fn from_iter<T>(iter: T) -> Self
    where
        T: IntoIterator<Item = A>;
}

trait IntoIterator
where
    <Self::IntoIter as Iterator>::Item == Self::Item
{
    type Item;
    type IntoIter: Iterator;
    fn into_iter(self) -> Self::IntoIter;
}

There are some interesting constraints at work here, especially in IntoIterator. To see why they’re necessary, let’s imagine implementing these traits for one of Rust’s built-in data structures, Vec. These implementations aren’t how Rust actually does them, but they’re enough to illustrate how these traits use the type system.

First, let’s invent an iterator for vectors. This will use a struct called VecIter which holds an owned Vec, and implements Iterator by shifting items off the front of the vector until it’s empty.

struct VecIter<T> {
    vec: Vec<T>,
}

impl<T> Iterator for VecIter<T> {
    type Item = T;

    fn next(&mut self) -> Option<Self::Item> {
        if self.vec.is_empty() {
            None
        } else {
            Some(self.vec.remove(0))
        }
    }
}

Iterator has an associated type Item, rather than being generic. That means that, unlike Into<T>, we can only implement Iterator once for each type. The Item type states what type of values this iterator produces: a Vec<u8> will produce u8 values, a Vec<String> will produce strings, and in general a Vec<T> will yield values of type T. Vec::<T>::remove returns a T, so returning Some(self.vec.remove(0)) means we’re returning an Option<T>, which satisfies the Iterator requirement of returning Option<Self::Item>. Self::Item refers to the Item type in the current impl block, or to give it its fully-qualified name, <VecIter<T> as Iterator>::Item.

Implementing Iterator on VecIter means that it implicitly implements IntoIterator, and can therefore be passed to a for-loop. To use a Vec in the same way, we need to implement IntoIterator in such a way as to convert a Vec into a VecIter. Like Iterator, IntoIterator can only be implemented once per type, and it has associated types to tell the compiler what specific types this conversion produces: IntoIter says which concrete type is returned from into_iter(), and this type must implement Iterator. Item says what kind of values this iterator produces.

impl<T> IntoIterator for Vec<T> {
    type Item = T;
    type IntoIter = VecIter<T>;

    fn into_iter(self) -> Self::IntoIter {
        VecIter { vec: self }
    }
}

Recall that the IntoIterator trait has the following constraint on its implementation:

    <Self::IntoIter as Iterator>::Item == Self::Item

Does our implementation for Vec<T> satisfy this? Well, Self is the type we’re currently implementing: Vec<T> as IntoIterator. That gives us:

    Self::IntoIter = VecIter<T>
    Self::Item = T

Plugging these into the constraint gives:

    <VecIter<T> as Iterator>::Item == T

This means the Item type from the impl Iterator for VecIter<T> block must equal T, which indeed it does. But why do we need this constraint? It seems to say something quite boring: when you implement IntoIterator, the iterator you return should produce items of the type you’ve declared for IntoIterator::Item. But why declare this? Isn’t it enough to say that this implementation returns a VecIter<T>? IntoIterator doesn’t provide any of its own functions so there’s nowhere that it refers to Self::Item in its interface.

The answer is that it lets us put a constraint on FromIterator implementations to make sure they’re valid. Consider the following implementation, which satisfies the trait definition but doesn’t do anything but return an empty vector. The <A> after impl means this implements FromIterator<A> for any type A, rather than for some specific type like when we implemented From<Person>.

impl<A> FromIterator<A> for Vec<A> {
    fn from_iter<T>(iter: T) -> Self {
        let vector = Vec::new();
        vector
    }
}

This allows the iter input to have absolutely any type T. Its type doesn’t matter, because iter is not actually used in the function body. If we try to iterate over it, we get an error because T does not implement Iterator:

impl<A> FromIterator<A> for Vec<A> {
    fn from_iter<T>(iter: T) -> Self {
        let vector = Vec::new();
        for item in iter {}
        vector
    }
}

/* compile error:

error[E0277]: `T` is not an iterator
   |
   |         for item in iter {}
   |                     ^^^^ `T` is not an iterator
   |
   = help: the trait `std::iter::Iterator` is not implemented for `T`
   = help: consider adding a `where T: std::iter::Iterator` bound
   = note: required by `std::iter::IntoIterator::into_iter`
*/

To be allowed to iterate over T, it must implement IntoIterator:

impl<A> FromIterator<A> for Vec<A> {
    fn from_iter<T>(iter: T) -> Self
    where
        T: IntoIterator
    {
        let vector = Vec::new();
        for item in iter {}
        vector
    }
}

Declaring that T: IntoIterator means we can iterate over it, but we can’t do anything interesting with its contents. To produce a vector from the input, we probably want to add each item to the vector before returning it:

impl<A> FromIterator<A> for Vec<A> {
    fn from_iter<T>(iter: T) -> Self
    where
        T: IntoIterator
    {
        let mut vector = Vec::new();
        for item in iter {
            vector.push(item);
        }
        vector
    }
}

/* compile error:

error[E0308]: mismatched types
   |
   |         vector
   |         ^^^^^^ expected type parameter, found associated type
   |
   = note: expected type `std::vec::Vec<A>`
              found type `std::vec::Vec<<T as std::iter::IntoIterator>::Item>`
*/

Now we have a new problem. We’ve stated that from_iter() returns Self, or Vec<A>, so its last expression must have type Vec<A>. But the inferred type of vector is Vec<<T as IntoIterator>::Item>: because we’re pushing the contents of iter into it, the type of the values is the Item type from the implementation of IntoIterator for T. There’s no proof that this matches the type A in the expected return type Vec<A>, so we get a compile error.

All it takes to resolve this is to declare that these types are in fact equal, by constraining T to implement IntoIterator<Item = A>:

impl<A> FromIterator<A> for Vec<A> {
    fn from_iter<T>(iter: T) -> Self
    where
        T: IntoIterator<Item = A>
    {
        let mut vector = Vec::new();
        for item in iter {
            vector.push(item);
        }
        vector
    }
}

This will now compile, as Rust knows that the iterator that T is converted into will produce A values. The IntoIterator::Item type lets us state this constraint, and the constraint on IntoIterator makes sure this Item type is the same as that in the resulting Iterator. Without this constraint, we’d have to write something like this for the type of from_iter():

    fn from_iter<T, U>(iter: T) -> Self
    where
        T: IntoIterator<IntoIter = U>,
        <U as Iterator>::Item == A

So having the Item type in IntoIterator and constraining it to match Item in the Iterator returned by into_iter() means that other functions can rely on IntoIterator<Item = A> rather than the verbose declaration above. Of course, this could be made simpler by having FromIterator depend directly on Iterator, but depending on IntoIterator allows from_iter() to accept a wider range of values without all its callers having to call into_iter() on the input themselves.

We now understand what the generic trait FromIterator means:

trait FromIterator<A> {
    fn from_iter<T>(iter: T) -> Self
    where
        T: IntoIterator<Item = A>;
}

Implementing FromIterator<A> means that a type can be generated from an iterator of A, or anything that implements IntoIterator<Item = A>, and can therefore be created by calling collect() on such an iterator. Iterator::collect relies on FromIterator<Self::Item>, so it can return any type that implements either a generic FromIterator<A>, or a specific FromIterator for the kind of item that iterator holds. For example, vectors can hold any kind of value, so Vec<T> implements FromIterator<T> with no further constraints, but hashmaps can only hold pairs of type (K, V) where K can be hashed and compared for equality, and so it implements FromIterator as:

impl<K, V> FromIterator<(K, V)> for HashMap<K, V>
where
    K: Eq + Hash

Therefore, we can turn any iterator into a Vec, but only some kinds of iterators into a HashMap. Recall our original example for hashmaps:

let words = ["losing", "my", "edge"];
let lengths = words.iter().map(|word| (word, word.len()));

let map: HashMap<_, _> = lengths.collect();

This compiles and runs as we saw, but attempting to turn an array of strings into a hashmap is a compile-time error:

let words = ["losing", "my", "edge"];
let map: HashMap<_, _> = words.iter().collect();

/* compile error:

error[E0277]: a collection of type `std::collections::HashMap<_, _>` cannot be
              built from an iterator over elements of type `&&str`
   |
   |     let map: HashMap<_, _> = words.iter().collect();
   |                                           ^^^^^^^
   |          a collection of type `std::collections::HashMap<_, _>`
   |          cannot be built from `std::iter::Iterator<Item=&&str>`
   |
   = help: the trait `std::iter::FromIterator<&&str>` is
           not implemented for `std::collections::HashMap<_, _>`
*/

HashMap<K, V> only implements FromIterator<(K, V)>, not the catch-all FromIterator<A>, and the type &&str is not a variation on the pair type (K, V). The compiler can’t find any application conversion, and so the program does not compile.

Some final remarks. FromIterator is generic, meaning a single type can implement it multiple times as we did for From – we could implement FromIterator<u8> and FromIterator<String> for the same type. IntoIterator is not generic, and can only be implemented once. So, each type has one way to convert itself into an iterator, but can have multiple ways to generate itself from an iterator.

Also, although FromIterator relies on IntoIterator, they are not equivalent in the same way that From and Into are. Defining From<U> for T implicitly defines Into<T> for U, since both these convert a U into a T. But converting something into an iterator, and converting iterators into other types, are not equivalent, so implementing IntoIterator doesn’t give you FromIterator for free; you must implement both.

We have seen that Rust’s trait system allows for generic-returning functions like collect() to have fine-grained constraints on the conversions they are allowed to perform, while retaining a great deal of expressiveness as long as the type system can identify a unique implementation to invoke. They let us place the complexity in the definitions of functions, rather than at their call sites, so that changing a function or trait definition does not necessarily mean having to update loads of calls to that function, side-stepping some of the maintenance headaches usually associated with typed languages.