After casting around for a new platform to learn recently, I’ve decided to dive into Rust. Being mostly familiar with untyped languages like Ruby and JavaScript, it’s interesting to learn a statically typed language and see how it changes how one writes programs. There’s a common misconception amongst dynamic typing fans that static typing means you write the same programs, they’re just more verbose and come with more restrictions. And while there is certainly a cost to only being allowed to write type-safe programs, a good type system actually lets you write functions you cannot write in most dynamic languages. In Rust, generic return values are a good example of this.
Rust uses the Hindley-Milner type system most commonly associated with ML-family languages, most famously Haskell. It makes you write down the types of the parameters and return values of functions, but within function bodies it can infer the type of most things so you don’t need to write type annotations everywhere.
One exception to this is when you call a function with a generic return type.
We’ll get to what this means in due course, but let’s start with an example. The
first generic-returning function that the book introduces you to is
Iterator::collect, which can convert an Iterator into any kind of
collection. The following code creates an array of strings called words, and
then maps each word to the pair (word, word.len()) containing the word and its
length. The result of words.iter().map(...) is an Iterator whose items
are pairs of strings and integers.
let words = ["losing", "my", "edge"];
let lengths = words.iter().map(|word| (word, word.len()));
An Iterator is a general construct for iterating over a sequence of values;
it’s just an object with a next() function that returns the next element. Its
collect() function lets you turn this sequence into any compatible data
structure. For example, an iterator of pairs can be turned into a
HashMap:
let map: HashMap<_, _> = lengths.collect();
// {"my": 2, "losing": 6, "edge": 4}
(The notation <_, _> means HashMap has two type parameters for its contents:
the type of its keys and the type of its values. We can omit these and just
write _ since Rust can infer them from the contents of the Iterator, but if
you’re curious, the specific type is HashMap<&str, usize>.)
And, an iterator of any kind of value can be turned into a Vec, short for
vector, which is a kind of dynamically-sized array:
let vec: Vec<_> = lengths.collect();
// [("losing", 6), ("my", 2), ("edge", 4)]
Coming from a dynamic languages background, this looks quite strange: we’re
calling lengths.collect() but getting two different values out: the hashmap
{"my": 2, "losing": 6, "edge": 4} or the vector [("losing", 6), ("my", 2),
("edge", 4)]. It seems as though lengths.collect() is behaving differently
based on the type of the variable its result is assigned to.
How is this possible? You cannot write such a function in most untyped
languages; each function’s behavior is determined by its inputs, which can
include its arguments, the state of its host object in object-oriented
languages, and possibly any global state either in the runtime or in external
data sources. Functions do not have access to information about what the caller
is expecting. Even in some typed languages this is not possible, for example
collect() cannot be implemented in Java.
What makes this possible is that Rust’s type system uses the expected return
type as part of the dispatch information, that is, the set of data that
determines which version of collect() to invoke. In other languages that
support polymorphism, you might be used to the idea that calling
lengths.collect() examines the type of lengths, either at compile time or
runtime, and invokes the implementation of collect() for that type. The
difference in Rust is that the compiler also considers the requested return
type, like HashMap<_, _> or Vec<_> above, when deciding which collect()
implementation to invoke.
How is it actually implemented? How can we write our own collect()-like
functions? Well, collect() itself is actually a little complicated, involving
two other traits, FromIterator and IntoIterator, which have slightly
complicated definitions. A better place to start is the much simpler trait,
Into.
The Into trait also defines a function with a generic return, but it’s
much simpler than Iterator. It defines an interface for converting a value
from one type to another. Here is its definition:
trait Into<T> {
fn into(self) -> T;
}
The <T> following Into means T is a type variable or type parameter,
rather than a specific type. Traits and functions with type variables are said
to be generic, and Into::into has a generic return type.
Usually, all function parameters need to be given a type. self is a special
parameter that means the function can be called as a method, i.e. as
value.into() rather than into(value). self automatically has the type
Self, which refers to the type the function is defined in.
Here, the entire trait Into<T> is generic, rather than a specific function. A
type that implements Into<T> must have a function called into() that takes
self, and returns the type T. There are no constraints on this definition,
so T can be any type at all. Importantly, a type can implement Into with
multiple different types standing in for the variable T, for example a type
might implement both Into<String> and Into<Vec<_>>.
This is where Rust differs from a language like Java: it is possible to define
the Into interface in Java:
interface Into<T> {
public T into();
}
However, each class is only allowed to implement Into once; it is a compile
error to open a class with implements Into<String>, Into<List>. In Rust, the
equivalent is not just legal but very common. For example let’s say we have a
struct Person with two fields, name is a string and age is a 8-bit
unsigned integer (u8):
#[derive(Clone)]
struct Person {
name: String,
age: u8,
}
The annotation #[derive(Clone)] means this struct gains a function called
clone() that lets us copy instances of it, which is necessary in some examples
below because of how memory management works in Rust. We might want to turn this
struct into a string, and one way to support this is to implement Into<String>
for the Person type.
impl Into<String> for Person {
fn into(self) -> String {
format!("{} ({})", self.name, self.age)
}
}
Here, self implicitly has the type Person – the type we are defining
into() for. We could also convert a Person into a pair of the person’s name
and age, which we can do by implementing Into<(String, u8)>:
impl Into<(String, u8)> for Person {
fn into(self) -> (String, u8) {
(self.name, self.age)
}
}
We now have two implementations of the function definition Into::into. To give
them their fully-qualified names, which refer unambiguously to a single
function implementation, they are <Person as Into<String>>::into, and <Person
as Into<(String, u8)>>::into. In general, the notation <X as Y>::foo refers
to the function foo defined inside the block impl Y for X, the
implementation of trait Y for type X.
With these definitions, it’s now possible to call into() on Person values
with different expected types, and get different return values:
fn main() {
let me = Person { name: String::from("James"), age: 35 };
let s: String = me.clone().into();
let t: (_, _) = me.clone().into();
println!("s = {:?}, t = {:?}", s, t);
// -> s = "James (35)", t = ("James", 35)
}
These examples are contrived, but they demonstrate the basic idea: one way to implement a function with a generic return is to declare its type in a trait, and then provide multiple implementations of that trait for different types. The compiler uses type inference to determine which implementation you need for each call.
This also works when the result of a generic call is passed to another function
– the return type is inferred from the latter function’s parameter type. For
example, if print_person takes a String:
fn print_person(person: String) {
println!("person = {}", person);
}
Then calling print_person(me.into()) will automatically invoke the version of
into() that turns a Person into a String.
Very often, Rust can infer which type or function you want from context. Method
calls are syntactic sugar for calling the named function with the receiver as
its first argument. For example, me.into() is interpreted by the compiler by
observing that me is a Person, we’re calling a function called into(), and
we want this expression to return a String, and there is one function
implementation that meets all these criteria: <Person as
Into<String>>::into. So this method call desugars to:
let s = <Person as Into<String>>::into(me);
Depending on how much Rust can infer from context, you are allowed to omit parts
of the fully-qualified name as long as Rust can still unambiguously decide which
function you meant. For example, because it knows me is of type Person, the
above can also be written:
let s = Into::<String>::into(me);
Note the :: in Into::<String> here. ::<> is known as the turbofish
operator and it used to specify a variant of a generic function, type or trait,
by supplying values for its type parameters. If the type of s is known, this
operation can also be omitted:
let s: String = Into::into(me);
Conversely, if the type Person only has a single implementation of into()
that returns the desired type, we can also write:
let s: String = Person::into(me);
But since me is known to be a Person this reduces to let s: String =
me.into(), which is what we started with.
So now we’ve seen one technique for implementing generic returns. We can declare its type in a trait and provide multiple implementations for it. We know how to refer to each specific implementation, and we know we can omit certain parts of the full name if Rust can infer them. This is a lot for such a simple trait, but this will help you understand more complicated types.
However, this technique is not how Iterator::collect is implemented.
Iterator is a trait, just like Into – it’s an interface that many types
implement. Traits can have required functions, like Into::into above: the
trait just declares the function’s type, and each implementation of the trait
provides a different implementation of its functions. Iterator has one
required function, next() and one required type, Item. The next() function
returns the next value in the sequence, and Item declares what the type of
these values is. So Iterator is defined as:
trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
}
Iterator::next returning Option<Self::Item> means that it returns an
Option containing whatever type Item refers to for the implementing type.
For example, if we have a Vec<u8> like vec![1, 2, 3], then the iterator for
this value yields Option<u8> values:
let v = vec![1, 2, 3].into_iter();
v.next() // -> Some(1)
v.next() // -> Some(2)
v.next() // -> Some(3)
v.next() // -> None
As well as requiring functions, a trait can provide functions – it can
implement those functions itself rather than just declaring their types. These
functions can make use of the required functions, as any type implementing the
trait will provide implementations for them. Iterator::collect is one such
function: it’s implemented directly by the Iterator trait, rather than by each
type that implements Iterator. Its declaration looks something like:
trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
// other provided functions elided
fn collect<B>(self) -> B
where
B: FromIterator<Self::Item>
{
// implementation code
}
}
For Into<T> we implemented the function into(self) -> T by writing multiple
separate implementations of the function for each value of T. Here,
Iterator::collect directly implements a function returning the generic type
B – the <B> after the name collect means that B is a type variable just
for this function. How can we possibly write a function body that generates a
value when we’ve no idea what type it should have?
The clue is in the trait bound – the constraint following the word where,
saying that B: FromIterator<Self::Item>. That means B can’t be absolutely
any type, it can only be a type that implements the trait
FromIterator<Self::Item>. To understand the significance of this, let’s go
back to the Into trait.
We said earlier that Into<T> doesn’t put any constraints on T, so types can
implement into() to return anything they want. Each specific impl of
Into<T> provides a possible return type and Rust figures out which version to
call through type inference. But typically, types don’t implement Into
directly. Instead, they implement the related trait From, and that makes
Into work for free.
Here’s the definition of From. Rather than having a generic return type on its
from() function, it has a generic parameter. Its return type is Self, which
means each type that implements From<T> should have a from() function that
returns that type.
trait From<T> {
fn from(T) -> Self;
}
Instead of implementing Into<T> for Person, we can implement From<Person>
for T; rather than impl Into<String> for Person, we have impl From<Person>
for String. Since we’re implementing From<Person>, the input type to the
from() function is Person.
impl From<Person> for String {
fn from(p: Person) -> String {
format!("{} ({})", p.name, p.age)
}
}
impl From<Person> for (String, u8) {
fn from(p: Person) -> (String, u8) {
(p.name, p.age)
}
}
Given just these implementations, we can still call me.into() to convert our
Person value into something else. This is because Into<T> has a blanket
implementation for any type U, if T implements From<U>.
impl<T, U> Into<T> for U
where
T: From<U>
{
fn into(self) -> T {
T::from(self)
}
}
Let’s break this down. The <T, U> after impl means that T and U are type
variables, not concrete types, in the following definition. This block
implements Into<T> for any type U, on the condition that there’s an
implementation of From<U> for T. That is, if we’ve implemented From<U> for
T, then we have defined a way to turn a U into a T, and so the
implementation of Into<T> for U is trivial: we just have to call T::from
with our U value. Into::into and From::from are just different ways of
getting the same effect.
Now notice the body of the from() function here:
fn into(self) -> T {
T::from(self)
}
The parameter self always has type Self, which refers to the type this
impl block is for, which in this case is U. So Rust can fill in that type
information:
fn into(self: U) -> T {
T::from(self)
}
The fully-qualified name of the from() function we want to call here is the
implementation of From::<U>::from for the type T, which we saw above is written
<T as From<U>>::from.
fn into(self: U) -> T {
<T as From<U>>::from(self)
}
The as From<U> part is redundant because we know from the trait bounds on this
impl block that T implements From<U>. In fact that’s all we know about
type T at compile time here, so any functions we call on T must come from
its implementation of From<U>. Therefore we can write T::from(self) and this
unambiguously identifies a single from() function for each pair of types T
and U.
So this is our second pattern for implementing a generic-returning function: we
can implement it directly by requiring that the return type satisfies a trait
constraint, and then just delegating to that trait. In the case of Into::into,
we require that the return type in the function fn(U) -> T satisfies T:
From<U>, and then delegate to T::from to produce the desired result.
Let’s examine the type of Iterator::collect again.
fn collect<B>(self) -> B
where
B: FromIterator<Self::Item>
Here B is constrained to be a type that implements FromIterator<A>
with A = Self::Item. Here is the definition of FromIterator:
trait FromIterator<A> {
fn from_iter<T>(iter: T) -> Self
where
T: IntoIterator<Item = A>;
}
FromIterator<A> has a type parameter A, which means a single type can
implement it multiple times with different values for A. It has one required
function: from_iter() takes any type T that implements IntoIterator<Item =
A>, and returns Self. So in Iterator::collect, here is what we know:
collectis defined in theIteratortrait, soSelfimplementsIterator- its return type
BimplementsFromIterator<Self::Item> - the only required function in
FromIterator<A>isfrom_iter()
Given what we know about B, the only possible thing we can call in the body of
Iterator::collect is the single required function from the single trait that
B implements: from_iter(). Just as in Into::into, we can write
B::from_iter rather than the full name <B as
FromIterator<Self::Item>>::from_iter.
fn collect<B>(self) -> B
where
B: FromIterator<Self::Item>
{
B::from_iter(self)
}
Does this type-check? Here, self has some type that implements Iterator, and
from_iter() takes some type T that implements IntoIterator.
IntoIterator is a trait with a single function into_iter(), which converts
some type into an Iterator; defining this conversion lets you use the type as
the input to a for-loop. The Iterator trait implements this trivially by
returning itself:
impl<I> IntoIterator for I
where
I: Iterator
{
// some types elided
fn into_iter(self) -> I {
self
}
}
This is a special case of the pattern we saw in impl<T, U> Into<T> for U where
T: From<U>: IntoIterator has a blanket implementation for any Iterator
type, but we don’t need to call anything to convert the input into an Iterator
since it already is one. The blanket implementation means that any type I that
implements Iterator also implements IntoIterator. Therefore, self in
Iterator::collect satisfies the input requirement for
FromIterator::from_iter, and since from_iter() returns Self,
B::from_iter returns something of type B, as required by the signature of
collect().
We’ve seen that Iterator::collect follows the pattern of the blanket
implementation for Into::into: it places a trait bound on its return type, and
calls that trait to perform the conversion, passing self as the input.
One final detail remains to be understood: the type parameters used in
Iterator, FromIterator and IntoIterator. Here are all the relevant
declarations again:
trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
fn collect<B>(self) -> B
where
B: FromIterator<Self::Item>
{
B::from_iter(self)
}
}
trait FromIterator<A> {
fn from_iter<T>(iter: T) -> Self
where
T: IntoIterator<Item = A>;
}
trait IntoIterator
where
<Self::IntoIter as Iterator>::Item == Self::Item
{
type Item;
type IntoIter: Iterator;
fn into_iter(self) -> Self::IntoIter;
}
There are some interesting constraints at work here, especially in
IntoIterator. To see why they’re necessary, let’s imagine implementing these
traits for one of Rust’s built-in data structures, Vec. These implementations
aren’t how Rust actually does them, but they’re enough to illustrate how these
traits use the type system.
First, let’s invent an iterator for vectors. This will use a struct called
VecIter which holds an owned Vec, and implements Iterator by shifting
items off the front of the vector until it’s empty.
struct VecIter<T> {
vec: Vec<T>,
}
impl<T> Iterator for VecIter<T> {
type Item = T;
fn next(&mut self) -> Option<Self::Item> {
if self.vec.is_empty() {
None
} else {
Some(self.vec.remove(0))
}
}
}
Iterator has an associated type Item, rather than being generic. That means
that, unlike Into<T>, we can only implement Iterator once for each type. The
Item type states what type of values this iterator produces: a Vec<u8> will
produce u8 values, a Vec<String> will produce strings, and in general a
Vec<T> will yield values of type T. Vec::<T>::remove returns a T,
so returning Some(self.vec.remove(0)) means we’re returning an Option<T>,
which satisfies the Iterator requirement of returning Option<Self::Item>.
Self::Item refers to the Item type in the current impl block, or to give
it its fully-qualified name, <VecIter<T> as Iterator>::Item.
Implementing Iterator on VecIter means that it implicitly implements
IntoIterator, and can therefore be passed to a for-loop. To use a Vec in
the same way, we need to implement IntoIterator in such a way as to convert a
Vec into a VecIter. Like Iterator, IntoIterator can only be implemented
once per type, and it has associated types to tell the compiler what specific
types this conversion produces: IntoIter says which concrete type is returned
from into_iter(), and this type must implement Iterator. Item says what
kind of values this iterator produces.
impl<T> IntoIterator for Vec<T> {
type Item = T;
type IntoIter = VecIter<T>;
fn into_iter(self) -> Self::IntoIter {
VecIter { vec: self }
}
}
Recall that the IntoIterator trait has the following constraint on its
implementation:
<Self::IntoIter as Iterator>::Item == Self::Item
Does our implementation for Vec<T> satisfy this? Well, Self is the type
we’re currently implementing: Vec<T> as IntoIterator. That gives us:
Self::IntoIter = VecIter<T>
Self::Item = T
Plugging these into the constraint gives:
<VecIter<T> as Iterator>::Item == T
This means the Item type from the impl Iterator for VecIter<T> block must
equal T, which indeed it does. But why do we need this constraint? It seems to
say something quite boring: when you implement IntoIterator, the iterator you
return should produce items of the type you’ve declared for
IntoIterator::Item. But why declare this? Isn’t it enough to say that this
implementation returns a VecIter<T>? IntoIterator doesn’t provide any of its
own functions so there’s nowhere that it refers to Self::Item in its
interface.
The answer is that it lets us put a constraint on FromIterator implementations
to make sure they’re valid. Consider the following implementation, which
satisfies the trait definition but doesn’t do anything but return an empty
vector. The <A> after impl means this implements FromIterator<A> for any
type A, rather than for some specific type like when we implemented
From<Person>.
impl<A> FromIterator<A> for Vec<A> {
fn from_iter<T>(iter: T) -> Self {
let vector = Vec::new();
vector
}
}
This allows the iter input to have absolutely any type T. Its type doesn’t
matter, because iter is not actually used in the function body. If we try to
iterate over it, we get an error because T does not implement Iterator:
impl<A> FromIterator<A> for Vec<A> {
fn from_iter<T>(iter: T) -> Self {
let vector = Vec::new();
for item in iter {}
vector
}
}
/* compile error:
error[E0277]: `T` is not an iterator
|
| for item in iter {}
| ^^^^ `T` is not an iterator
|
= help: the trait `std::iter::Iterator` is not implemented for `T`
= help: consider adding a `where T: std::iter::Iterator` bound
= note: required by `std::iter::IntoIterator::into_iter`
*/
To be allowed to iterate over T, it must implement IntoIterator:
impl<A> FromIterator<A> for Vec<A> {
fn from_iter<T>(iter: T) -> Self
where
T: IntoIterator
{
let vector = Vec::new();
for item in iter {}
vector
}
}
Declaring that T: IntoIterator means we can iterate over it, but we can’t do
anything interesting with its contents. To produce a vector from the input, we
probably want to add each item to the vector before returning it:
impl<A> FromIterator<A> for Vec<A> {
fn from_iter<T>(iter: T) -> Self
where
T: IntoIterator
{
let mut vector = Vec::new();
for item in iter {
vector.push(item);
}
vector
}
}
/* compile error:
error[E0308]: mismatched types
|
| vector
| ^^^^^^ expected type parameter, found associated type
|
= note: expected type `std::vec::Vec<A>`
found type `std::vec::Vec<<T as std::iter::IntoIterator>::Item>`
*/
Now we have a new problem. We’ve stated that from_iter() returns Self, or
Vec<A>, so its last expression must have type Vec<A>. But the inferred type
of vector is Vec<<T as IntoIterator>::Item>: because we’re pushing the
contents of iter into it, the type of the values is the Item type from the
implementation of IntoIterator for T. There’s no proof that this matches the
type A in the expected return type Vec<A>, so we get a compile error.
All it takes to resolve this is to declare that these types are in fact equal,
by constraining T to implement IntoIterator<Item = A>:
impl<A> FromIterator<A> for Vec<A> {
fn from_iter<T>(iter: T) -> Self
where
T: IntoIterator<Item = A>
{
let mut vector = Vec::new();
for item in iter {
vector.push(item);
}
vector
}
}
This will now compile, as Rust knows that the iterator that T is converted
into will produce A values. The IntoIterator::Item type lets us state this
constraint, and the constraint on IntoIterator makes sure this Item type is
the same as that in the resulting Iterator. Without this constraint, we’d have
to write something like this for the type of from_iter():
fn from_iter<T, U>(iter: T) -> Self
where
T: IntoIterator<IntoIter = U>,
<U as Iterator>::Item == A
So having the Item type in IntoIterator and constraining it to match Item
in the Iterator returned by into_iter() means that other functions can rely
on IntoIterator<Item = A> rather than the verbose declaration above. Of
course, this could be made simpler by having FromIterator depend directly on
Iterator, but depending on IntoIterator allows from_iter() to accept a
wider range of values without all its callers having to call into_iter() on
the input themselves.
We now understand what the generic trait FromIterator means:
trait FromIterator<A> {
fn from_iter<T>(iter: T) -> Self
where
T: IntoIterator<Item = A>;
}
Implementing FromIterator<A> means that a type can be generated from an
iterator of A, or anything that implements IntoIterator<Item = A>, and can
therefore be created by calling collect() on such an iterator.
Iterator::collect relies on FromIterator<Self::Item>, so it can return any
type that implements either a generic FromIterator<A>, or a specific
FromIterator for the kind of item that iterator holds. For example, vectors
can hold any kind of value, so Vec<T> implements FromIterator<T> with no
further constraints, but hashmaps can only hold pairs of type (K, V) where K
can be hashed and compared for equality, and so it implements
FromIterator as:
impl<K, V> FromIterator<(K, V)> for HashMap<K, V>
where
K: Eq + Hash
Therefore, we can turn any iterator into a Vec, but only some kinds of
iterators into a HashMap. Recall our original example for hashmaps:
let words = ["losing", "my", "edge"];
let lengths = words.iter().map(|word| (word, word.len()));
let map: HashMap<_, _> = lengths.collect();
This compiles and runs as we saw, but attempting to turn an array of strings into a hashmap is a compile-time error:
let words = ["losing", "my", "edge"];
let map: HashMap<_, _> = words.iter().collect();
/* compile error:
error[E0277]: a collection of type `std::collections::HashMap<_, _>` cannot be
built from an iterator over elements of type `&&str`
|
| let map: HashMap<_, _> = words.iter().collect();
| ^^^^^^^
| a collection of type `std::collections::HashMap<_, _>`
| cannot be built from `std::iter::Iterator<Item=&&str>`
|
= help: the trait `std::iter::FromIterator<&&str>` is
not implemented for `std::collections::HashMap<_, _>`
*/
HashMap<K, V> only implements FromIterator<(K, V)>, not the catch-all
FromIterator<A>, and the type &&str is not a variation on the pair type (K,
V). The compiler can’t find any application conversion, and so the program does
not compile.
Some final remarks. FromIterator is generic, meaning a single type can
implement it multiple times as we did for From – we could implement
FromIterator<u8> and FromIterator<String> for the same type. IntoIterator
is not generic, and can only be implemented once. So, each type has one way to
convert itself into an iterator, but can have multiple ways to generate itself
from an iterator.
Also, although FromIterator relies on IntoIterator, they are not equivalent
in the same way that From and Into are. Defining From<U> for T
implicitly defines Into<T> for U, since both these convert a U into a T.
But converting something into an iterator, and converting iterators into other
types, are not equivalent, so implementing IntoIterator doesn’t give you
FromIterator for free; you must implement both.
We have seen that Rust’s trait system allows for generic-returning functions
like collect() to have fine-grained constraints on the conversions they are
allowed to perform, while retaining a great deal of expressiveness as long as
the type system can identify a unique implementation to invoke. They let us
place the complexity in the definitions of functions, rather than at their call
sites, so that changing a function or trait definition does not necessarily mean
having to update loads of calls to that function, side-stepping some of the
maintenance headaches usually associated with typed languages.
