Why recursion matters, part 1: proof by induction

In Douglas Hofstadter’s classic book Gödel, Escher, Bach, the philosopher Zeno introduces his famous paradox by saying:

Not the wind, not the flag—neither one is moving, nor is anything moving at all. For I have discovered a great Theorem, which states: “Motion is Inherently Impossible.”

The paradox argues that, because any motion is composed of an infinite series of ever-smaller changes, it must therefore take an infinite amount of time, and so no motion can ever be completed. We know this to be false, but his idea of turning a process into an infinitely recursive structure turns out to be a very powerful idea and underlies many results in maths and computer science.

Once upon a time, the addition of recursion to programming languages was a controversial topic, these days it is so commonplace that programmers take it for granted. For example, if we have to find information in an HTML document, we can write a recursive function. It checks to see if the current node matches our requirements, and then repeats the process on all the node’s children.

function getElementsByTagName(name, root) {
  root = root || document.body

  let elements = [],
      children = root.childNodes

  if (root.tagName === name)

  for (let i = 0, n = children.length; i < n; i++)
    elements = elements.concat(
      getElementsByTagName(name, children[i])

  return elements

getElementsByTagName is a recursive function, defined in terms of itself. When applied to a tree, it will eventually reach the leaves of the structure and stop recursing. If we need to scrape information from a whole website, or even all the websites in the world, we can request a page, parse it, find all the links in it, and then repeat the process on all the URLs we find.

async function spider(pageUrl) {
  let html = await fetch(pageUrl).then((resp) => resp.text())

  jsdom.env(html, (error, window) => {
    let body  = window.document.body,
        links = getElementsByTagName('A', body)

    let urls = links.map((link) => url.resolve(pageUrl, link.href))

    urls.forEach((url) => spider(url))

Frequently we talk about recursive procedures: functions that execute a sequence of instructions, where some of those instructions are calls to the function itself. But we don’t talk so much about recursive structures. Sure, we create them all the time, when we render HTML or build a website, but they just sort of exist. They don’t do anything by themselves, the functions are the important thing!

I’d like to show you a few things recursive structures can do, some things that are useful, interesting, and in some cases surprising. We don’t have to look too far from these first examples to find a recursive structure: the JavaScript object system uses one.

When we make a new object, it has a hidden field called __proto__ that points to the object it inherits from. We can form chains of these inheritance relationships.

let person = { name: 'james' }

let p1 = Object.create(person)
// p1.__proto__ == person

let p2 = Object.create(p1)
// p2.__proto__ == p1

When we access a property or call a method in JavaScript, it checks the receiving object for that property. If the property does not exist in that object, it looks in its __proto__ object, and so on until it finds an object with that property.

p2.name // -> p2.hasOwnProperty('name') -> false
        //    p2.__proto__.name

        //    p1 = p2.__proto__
        //    p1.hasOwnProperty('name') -> false
        //    p1.__proto__.name

        //    person = p1.__proto__
        //    person.hasOwnProperty('name') -> true

        //    person.name
        //    -> 'james'

This recursive structure serves a single purpose: it describes how objects inherit from one another. And since it exists solely for that purpose, we could say this structure is how objects inherit from each other. Sure, there are functions in the JavaScript runtime that walks this structure, but we don’t need to see those functions to understand how the relationship operates.

I’m going to talk about three things in this series, three things that are made possible by recursive structures. Those things are, in no particular order: correctness, performance, and meaning. Let’s begin by talking about correctness.

In mathematics, there is a technique called proof by induction. This technique involves a recursive argument: we assume that a statement is true in some specific case in order to prove it is true in general. This sounds like begging the question, how could it possibly used in logical proofs? Well, let’s look at an example: the sum of the integers from 1 to n.

sum(1 .. 1) =  1

sum(1 .. 2) =  3

sum(1 .. 3) =  6

sum(1 .. 4) = 10

The sum of the integers from 1 to 1 is 1. The sum from 1 to 2 is 3. The sum from 1 to 3 is 6. The sum from 1 to to 4 is 10. And if we continue this sequence we may eventually realise these numbers seem to follow a pattern, namely: the sum from 1 to n equals n times n+1 all over 2.

                  n (n + 1)
sum(1 .. n)   =   ---------

We can make an intuitive argument for this being true. Think of the numbers from 1 to 10. If we pair 1 up with 10, and 2 with 9, and 3 with 8, and so on, then we end up with 5 pairs of numbers that add up to 11. That is, n/2 pairs adding up to n+1.

                1    2     3     4     5
               10    9     8     7     6
        sum:   11   11    11    11    11

But lists of concrete examples, no matter how many we have, are not proof. As programmers, we can write any number of unit tests for a function and still not know for certain that it works in general. So how do we prove it? We know, by definition, that the sum from 1 to n+1 is the sum from 1 to n, plus n+1.

sum(1 .. n+1) =   sum(1 .. n) + (n + 1)

This is recursive definition of sum that means that if we know the sum from 1 to n, we can work out the sum to n+1. We’d like to prove our formula above that gives the result purely in terms of n so that we can work out any sum without computing all the previous ones.

Proof by induction works as follows. First, we assume our formula for sum is correct, and substitute it it into the recursive definition, i.e. we replace sum(1 .. n) with our formula, n (n + 1) / 2, giving us the following:

                  n (n + 1)
sum(1 .. n+1) =   --------- + (n + 1)

If we multiply out the numerator in the first term, we get n^2 + n:

                  n^2 + n
sum(1 .. n+1) =   ------- + (n + 1)

We rewrite the second term as a fraction: 2n +2 over 2.

                  n^2 + n   2n + 2
sum(1 .. n+1) =   ------- + ------
                     2         2

And then we can add the two fractions together, giving n^2 + 3n + 1 all over 2.

                  n^2 + 3n + 2
sum(1 .. n+1) =   ------------

This can be factored as n + 1 times n + 2 over 2:

                  (n + 1) (n + 2)
sum(1 .. n+1) =   ---------------

Now, n + 2 is the same as (n + 1) + 1, so we can rewrite this as:

                  (n + 1) ((n + 1) + 1)
sum(1 .. n+1) =   ---------------------

Now, we have an expression that looks exactly like our original formula, with n replaced by n + 1.

                  n (n + 1)                         (n + 1) ((n + 1) + 1)
sum(1 .. n)   =   ---------   =>  sum(1 .. n+1) =   ---------------------
                      2                                       2

We have shown that if this formula in n is true, then this one in n+1 is true. If we can find some value of n where the formula is correct, then by a recursive argument, we know it’s true for all successive values of n. And indeed, we saw several such examples earlier, so we know this is true for all n from 1 upwards.

Now there is a related technique that allows us to do exactly the same thing with data structures, and it’s called structural induction. In functional programs, rather than having arrays that are flat sequences of values, we have lists, which are defined recursively. A list is either the empty list, or a value paired with a list. In Haskell syntax this might be written as “a list of t ([t]) is either the empty list ([]), or a value of type t paired with a list of t (t : [t])”.

data [t] = [] | t : [t]

So, our flat arrays are replaced with recursive pairs, and those pairs form a tree, with the empty list as the terminating pair.

[1, 2, 3, 4]  ==  (1 : (2 : (3 : (4 : []))))

Or, to represent this graphically, the list is a lop-sided tree whose left-branches are the values and whose right-branches point to the rest of the list.

                 / \
                1   .
                   / \
                  2   .
                     / \
                    3   .
                       / \
                      4   []

In this world, we define functions on lists recursively. For example, the length function takes a list [t] and returns an Int. The length of the empty list is zero. The length of of any other list (the pattern x:xs destructures the list into its first element and all the rest) is one plus the length of the rest of the list.

length        :: [t] -> Int
length []     =  0
length (x:xs) =  1 + length xs

map takes a function of type a to b (the a -> b type), and a list of a, and returns a list of b. Mapping a function over the empty list gives the empty list, and mapping a function over any other list means applying f to the first element and pairing that with the map over the rest.

map           :: (a -> b) -> [a] -> [b]
map f []      =  []
map f (x:xs)  =  f x : map f xs

(A note on Haskell notation: function calls do not need parens and commas like in other languages, so the expression f x : map f xs roughly means f(x) : map(f, xs) in more conventional syntax.)

In Haskell, functions are not lists of instructions that define processes. Functions are not executed from top to bottom completely before returning a value. Instead, Haskell functions define equations: they simply declare that two expressions are equal. They say, anywhere we see the expression on the left, we can replace it with the one on the right. They are rewrite rules, and these rewrite rules, just like the rules of algebra, allow us to prove things about these functions.

Let’s look at an example. We’ll prove something called the functor composition law. Haskell has a function called . (a single dot), which composes two functions. It takes a function b -> c, and a function a -> b, and returns a function a -> c. f . g is the composition of functions f and g, which if applied to some value h gives the same result as f (g h). result of applying g to h.

(.)       :: (b -> c) -> (a -> b) -> a -> c
(f . g) x =  f (g x)

The functor composition law states that mapping function g over a list, and then mapping f over the result, is the same as mapping f . g over the original list. In JavaScript syntax that would mean list.map(g).map(f) gives the same answer as list.map(compose(f, g)).

map f (map g list)  ==  map (f . g) list

How do we prove that? When we proved sum, we began by stating the function for n+1. When dealing with lists, we state the definition for the list plus one more element, or x : list in Haskell notation.

    map f (map g (x : list))

Now we attempt to rewrite this until it’s in a form we want. The definition of map says that map f (x:xs) = f x : map f xs, and so we can rewrite the inner term as g x : map g list:

    map f (g x : map g list)

The outer term is now in the form map f (x:xs) and we can rewrite it:

    f (g x) : map f (map g list)

Now comes the inductive step: we assume that map f (map g list) == map (f . g) list, so the above equals:

    f (g x) : map (f . g) list

The first term f (g x) is just the definition of (f . g) x so we can again rewrite that term:

    (f . g) x : map (f . g) list

Now by the definition of map, this is the same as map (f . g) (x : list), and so be assuming the functor composition law is true for list, we have proven it for list with one more element:

    map f (map g (x : list)) == map (f . g) (x : list)

We just need a find a base case where the law is true, and then we know it’s true for all lists longer than that. And indeed, the empty list works: map f [] = [] by definition of map, for any f, so the composition law is trivially true in that case.

These proofs are only possible because of our recursive definition, and indeed this technique underlies most if not all systems of proof for program correctness, including type systems. In a procedural language, where these functions would be written using for-loops, we cannot do this. A for-loop is not a structure, it’s a procedure, it has state, we cannot say that it is equal to anything or give it a meaningful type. Whereas, just by shunting symbols around mechanically, a compiler for a functional language can prove things about it, because its functions are defined as recursive equations between structures.

I want to point out two aspects of these proofs. First, notice how when we replace a function with its definition, we just expand it one step, rather than evaluating it for the whole input list. This partial evaluation is important for performance, which we’ll discuss next.

Second, notice how sometimes we move in reverse, from the definition of a function back to its ‘unevaluated’ form, as in the final steps in the above proof. We’ll explore this notion of moving backwards in the final part of this series.