Deriving the Y combinator

Before I start in on this: be aware I’m mostly writing this to force myself to understand something by writing it down. If you get anything out of it, consider it a bonus. I will be deriving Y() in JavaScrpit, and giving a version in Ruby.

After stumbling on this article on Raganwald last year (thoroughly interesting site by the way), I was initially intrigued and then totally frustrated by not being able to understand what the hell this Y combinator thing was. It’s of limited practical use to me, but it bugs me when I can’t wrap my brain around something. This is an attempt to remedy said frustration. You may find this Wikipedia article and Richard Gabriel’s The Why of Y (PDF, examples in Scheme) useful as we go along. I’ve cobbled this together from other articles linked from Raganwald and found through Google.

I assume we’re all familiar with the idea of a function as something that takes some input value and returns some output value. Say, the function for squaring numbers:

  • f(x) = x²

Or in Javascript:

var square = function(x) {
  return x * x;
};

The fixed points of a function are any input values for which f(x) is equal to x. So, the fixed points of f(x) = x² are 0 and 1.

Now, in mathematics and in any language that supports first-class functions (that is, functions that can be passed around as data), we have things called higher-order functions. These are functions that take another function as input, or return a function as output, or both. The fixed point of a higher order function f is another function p such that f(p) = p. (It may be more helpful to think in terms of functions actually being executed. The previous statement is equivalent to the statement that f(p)(x) = p(x) for all values of x.) Y (the Y combinator) is a special function that returns the fixed points of higher-order functions, that is to say:

  • f(Y(f)) = Y(f)

Y is commonly use to allow anonymous recursion without assuming your host language supports it. Let’s say I want a function that calculates the factorial of a number:

var factorial = function(x) {
  return x == 0 ? 1 : x * factorial(x-1);
};

Before we go any further: JavaScript supports anonymous recursion. The correct form of the above function is:

var factorial = function(x) {
  return x == 0 ? 1 : x * arguments.callee(x-1);
};

arguments.callee gives all functions an internal reference to themselves. This article is about what to do if such a feature is not available in your language.

There’s a big problem with the above function (the one without arguments.callee): it relies on the name I’ve given the function externally. If I do this:

var bar = factorial;
factorial = "something";
bar(5)  // -> does not work

My factorial function ceases to work because, inside it, factorial is no longer what it expects. What we’re going to work towards here is a function that acts as a factory for factorial functions:

var f = function(q) {
  return function(x) {
    return x == 0 ? 1 : x * q(x-1);
  };
};

This function f will return a factorial function. To recurse, it calls a local function (only defined inside f, not globally) called q. We need to pass this function to f to make the recursion work correctly, but what function do we need? q needs to be the factorial function for the inner function to work properly, but we’re back where we started: we can’t write a secure factorial function because we can’t use anonymous recursion.

But, notice what’s happened: if we pass the factorial function to f, it will return the factorial function. The factorial function is thus a fixed point of f, and can be determined using the Y combinator. So how do we get there?

To start with, lets make the recursive function local to factorial by passing it in as a parameter:

var factorial = function(h, x) {
  return x == 0 ? 1 : x * h(h, x-1);
};

factorial(factorial, 5)  // -> 120

We now pass a function to tell factorial how to recurse without refering to any global variables. This interface is not really satisfactory though – I want to be able to call functionName(value). So let’s create a function that returns another function (this is called currying our factorial function):

var factorial = function(h) {
  return function(x) {
    return x == 0 ? 1 : x * h(h)(x-1);
  };
};

factorial(factorial)(5)  // -> 120

That’s all well and good, but there’s a lot of cruft and duplication and we’ve got h(h)(x) in the function definition rather than a single q(x) as above. So, we create another nested function that creates a single reference to h(h):

var factorial = function(h) {
  return function(x) {
    var f = function(q, x) {
      return x == 0 ? 1 : x * q(x-1);
    };
    return f(h(h), x);
  };
};

factorial(factorial)(5)  // -> 120

This is almost looking like our f function form before. We just curry the innermost function to complete the transformation:

var factorial = function(h) {
  return function(x) {
    var f = function(q) {
      return function(x) {
        return x == 0 ? 1 : x * q(x-1);
      };
    };
    return f(h(h))(x);
  };
};

factorial(factorial)(5)  // -> 120

Now that inner function f does not contain any references to variables outside it, so we can pull it out and it will work the same:

var f = function(q) {
  return function(x) {
    return x == 0 ? 1 : x * q(x-1);
  };
};

var factorial = function(h) {
  return function(x) {
    return f(h(h))(x);
  };
};

factorial(factorial)(5)  // -> 120

Now we have our original f that we want the fixed point of. The fixed point is the factorial function given by factorial(factorial), so we can write our Y() function to return it:

var Y = function(f) {
  var g = function(h) {
    return function(x) {
      return f(h(h))(x);
    };
  };
  return g(g);
};

This is more generally written as:

var Y = function(f) {
  return (function(g) {
    return g(g);
  })(function(h) {
    return function() {
      return f(h(h)).apply(null, arguments);
    };
  });
};

Y() will now return the fixed point of any higher-order function, which is just what we wanted:

var factorial = Y(function(recurse) {
  return function(x) {
    return x == 0 ? 1 : x * recurse(x-1);
  };
});

factorial(5)  // -> 120

And presto – anonymous recursion with no dependence on global variables or support from the host language. If you want to do this in Ruby (I’m not sure whether it has an analog for arguments.callee), it looks like this:

def y(&f)
  lambda { |g| g[g] } [
    lambda do |h|
      lambda { |*args| f[h[h]][*args] }
    end
  ]
end

factorial = y do |recurse|
  lambda do |x|
    x.zero? ? 1 : x * recurse[x-1]
  end
end

The frustrating thing about Y is that once you’ve derived it, it’s totally impossible to tell what it does just by looking at it. If you made it this far, congratulations.