Inheritance, revisited – The If Works

Late last year, I wrote a piece titled “Where’s my inheritance”, in which I argued against the inheritance implementation of various JavaScript libraries. I’ve recently been working on a rewrite of JS.Class that is much more Ruby-like, and it’s caused me to re-examine my thoughts on this issue.

With JS.Class 1.x, I made the conscious decision not to allow super calls to mixins. The next version will support this feature in a fully Ruby-compliant way, and I’d like to explain how that came about. To kick us off, here’s an explanation of Ruby’s method lookup algorithm (taken from The Ruby Programming Language by David Flanagan and Matz):

When Ruby evaluates a method invocation expression, it must first figure out which method is to be invoked. The process for doing this is called method lookup or method name resolution. For the method invocation expression o.m, Ruby performs name resolution with the following steps:

1. First, it checks the eigenclass of o for singleton methods named m.

2. If no method m is found in the eigenclass, Ruby searches the class of o for an instance method named m.

3. If no method m is found in the class, Ruby searches the instance methods of any modules included by the class of o. If that class includes more than one module, then they are searched in the reverse of the order in which they were included. That is, the most recently included module is searched first.

4. If no instance method m is found in the class of o or in its modules, then the search moves up the inheritance hierarchy to the superclass. Steps 2 and 3 are repeated for each class in the inheritance hierarchy until each ancestor class and its included modules have been searched.

5. If no method named m is found after completing the search, then a method named method_missing is invoked instead. In order to find an appropriate definition of this method, the name resolution algorithm starts over at step 1. The Kernel module provides a default implementation of method_missing, so this second pass of name resolution is guaranteed to succeed.

When a method o.m calls super, all this does is allow the method lookup process to carry on from wherever it left off while trying to find o.m. If it finds another suitable method, it calls it, and if that method calls super the search continues still deeper.

(Before I continue: in JavaScript, named method lookup is handled for you by the runtime. Inheritance library authors need to implement the behaviour of super() lookup since JavaScript does not natively support this.)
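To make that parenthetical concrete, here is a minimal sketch in plain prototype-based JavaScript. The names Animal, Dog and rex are made up for illustration and have nothing to do with JS.Class. Calling rex.legs() relies on the runtime walking the prototype chain, whereas the "super" call inside Dog's describe has to name the parent implementation and rebind this by hand:

```javascript
// Named lookup: the runtime walks the prototype chain for us.
function Animal(name) {
  this.name = name;
}
Animal.prototype.describe = function() {
  return 'animal ' + this.name;
};
Animal.prototype.legs = function() {
  return 4;
};

function Dog(name) {
  Animal.call(this, name);
}
Dog.prototype = Object.create(Animal.prototype);
Dog.prototype.constructor = Dog;

// "super" has to be spelled out: name the parent and rebind `this` manually.
Dog.prototype.describe = function() {
  return 'dog, also ' + Animal.prototype.describe.call(this);
};

var rex = new Dog('Rex');
var legs = rex.legs();            // found on Animal.prototype by the runtime
var description = rex.describe(); // Dog's version, which calls Animal's by name
```

Hard-coding the parent like this is exactly the bookkeeping an inheritance library tries to hide behind something like callSuper.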

Now this explanation is more complicated than it needs to be since, as I’ve found out lately, classes are modules (Class inherits from Module) and class inheritance can be implemented as a special case of module inclusion. As all objects have an eigenclass, the process can simply be reduced to a reverse-order depth-first search of all the modules included in an object’s eigenclass. The distinction between classes and modules is mostly a question of semantics, of clarifying the design of your programs.
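The reduced description can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: makeModule, ancestors and lookup are hypothetical helpers, not the JS.Class API, a superclass is modelled as simply the first module a class includes, and duplicate handling is simplified to "first visit wins", which is cruder than what Ruby really does.

```javascript
// Hypothetical minimal module: a name, a method table, and the modules it
// includes, stored in inclusion order. A superclass is just the first include.
function makeModule(name, methods, includes) {
  return { name: name, methods: methods || {}, includes: includes || [] };
}

// The module itself comes first, then its includes, most recent first,
// each expanded depth-first. A module already visited is not repeated.
function ancestors(module, seen) {
  seen = seen || [];
  if (seen.indexOf(module) !== -1) return seen;
  seen.push(module);
  for (var i = module.includes.length - 1; i >= 0; i--) {
    ancestors(module.includes[i], seen);
  }
  return seen;
}

// Every implementation of `name`, in the order a lookup would try them.
function lookup(module, name) {
  return ancestors(module).filter(function(m) {
    return Object.prototype.hasOwnProperty.call(m.methods, name);
  }).map(function(m) {
    return m.name + '#' + name;
  });
}

var Kernel = makeModule('Kernel');
var Obj = makeModule('Object', {}, [Kernel]);
var Enumerable = makeModule('Enumerable', { map: function() {} });
var Foo = makeModule('Foo', { map: function() {} }, [Obj, Enumerable]);
var eigen = makeModule('eigenclass(f)', {}, [Foo]);

var names = ancestors(eigen).map(function(m) { return m.name; });
// names: eigenclass(f), Foo, Enumerable, Object, Kernel
var impls = lookup(eigen, 'map');
// impls: Foo#map, Enumerable#map
```

Under this model the eigenclass, the class, its mixins and its superclass are all handled by the same traversal, which is the sense in which class inheritance becomes a special case of module inclusion.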

In JS.Class 1.x, JS.Module is a hack, and a bad one at that. All it does is protect a bunch of methods using some trickery with closures and hooks:

JS.Module = function(source) {
    return {
        included: function(klass) { klass.include(source); },
        extended: function(klass) { klass.extend(source); }
    };
};

Turns out that that’s not all that useful, and doesn’t even work as expected under certain circumstances:

var ModA = new JS.Module({
    foo: function() {}
});

var ModB = new JS.Module({
    include: ModA,
    bar: function() {}
});

// Works -- SomeClass includes ModA and ModB
SomeClass.include(ModB);

// Fails -- OtherClass gets ModB but not ModA
OtherClass.extend(ModB);

So we see that Module is not a first-class citizen in JS.Class 1.x; it’s something bolted on to a single-inheritance model. JS.Class supports mixins, but they don’t become part of the inheritance tree and you cannot access methods from mixins using callSuper.

At the time, I justified this to myself by arguing that Inheritance.js’s multiple inheritance was confusing and therefore broken. Said library implements super() calls to mixins by detecting when a method is being overwritten, so when you add a method foo() to object thingy, a super call inside foo() calls whatever the previous value of thingy.foo was. The effect of super is set when you define thingy.foo rather than when you call it. This means you can include the same method from multiple sources and each implementation will be able to call the previous implementation.
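The technique described here can be sketched roughly as follows. This is not Inheritance.js's actual code: defineMethod and the callSuper property are hypothetical names chosen for illustration, and the sketch only covers the single idea of capturing the previous value of a property at the moment a method is defined.

```javascript
// Hypothetical helper: define obj[name] so that `this.callSuper()` inside
// the new function calls whatever obj[name] was at definition time.
function defineMethod(obj, name, fn) {
  var previous = obj[name]; // captured once, when the method is defined
  obj[name] = function() {
    var saved = this.callSuper;
    this.callSuper = previous;
    try {
      return fn.apply(this, arguments);
    } finally {
      this.callSuper = saved; // restore so nested super calls unwind cleanly
    }
  };
}

var thingy = {};
defineMethod(thingy, 'foo', function() {
  return ['first'];
});
defineMethod(thingy, 'foo', function() {
  return ['second'].concat(this.callSuper());
});
defineMethod(thingy, 'foo', function() {
  return ['third'].concat(this.callSuper());
});

var result = thingy.foo();
// result: third, second, first
```

Each definition closes over its predecessor, so the chain of super calls exists only as a stack of closures. Nothing on thingy records the order in which the implementations were added.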

Turns out this actually works, in that it can simulate Ruby’s inheritance mechanism as long as you define all your methods in the right order. The inheritance chain cannot be changed at runtime, leaving your system less dynamic, though more predictable in a certain sense. Because of this implementation, I’d got it into my head that I was against multiple inheritance, whereas I now know I should have said I preferred late binding. Sorry, multiple inheritance. Late binding (as opposed to static binding) means that the process of mapping a method call to a function implementation is performed at runtime when the method is called, rather than when the method (or its containing class) is initially defined. Inheritance.js goes for static binding, which in JavaScript is easier to implement for multiple inheritance and performs better as no runtime callstack lookups are needed.
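For contrast, a late-bound super can be sketched like this. Again the names are hypothetical (makeObject, send, and a callSuper function passed as the first argument), and the object model is deliberately tiny: each object carries an explicit chain of method tables, and the search for the next implementation happens only when callSuper is actually invoked.

```javascript
// Hypothetical object model: `chain` lists method tables, most specific first.
function makeObject(chain) {
  return { chain: chain };
}

// Call `name` on obj, starting the search at chain[from]. The implementation
// found receives a `callSuper` function that resumes the search one step on.
function send(obj, name, from) {
  for (var i = from || 0; i < obj.chain.length; i++) {
    var fn = obj.chain[i][name];
    if (typeof fn === 'function') {
      var next = i + 1;
      return fn.call(obj, function() {
        return send(obj, name, next); // resolved at call time, not before
      });
    }
  }
  throw new Error('no method ' + name);
}

var Base = { greet: function() { return 'base'; } };
var Child = {
  greet: function(callSuper) { return 'child > ' + callSuper(); }
};

var obj = makeObject([Child, Base]);
var before = send(obj, 'greet');
// before: child > base

// Mix a new table into the middle of the chain after Child.greet was defined.
var Loud = {
  greet: function(callSuper) { return 'loud > ' + callSuper(); }
};
obj.chain.splice(1, 0, Loud);
var after = send(obj, 'greet');
// after: child > loud > base
```

Child.greet was never redefined, yet its super call now reaches Loud first. The cost is a walk along the chain on every call, and the benefit is that obj.chain is an ordinary array that can be inspected or changed at runtime.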

But Ruby (and JavaScript) method calls are late-bound, and a system ought to be self-consistent or else it ends up confusing people with special cases. My major issue with static binding (aside from consistency) is that it makes it harder to debug at runtime. With Inheritance.js, the inheritance stack is not stored formally anywhere, and super references are buried inside closures in the library. In Ruby, you have well-defined predictable semantics for multiple inheritance that can be inspected at runtime so you can figure out which methods will be called. With a little hacking this is really simple:

class Object
  def eigenclass
    class << self; self; end
  end

  def ancestors
    [eigenclass] + eigenclass.ancestors
  end

  def lookup(name)
    ancestors.map { |m| m.instance_method(name) rescue nil }
  end
end

class Foo; include Enumerable; end
f = Foo.new

f.ancestors
#=> [#<Class:#<Foo:0xb7c39508>>, Foo,
     Enumerable, Object, Kernel]

f.lookup :map
#=> [#<UnboundMethod: Foo(Enumerable)#map>,
     #<UnboundMethod: Foo(Enumerable)#map>,
     #<UnboundMethod: Enumerable#map>, nil, nil]

class Foo
  def map; super; end
end

f.lookup :map
#=> [#<UnboundMethod: Foo#map>,
     #<UnboundMethod: Foo#map>,
     #<UnboundMethod: Enumerable#map>, nil, nil]

def f.map; super; end

f.lookup :map
#=> [#<UnboundMethod: #<Class:#<Foo:0xb7c39508>>#map>,
     #<UnboundMethod: Foo#map>,
     #<UnboundMethod: Enumerable#map>, nil, nil]

We create a class called Foo that includes Enumerable, and create an instance f. We can inspect the modules that make up f: its eigenclass, Foo, Enumerable, Object and Kernel. If we look up f.map, we see Enumerable#map all the way down, as that’s the only map implementation in f’s inheritance stack. If we add a map method to Foo, that shows up in the stack, and if we go deeper and add a singleton method on f that shows up too. These lookups show you the order in which Ruby will call the implementations of map owned by f. This is late binding in action: method implementations are looked up dynamically at runtime.

Being able to inspect your program’s structure at runtime is tremendously powerful, both for metaprogramming and for debugging purposes. If you combine static binding with no runtime inspection, you’re left with crawling the source code by hand in order to figure out call order, and this gets even worse in dynamic languages where you can generate methods at runtime rather than writing them out by hand. This was my central point last time I visited this issue. Late binding means that whatever methods you’re calling must still exist somewhere in the program at call time, and if you know the lookup semantics of your language you can get your program to do its own lookups without having to crawl the source yourself.

In my previous post, I mentioned that

If I’m trying to debug some JavaScript and see the word [this.callSuper], the very first thing I’m going to do is inspect this.klass.superclass at that point in the code.

So by my own criteria, multiple inheritance is okay as long as the rules are well-understood, we avoid the diamond problem, and we can inspect what our program is doing at runtime. I’ve also been bitten enough times by my own restrictions in JS.Class 1.x that I’ve come round to the view that you should just give programmers more power and freedom and let them worry about design issues themselves. Ruby fulfills these criteria, and after delving deeper into its object model, seeing that classes and modules are deeply connected, I can’t see any reason to continue disallowing multiple inheritance. The next version of JS.Class is being implemented with Module right at its core (just like Ruby), with almost everything else implemented using Module’s features. It even has eigenclasses, and fully Ruby-compliant late-bound arguments-optional super() calls. If you want to get in on the action before it comes out for real, here’s the source:

http://github.com/jcoglan/js.class/tree/modular/source/

I would really encourage you to look through the code for class.js if you want to see how inheritance can be handled in a Ruby-like system and how classes and modules are connected. It’s only a couple hundred lines of code and I’m guessing it’s more accessible to many Ruby hackers than Ruby’s C source.