The cost of privacy

I have a bone to pick with a certain oddly prevalent piece of received wisdom in the JavaScript community. I’ve been meaning to rant about this properly for what seems like geologic amounts of time, but I finally hit a concrete example today that both broke the camel’s back and gave me something I could actually illustrate my point with.

Let’s talk about private variables, or more precisely, defining API methods inside a closure, giving them privileged access to data that code outside the closure cannot see. This is used in several JavaScript design patterns, for example when writing constructors or using the module pattern:

// Defining a constructor
var Foo = function() {
  var privateData = {hello: 'world', etc: 'etc'};

  this.publicMethod = function(key) {
    return privateData[key];
  };
};

// Defining a single object
var Bar = (function() {
  var privateData = {hello: 'world', etc: 'etc'};
  
  return {
    publicMethod: function(key) {
      return privateData[key];
    }
  };
})();
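
Both versions behave the same from the outside: the private data is reachable only through the privileged method.

var foo = new Foo();
foo.publicMethod('hello');   // 'world'
foo.privateData;             // undefined: the variable exists only
                             // inside the closure

Bar.publicMethod('hello');   // 'world', same deal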

The reason people use this pattern, in fact the only reason for doing so as far as I can tell, is encapsulation. They want to keep the internal state of something private, so that access to it is predictable and it can’t put the object in a weird or inconsistent state, or leak implementation details to consumers of the API. These are all laudable design goals.

However, encapsulation does not need to be rigorously enforced by the machine. Encapsulation is something you get by deliberately designing interfaces and architectures, by communicating with your team and users, and through documentation and tests. Trying to enforce it in code shows a level of paranoia that isn’t necessary in most situations, and, as I’ll show in just a second, this style has all sorts of annoying costs that grossly outweigh the minimal encapsulation benefit it provides.

So, I guess I should show you what I’m talking about. Okay, hands up who can read the jQuery.Deferred documentation and then tell me what this code does?

var DelayedTask = function() {
  jQuery.Deferred.call(this);
};
DelayedTask.prototype = new jQuery.Deferred();

var task = new DelayedTask();
task.done(function(value) { console.log('Done!', value) });

task.resolve('the value');

var again = new DelayedTask();
again.done(function(value) { console.log('Again!', value) });

I asked this on Twitter earlier and got one correct response. This code prints the following:

Done! the value
Again! the value

But from the source code, what it seems to be trying to do is as follows:

  • Create a subclass of jQuery.Deferred
  • Instantiate a copy of the subclass, and add a callback to it
  • Resolve the instance, invoking the callback
  • Create a second instance of the subclass and add a callback to it
  • Do not resolve the second instance

But it does not do this: the second instance is somehow already resolved, and the callback we add is unexpectedly invoked. What’s going on?

Well, let’s examine what we expect to happen when we try to subclass in JavaScript:

var DelayedTask = function() {
  jQuery.Deferred.call(this);
};
DelayedTask.prototype = new jQuery.Deferred();

We assign the prototype of our subclass to an instance of the superclass, putting all its API methods in our subclass’s prototype. But this puts not only the superclass’s methods into our prototype, but also one instance’s state, which every subclass instance would otherwise share. That’s why we call jQuery.Deferred.call(this) in our constructor: it applies the superclass’s constructor function to our new instance, setting up a fresh copy of any state the object might need.
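
To see why this recipe normally works, here’s a minimal sketch using a plain constructor (Queue and SubQueue are illustrative names, nothing to do with jQuery):

var Queue = function() {
  this.items = [];    // per-instance state, set up by the constructor
};
Queue.prototype.add = function(item) {
  this.items.push(item);
};

var SubQueue = function() {
  Queue.call(this);   // gives each instance its own items array
};
SubQueue.prototype = new Queue();

var a = new SubQueue(), b = new SubQueue();
a.add('x');
b.items.length;       // 0: without Queue.call(this), every instance
                      // would share the items array on the prototype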

So why doesn’t this work? Well it turns out that inside the jQuery.Deferred function you’ll find code that essentially does this:

jQuery.Deferred = function() {
  var doneCallbacks = [],
      failCallbacks = [];
      // more private state

  var done = function(callback) {
    doneCallbacks.push(callback);
  };

  var fail = function(callback) {
    failCallbacks.push(callback);
  };
  
  return {
    done: done,
    fail: fail,
    // more API methods
  };
};

So now we see what’s really going on: jQuery.Deferred, despite its capitalized name and the docs’ instruction to invoke it with new, is not a constructor. It’s a normal function that creates and returns an object, rather than acting on the object created by new. Its methods are not stored in a prototype; they are created anew every time you create a deferred object. As such they are bound to the private data inside the Deferred closure, and cannot be reused and applied to other objects, such as objects that try to inherit this API. It also means that calling jQuery.Deferred.call(this) in our constructor is pointless, since it just returns a new object and does not modify this at all. And that solves our mystery: DelayedTask.prototype is itself a single deferred object, and every DelayedTask instance shares its closure-bound methods and therefore its state. Resolving task resolved that one shared deferred, and since jQuery immediately invokes any callback added to an already-resolved deferred, again’s callback fires as soon as we add it.
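
You can verify the point about jQuery.Deferred.call(this) directly:

var obj = {};
var result = jQuery.Deferred.call(obj);

result === obj;    // false: a brand new deferred was returned
typeof obj.done;   // 'undefined': obj itself was never touched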

This concept of binding is important. All JavaScript function invocations have an implicit argument: this. It refers to the receiving object if the function is called as a method (i.e. o in o.m()) or can be set explicitly using the first argument to call() or apply(). Being able to invoke a method against any object is part of what makes them useful; a single function can be used by all the objects in a class and give different output depending on each object’s state. A function that does not use this to refer to state, but instead refers to variables inside a closure, can only act on that data; if you want to reuse its behaviour you have to reimplement it or manually delegate to it.
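
Here’s the difference in a nutshell (the names are illustrative):

// A function that refers to `this` can act on any object you point it at
var increment = function() {
  return ++this.count;
};
increment.call({count: 0});    // 1
increment.call({count: 10});   // 11

// A function closed over its own state can only ever act on that state
var makeIncrement = function() {
  var count = 0;
  return function() { return ++count; };
};
var inc = makeIncrement();
inc.call({count: 10});         // 1: the argument to call() is ignored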

Manual delegation means that in order to implement a ‘subtype’, we keep an instance of the supertype as an instance variable and reimplement its API, delegating calls to the stored object.

var DelayedTask = function() {
  this._deferred = new jQuery.Deferred();
};

DelayedTask.prototype.always = function() {
  this._deferred.always.apply(this._deferred, arguments);
  return this;
};

// 17 more method definitions

One correspondent on Twitter suggested I do this to dynamically inherit the jQuery.Deferred API:

var DelayedTask = function() {
  jQuery.extend(this, new jQuery.Deferred());
};

This makes a new instance of the superclass and copies its API onto the subclass instance. It avoids the tedium of manual delegation, but it introduces four new problems:

  • It assumes the methods are correctly bound to new jQuery.Deferred(), so they will work correctly if invoked as methods on another object. This happens to be true in our case but it’s a risky assumption to make about all objects.
  • These methods will return a reference to the jQuery.Deferred instance, rather than the DelayedTask object, breaking an abstraction boundary.
  • for/in loops are slow; object creation now takes O(N) time where N is the size of its API.
  • Doing this will clobber any methods you define in your prototype, so if you want to override any methods you have to put them in the constructor.
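
The fourth problem follows from how property lookup works: the copied methods become properties of the instance itself, and instance properties shadow the prototype.

var DelayedTask = function() {
  jQuery.extend(this, new jQuery.Deferred());
};
DelayedTask.prototype.done = function() { /* an attempted override */ };

var task = new DelayedTask();
task.done;   // the copied Deferred method, not our override: the
             // instance property shadows the one on the prototype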

So we’ve shown that defining your methods in the constructor, rather than in the prototype, hurts reusability and increases maintenance effort, especially in such a dynamic, malleable language as JavaScript. It also makes code harder to understand: if I have to read the whole bloated body of a function to figure out that it’s not really a constructor, because it explicitly returns an object after defining a ton of methods, that’s a maintenance problem. Here are several things I expect to be true if you name a function using an uppercase first letter:

  • It must be invoked using new to function correctly
  • The object returned by new Foo() gives true for object instanceof Foo
  • It can be subclassed easily using the technique shown above
  • Its public API can be seen by inspecting its prototype

One of JavaScript’s key problems is that it’s not semantically rich enough to accurately convey the author’s intent. This can largely be solved using libraries and consistent style, but any time you need to read to the end of a function (including all possible early exits) to figure out if it’s really a constructor or not, that causes maintenance problems.

I’ve shown that it causes reusability problems and shared state, and that it makes code harder to understand, but if I know JavaScript programmers, you’re probably not convinced. Okay, let’s try science:

require('jsclass');
JS.require('JS.Benchmark');

var Foo = function() {};
Foo.prototype.method = function() {};

var Bar = function() {
  this.method = function() {};
};

JS.Benchmark.measure('prototype', 1000000, {
  test: function() { new Foo() }
});

JS.Benchmark.measure('constructor', 1000000, {
  test: function() { new Bar() }
});

We have two classes: one defines a method on its prototype, and the other defines it inside its constructor. Let’s see how they perform:

$ node test.js 
 BENCHMARK [47ms +/- 8%] prototype
 BENCHMARK [265ms +/- 26%] constructor

(Another pet peeve: benchmarks with no statistical error margins.)

Defining the API inside the constructor – just one no-op method – is over five times slower than defining the method on the prototype. Every time you invoke Bar, it has to construct its entire API from scratch, instead of simply setting a pointer to the prototype to inherit methods. It also uses more memory: all those function objects aren’t free, and are more likely to leak memory since they’re inside a closure. As the API gets bigger, this problem only gets worse.
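
The memory point is easy to see using the two classes from the benchmark:

var a = new Foo(), b = new Foo();
a.method === b.method;   // true: one function object, shared via the prototype

var c = new Bar(), d = new Bar();
c.method === d.method;   // false: each instance carries a fresh copy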

Defining methods this way vastly increases the runtime of constructors, increases memory use, and forces manual delegation, adding an extra function dispatch to every method call in the subclass. JavaScript function calls are expensive: two of the best ways to improve the performance of JavaScript code are to inline function calls, and make constructors cheaper (a third being to aggressively remove loops). The original version of Sylvester defined methods in its constructors, and the first big performance win involved moving to prototypes. One of the factors that made faye-websocket much faster was removing unnecessary function calls and loops.

As a sweet bonus, if your instances all share the same copy of their methods, you can do useful things like test coverage analysis, which is impossible if every instance gets a fresh set of methods.

Yes, I know most of the time object creation and function calls do not dominate the runtime of an application, but when they do, you will wish you’d written your code in a style less likely to introduce performance problems. Writing code with prototypes costs no more development effort than using closures, it avoids all the problems I’ve listed above, and I like sticking to habits that result in less maintenance work.

If you really want privacy, you need to ask yourself these questions. First, is your API actually guaranteeing privacy? It’s easy to let one little object reference slip out through an API and your whole privacy claim is blown. Second, is it worth all the above costs? And third, can you better communicate the design intent of your code without incurring these costs? For example, many people prefix ‘private’ fields with an underscore to signal you shouldn’t call them. I go one step further and compress my code using Packer, which obfuscates these underscored names. You can still reuse my methods because they’re not bound to private state, but it’s very clear which methods are public and which aren’t. I’m not going to stop you using them, but the risk is very clearly stated.
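
For example, the Foo constructor from the start of this article can be rewritten so its privacy is purely conventional:

var Foo = function() {
  // the underscore says: this is an implementation detail, hands off
  this._privateData = {hello: 'world', etc: 'etc'};
};

Foo.prototype.publicMethod = function(key) {
  return this._privateData[key];
};

The methods live on the prototype and reach their state through this, so they stay cheap to construct and reusable by other objects, while the underscore tells readers which parts of the object they shouldn’t depend on.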

Finally, consider the real reason we’ve been told global variables are evil, and that we should encapsulate things as much as possible. Global variables are evil because they are an example of implicit shared state. This is definitely something to avoid, but you need to know it’s this you’re avoiding, and not global variables per se. The methods in the jQuery.Deferred API still have an implicit shared state problem that in some sense is worse than the global variable problem: the user cannot completely determine the function’s output from its inputs, because the user cannot change the object the function acts upon. The function’s behaviour is bound to state that the user cannot see or replace.

CommonJS doesn’t really solve this problem either; it just moves it to the filesystem so multiple versions of a library can co-exist and each module can get its own copies of its dependencies. (I’d argue this is a waste of memory and start-up time for very little reward.) You still have a global shared namespace (both in the filesystem and the JavaScript runtime), and you can still change the public API of a CommonJS module, just as you can for anything defined using the module pattern. There’s only so far you can go in locking down your code; at some point some of it has to walk out into the mean wide world and interact with other programs. Deal with it, and quit punishing your users with bad design decisions.