The If Works This dirt was a building before

The potentially asynchronous loop

If you write a lot of asynchronous or event-driven code, you’re probably going to end up needing an asynchronous for loop. That is, a loop that runs each iteration sequentially but those iterations may contain non-blocking logic that must halt the loop until the async action resumes. In my case, I need the main loop of JS.Test, the testing tool to be bundled with JS.Class 3.0, to run each test in sequence but each test has to support a suspend/resume system for async tests.

I’ll use a slightly more contrived example here: fetching a series of responses over Ajax in sequence using jQuery. It should be apparent that this will not do the job:

listOfUrls.forEach(function(url) {
  $.get(url, function(response) {
    // handle response
  });
});

The requests are made in sequence, but they overlap because each async request does not block the loop. We want each iteration to hold up the loop until the request it opens has completed. To do this, I’m going to introduce a resume callback as an argument to the iterator; each iteration must call this to continue the loop.

listOfUrls.asyncEach(function(url, resume) {
  $.get(url, function(response) {
    // handle response
    resume();
  });
});

An initial stab at implementing asyncEach() might look like this. The method takes an iterator function, and creates an internal function that moves a counter forward one index. As long as we’ve not reached the end of the list, we call iterator with the current element and the internal function as the resume callback.

Array.prototype.asyncEach = function(iterator) {
  var list = this,
      n    = list.length,
      i    = -1;

  var resume = function() {
    i += 1;
    if (i === n) return;
    iterator(list[i], resume);
  };
  resume();
};

This works just fine as long as every iteration contains an async action. Async code allows us to empty the call stack and start again when the async logic resumes. In JS.Test, each test might contain async code, but probably won’t, at least for my uses. If too many tests in a row don’t contain any async code, we get a stack overflow because of resume calling itself indirectly without giving the stack a break.
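To make the stack growth concrete, here is a small sketch. The standalone naiveAsyncEach function and the depth counters are names I've made up for illustration; they are not part of JS.Test. It measures how deeply the calls nest when every iteration resumes synchronously:

```javascript
// Illustrative only: the naive loop above as a standalone function
// (naiveAsyncEach is a hypothetical name, not part of JS.Class).
function naiveAsyncEach(list, iterator) {
  var n = list.length,
      i = -1;

  var resume = function() {
    i += 1;
    if (i === n) return;
    iterator(list[i], resume);
  };
  resume();
}

var depth = 0, maxDepth = 0;

naiveAsyncEach([1, 2, 3, 4, 5], function(item, resume) {
  depth += 1;
  maxDepth = Math.max(maxDepth, depth);
  resume();   // synchronous: the next iteration starts inside this call
  depth -= 1;
});

console.log(maxDepth); // 5: one nested call per item
```

The nesting depth equals the length of the list, so a long enough run of synchronous iterations will eventually exhaust the stack.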

So we need a looping construct for iterations that might contain async code, and it must run on all JavaScript platforms, only some of which have async functions built in. Initially we can get rid of the stack overflow problem by scheduling the next iteration using setTimeout() instead of calling it directly and growing the stack:

Array.prototype.asyncEach = function(iterator) {
  var list = this,
      n    = list.length,
      i    = -1;

  var iterate = function() {
    i += 1;
    if (i === n) return;
    iterator(list[i], resume);
  };

  var resume = function() {
    setTimeout(iterate, 1);
  };
  resume();
};

We should now be able to handle a large list of tests because the resume callback uses setTimeout() to essentially clear the call stack between iterations. But we now have the problem that this won’t run on platforms without setTimeout(). How do we turn it into a simple loop on such platforms without blowing the stack?

The trick is to have resume() act as a scheduling device, but implement the scheduling differently. On non-async platforms, we can do this by keeping a count of iterations left to run: calling resume() adds to this count, while calling iterate() decrements it. We keep a single loop running that runs the iteration as long as there are calls remaining. The finished code looks like this:

Array.prototype.asyncEach = function(iterator) {
  var list    = this,
      n       = list.length,
      i       = -1,
      calls   = 0,
      looping = false;

  var iterate = function() {
    calls -= 1;
    i += 1;
    if (i === n) return;
    iterator(list[i], resume);
  };

  var loop = function() {
    if (looping) return;
    looping = true;
    while (calls > 0) iterate();
    looping = false;
  };

  var resume = function() {
    calls += 1;
    if (typeof setTimeout === 'undefined') loop();
    else setTimeout(iterate, 1);
  };
  resume();
};

Notice how the looping flag prevents more than one loop from running at a time. After the first call to loop(), the effect of each iteration calling the resume() function is simply to increment calls and keep the loop going. This setup lets us iterate over a list on non-async platforms, and allows async code to exist within iterations on compatible platforms, keeping all the complexity in one place. Anywhere we want to iterate, we just call

list.asyncEach(function(item, resume) {
  // handle item
  resume();
});

and asyncEach figures out the best way to handle the loop.
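As a rough check that the counting trick really keeps the stack flat, here is a sketch of just the timer-free branch, pulled out into a standalone function. The name eachWithoutTimers and the counters are assumptions for the sake of the example, not code from JS.Test:

```javascript
// Sketch of the setTimeout-free branch only; eachWithoutTimers is a
// hypothetical name used for this illustration.
function eachWithoutTimers(list, iterator) {
  var n       = list.length,
      i       = -1,
      calls   = 0,
      looping = false;

  var iterate = function() {
    calls -= 1;
    i += 1;
    if (i === n) return;
    iterator(list[i], resume);
  };

  var loop = function() {
    if (looping) return;
    looping = true;
    while (calls > 0) iterate();
    looping = false;
  };

  var resume = function() {
    calls += 1;
    loop();
  };
  resume();
}

var items = [];
for (var k = 0; k < 100000; k++) items.push(k);

var visited = 0, depth = 0, maxDepth = 0;

eachWithoutTimers(items, function(item, resume) {
  visited += 1;
  depth += 1;
  maxDepth = Math.max(maxDepth, depth);
  resume();   // only bumps the counter; the outer while loop does the work
  depth -= 1;
});

console.log(visited, maxDepth); // 100000 1
```

Every item is visited, but the iterator is never nested inside itself: the inner resume() calls return immediately because the looping flag is already set.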

What I mean when I use the MIT license

The MIT license, in case you’re not familiar with it, is one of a family of software licenses recognised by the Open Source Initiative. It’s one of the shortest and most liberal, and reads as follows:

The MIT License

Copyright (c) 2010 James T. Suckerpunch

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

It essentially says that anyone can use and modify the software however they like, as long as they put my name somewhere, and as long as they understand I make no promises regarding the software’s quality.

Which you’d think would be pretty simple, and indeed that’s why I use it. It’s a recognisable way for me to say to other developers: Here. I made this. You might find it useful. I don’t care what you do with it. I won’t sue you, and you can’t sue me. I don’t even care if you “forget” to credit me all that much. I think I’ll be okay.

Except, it’s not that simple. Even the open-source world, which some would have you believe is peopled entirely by neckbearded hippies, is subject to the occasional lawyer-infused tarpit where some company decides it can make a bit of headway by essentially preventing other people from being creative. I’m not going to get into the ins and outs of the case, partly because I’m not qualified, but mostly for a far better reason:

It’s very, very boring.

I get why the GPL exists, really I do. Especially in areas like government and scientific research, Free Software is a very important movement. I also get that we can’t all give our software away for free. What I’m mostly annoyed by is the fact that even in communities where open-source is being done properly, where people just want to share their work without getting the lawyers involved, the water is still muddied by paranoia over what all the various open-source licenses mean and whether they are compatible.

Last week, the author of the Charts and Graphs module for Drupal emailed me to ask if I could include some extra version info in the Bluff source code so they could sniff for whether the user had a version with a particular bug that needed working around. Why does it matter what the user has? Can’t you just ship Bluff with your module? No: Drupal is released under the GPL and won’t accept any non-GPL code. So even though I was distributing code under a more liberal license, other authors had to ask their users to install a component themselves and do crazy workarounds rather than bundling the software they wanted to use.

Sure, Drupal has to be paranoid because unfortunately they operate in an environment where they have to be scared of the lawyers. But this policy is directly in conflict with people like me who just want to give our work away, and it ended up hurting Drupal’s users as they had to supply a component that for legal reasons couldn’t ship with the product itself.

So, before anyone else emails to ask me for permission or to alert me that horror-of-horrors someone’s using my ideas without stamping my name somewhere virtually nobody will ever read, I want to make this very clear:

I make stuff because I want to learn things. Sometimes I have a good idea, and I want to share it. I’m not a business, and I don’t care about money. (Even supposing any of my side-projects could make money.) I use an instantly recognisable license that closely approximates my motives. I don’t care what you do with my work, aside from being interested if it’s awesome. If I have time, I’ll help you out and fix bugs. I’m not making any promises, except that I won’t sue you.

But what I care about most is that I get to work on stuff that matters to me without getting embroiled in the big anti-creative sideshow that is The Software Licensing System. And call me a neckbearded hippie, but if you’ve ever made anything I expect you feel the same way.

Evented programming patterns: Testing event-driven apps

This post is part of a series on event-driven programming.

Thus far all the articles in this series have focused on methods for structuring applications to make them more modular and maintainable. They all help in their own way when correctly applied, but all of them leave one major area with something to be desired: scripting.

To be clear, it’s not that these techniques make code unscriptable. The problem is that even though the components may be well-designed and conceptually easy to plug together, writing quick scripts to manipulate an event-driven application can require a lot of boilerplate. One place this really shows up is in tests. Tests should, nay, must be easy to read. If they’re not, then their role as specifications and design/debugging tools is lost. And if tests are easy to write, you end up writing more tests, and you make better software. Wins all around.

But have you tried integration-testing a heavily event-driven app? I don’t care how fancy your test framework is, when every step in a test scenario needs some async work you’re going to end up with this:

ClientSpec = JS.Test.describe(Faye.Client, function() {
  before(function(resume) {
    var server = new Faye.NodeAdapter({mount: '/'})
    server.listen(8000)
    setTimeout(function() {
      resume(function() {
        var endpoint = 'http://0.0.0.0:8000'
        this.clientA = new Faye.Client(endpoint)
        this.clientB = new Faye.Client(endpoint)
      })
    }, 500)
  })

  it('sends a message from A to B', function(resume) {
    clientA.subscribe('/channel', function(message) {
      this.message = message
    }, this)
    setTimeout(function() {
      clientB.publish('/channel', {hello: 'world'})
      setTimeout(function() {
        resume(function() {
          assertEqual( {hello: 'world'}, message )
        })
      }, 250)
    }, 100)
  })
})

All this test does is make sure one client can send a message to another using a Faye messaging server. While working on Faye, I’ve ended up getting the tests into a state where that test reads like this:

Scenario.run('Two clients, single message send',
function() { with(this) {
  server( 8000 )
  client( 'A', ['/channel'] )
  client( 'B', [] )
  publish( 'B', '/channel', {hello: 'world'} )
  checkInbox( 'A', [{hello: 'world'}] )
  checkInbox( 'B', [] )
}})

This should be pretty self-explanatory: start a server on port 8000, make client A subscribe to /channel, make client B with no subscriptions, make B publish the message {hello: 'world'} to /channel, and make sure A and only A received the message.

Now the problem is, executing all those steps synchronously won’t work. There are network delays and other timeouts that need to happen between steps to make the test work. We need an abstraction that lets us write scripts at a very high level like this and hides all the inconsequential (and possibly volatile) glue code between the lines. This is pretty easy to do by breaking the script into two components, which here I’ll call a Scenario and a CommandQueue.

The job of the Scenario is to implement all the script steps involved in running the test. It should have a method for each type of command that accepts the right arguments, and also accepts a callback that it should call when the scenario is ready to continue running steps. For example, let’s get our Faye scenario class started:

Scenario = function() {
  this._clients = {};
  this._inboxes = {};
};

Scenario.prototype.server = function(port, resume) {
  this._port = port;
  var server = new Faye.NodeAdapter({mount: '/'});
  server.listen(port);
  setTimeout(resume, 500);
};

The server() command in our test just takes a port number, so our scenario method needs to take the port number and the resume callback function. It runs resume after a delay to let the server spin up so it’s ready to take requests before we continue our test.

Next up we need the client() method. This takes a name and a list of channels to subscribe to, and the callback as before.

Scenario.prototype.client = function(name, channels, resume) {
  var endpoint = 'http://0.0.0.0:' + this._port,
      client   = new Faye.Client(endpoint);

  channels.forEach(function(channel) {
    client.subscribe(channel, function(message) {
      this._inboxes[name].push(message);
    }, this);
  }, this);

  this._clients[name] = client;
  this._inboxes[name] = [];
  setTimeout(resume, 100);
};

A similar pattern here: we create a client, make somewhere to store the messages it receives, set up subscriptions, then wait a little for the subscriptions to have time to register with the server. By now the publish() and checkInbox() methods should be fairly predictable:

Scenario.prototype.publish = function(name, channel, message, resume) {
  var client = this._clients[name];
  client.publish(channel, message);
  setTimeout(resume, 250);
};

Scenario.prototype.checkInbox = function(name, messages, resume) {
  assert.deepEqual(messages, this._inboxes[name]);
  resume();
};

Note how in the final method we still accept the resume callback even though the method is synchronous and calls the callback immediately. The job of this component is to provide a convention: in general, you pass each method a callback and the method decides when the program should continue. Here we’ve used timeouts, but it could be after an Ajax call, an animation, a user-triggered event, anything at all.

Now we’ve implemented all the steps, we need something that can run the test as we’ve written it, without callbacks. I’ll call this the CommandQueue: rather than executing commands immediately it stores them in a queue.

CommandQueue = function() {
  this._scenario = new Scenario();
  this._commands = [];
};

CommandQueue.prototype = {
  server: function(port) {
    this.enqueue(['server', port]);
  },
  client: function(name, channels) {
    this.enqueue(['client', name, channels]);
  },
  publish: function(name, channel, message) {
    this.enqueue(['publish', name, channel, message]);
  },
  checkInbox: function(name, messages) {
    this.enqueue(['checkInbox', name, messages]);
  }
};

This implements the API that we want to use in our test, but so far the script is inert: nothing actually gets run. We need a method to run the next command in the queue. This takes the next command off the queue, and adds a callback to the argument list that will run the next command once called. The methods in the Scenario will use that callback to resume the test when ready.

CommandQueue.prototype.runNext = function() {
  // Stop once every queued command has been run
  if (this._commands.length === 0) return;

  var command = this._commands.shift().slice(),
      method  = command.shift(),
      self    = this;

  var resume = function() { self.runNext() };
  command.push(resume);

  this._scenario[method].apply(this._scenario, command);
};

We also need the first command addition to trigger the execution of the queue, so we’ll implement enqueue() to deal with this. We start the execution with a timeout, since if we do it synchronously no more commands will have been added by the time the first command returns.

CommandQueue.prototype.enqueue = function(command) {
  this._commands.push(command);
  if (this._started) return;

  this._started = true;

  var self = this;
  setTimeout(function() { self.runNext() }, 100);
};

As the final piece of glue, we need the Scenario.run() function, which takes our test script and executes it using a CommandQueue to do the work.

Scenario.run = function(testName, block) {
  var commandQueue = new CommandQueue();
  block.call(commandQueue);
};
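To see the record-then-replay idea in isolation, here is a cut-down sketch that needs no server or timers. FakeScenario, MiniQueue and record() are hypothetical names invented for this example; the steps resume immediately, and the queue is started by hand rather than by a timeout in enqueue():

```javascript
// Hypothetical stand-in for the Faye Scenario: each step just logs itself
// and resumes straight away, so the example needs no server or timers.
var log = [];

var FakeScenario = function() {};
FakeScenario.prototype.server = function(port, resume) {
  log.push('server ' + port);
  resume();
};
FakeScenario.prototype.publish = function(name, channel, message, resume) {
  log.push(name + ' publishes to ' + channel);
  resume();
};

// A cut-down queue: record() stores a command, runNext() replays one.
var MiniQueue = function(scenario) {
  this._scenario = scenario;
  this._commands = [];
};
MiniQueue.prototype.record = function(command) {
  this._commands.push(command);
};
MiniQueue.prototype.runNext = function() {
  if (this._commands.length === 0) return;   // nothing left to replay

  var command = this._commands.shift().slice(),
      method  = command.shift(),
      self    = this;

  // Append the resume callback, then hand the step to the scenario
  command.push(function() { self.runNext(); });
  this._scenario[method].apply(this._scenario, command);
};

var queue = new MiniQueue(new FakeScenario());
queue.record(['server', 8000]);
queue.record(['publish', 'B', '/channel', {hello: 'world'}]);

var loggedBeforeRun = log.length;   // 0: recording runs nothing
queue.runNext();

console.log(loggedBeforeRun, log);
// 0 [ 'server 8000', 'B publishes to /channel' ]
```

Recording the commands has no side effects; the steps only run, in order, once the queue starts replaying them.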

If you have a lot of test scenarios, you can use another command queue to help sequence them into a single test suite without too much bother. For a more complete example of some of these patterns, you can read through the Faye test suite, which also includes a variation on the above done in Ruby with EventMachine and Test::Unit.

Evented programming patterns: Asynchronous pipelines

This post is part of a series on event-driven programming.

In a previous article for this series, I covered the topic of asynchronous methods: methods or functions that “return” a value by passing it to a callback instead of using the return keyword. The problem with these methods is that they are not easily composable in the traditional sense: since they don’t have normal return values, expressions such as f(g(x)) don’t work when g returns a result asynchronously. It is possible to compose these functions but it takes a lot more work:

// Apply g to x, then apply f to the result,
// then use _that_ result for something else

g(x, function(gx) {
  f(gx, function(fgx) {
    // do something with "f(g(x))"
  });
});

Hardly the most pleasing thing to write or read. If we have an arbitrary list of functions to pass a value through, the problem becomes even harder. With synchronous functions this is straightforward:

passThroughFilters = function(filters, value) {
  for (var i = 0, n = filters.length; i < n; i++) {
    value = filters[i](value);
  }
  return value;
};

We just pass the initial value to the first function, and use the return value as the starting point for the next iteration.
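For instance, with two made-up string filters (trim and shout are names I've invented for this illustration), the synchronous version behaves like this:

```javascript
var passThroughFilters = function(filters, value) {
  for (var i = 0, n = filters.length; i < n; i++) {
    value = filters[i](value);
  }
  return value;
};

// Two made-up filters for illustration
var trim  = function(s) { return s.replace(/^\s+|\s+$/g, ''); };
var shout = function(s) { return s.toUpperCase(); };

var result = passThroughFilters([trim, shout], '  hello  ');
console.log(result); // "HELLO"
```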

You’re probably wondering what practical problem this relates to and hoping against hope that I won’t mention monads (don’t worry). I hinted at it with the name of the above function. Filtering systems pop up in web stacks all the time: Rails’ before_filter and Rack middleware being two obvious examples. In particular both of these have the property that any function in the filter chain can block the rest of the chain: a before_filter can return false to block access to the underlying controller action, and Rack middleware can decide whether it wants to respond to a request or delegate it down the stack.

Both of these examples are synchronous and easily implemented using composition, but I’ve had a couple of problems recently that required asynchronous filters: each filter can hold the chain up indefinitely while some async action is run, and the filter may resume the chain at any time using a callback.

The main example I have is the Faye extension system, which allows the user to modify or replace messages as they pass in and out of the server. Each extension method accepts a message and must call back with a message once it’s done with its modifications and filters. I’m going to present a slightly modified and hopefully more generically useful API for this, or at least one that’s more in keeping with the style I’ve used in this series.

server.addExtension('incoming', function(message, callback) {
  someAsyncAuthCall(message, function(allowed) {
    if (allowed) callback(message);
    else callback(null);
  });
});

This simple extension does something asynchronous to figure out whether the client is allowed to send that message, and if it’s not then the message is replaced with null to stop it propagating any further. Any number of extensions can be added to the server and each message is piped through them before reaching the core server code. The server calls the extension internally like this:

Faye.Server.prototype.process = function(message, callback, scope) {
  // various setup steps...
  this.pipeThroughExtensions('incoming', message, function(message) {
    // handle message after extensions have run
  }, this);
};

This just passes the message through the incoming filters, then picks the message up at the other end of the pipeline using an inline callback. These two methods, addExtension() and pipeThroughExtensions(), are provided in Faye by a mixin called Extensible. The addExtension() method simply needs to store the extension in a list:

Extensible.addExtension = function(type, handler, scope) {
  this._extensions = this._extensions || {};
  var list = this._extensions[type] = this._extensions[type] || [];
  list.push([handler, scope]);
};

pipeThroughExtensions() is a little more complex. Internally it uses a simple function whose job it is to process a single extension from a list. If we’ve reached the end of the chain, we can run the waiting callback. Otherwise, we call the next extension and pass it the pipe function so the extension can resume the chain when it’s done.

Extensible.pipeThroughExtensions = function(type, input, callback, scope) {
  if (!this._extensions) return callback.call(scope, input);

  if (!this._extensions.hasOwnProperty(type))
    return callback.call(scope, input);

  var list = this._extensions[type].slice();

  var pipe = function(data) {
    var extension = list.shift();
    if (!extension) return callback.call(scope, data);
    extension[0].call(extension[1], data, pipe);
  };
  pipe(input);
};

That call to extension[0].call(extension[1], data, pipe) calls the extension with the current value of the input, passing in the continuation function. When the extension calls its callback, we process the next extension in the list.
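Here is a small sketch of the pipeline in action, using a compact copy of the two methods so it runs on its own. The host object and the two extensions are hypothetical, and they call back synchronously purely to keep the example short:

```javascript
// Compact copy of the two Extensible methods so the sketch runs on its own
var Extensible = {
  addExtension: function(type, handler, scope) {
    this._extensions = this._extensions || {};
    var list = this._extensions[type] = this._extensions[type] || [];
    list.push([handler, scope]);
  },

  pipeThroughExtensions: function(type, input, callback, scope) {
    if (!this._extensions || !this._extensions.hasOwnProperty(type))
      return callback.call(scope, input);

    var list = this._extensions[type].slice();

    var pipe = function(data) {
      var extension = list.shift();
      if (!extension) return callback.call(scope, data);
      extension[0].call(extension[1], data, pipe);
    };
    pipe(input);
  }
};

// Hypothetical host object and extensions, made up for this example
var host = Object.create(Extensible);

host.addExtension('incoming', function(message, callback) {
  message.trail.push('first');
  callback(message);
});
host.addExtension('incoming', function(message, callback) {
  message.trail.push('second');
  callback(message);
});

var incoming, outgoing;

host.pipeThroughExtensions('incoming', {trail: []}, function(message) {
  incoming = message.trail;
});

// No 'outgoing' extensions are registered, so the input passes straight through
host.pipeThroughExtensions('outgoing', {trail: []}, function(message) {
  outgoing = message.trail;
});

console.log(incoming, outgoing); // [ 'first', 'second' ] []
```

Extensions run in the order they were added, and a type with no extensions hands the input directly to the waiting callback.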

This isn’t a pattern I’ve used an awful lot, but it solved a problem on Songkick really nicely last week. We have various buttons on the site that let users start tracking things, for example to start following an artist or say they’re going to a concert. These buttons typically are forms that submit using Ajax. For some of these buttons we just added the ability to auto-publish the concert to your Facebook profile. To do this our server-side code needs a cookie to be set that gives us permission to post to Facebook on that user’s behalf. So before the form is submitted, we need this cookie to be set, which requires a lot of async calls to the Facebook JavaScript SDK to log the user in and ask for publishing permissions.

The initial approach was to try to intercept the DOM events bound to the form submission and insert more logic for certain types of tracking button, but using the above model it was much simpler (again, this has been simplified to illustrate a point):

Trackings = { /* ... */ };
$.extend(Trackings, Extensible);

$('form.tracking').bind('submit', function() {
  var form = $(this);
  Trackings.pipeThroughExtensions('beforesubmit', form, function() {
    form.ajaxSubmit();
  });
  return false;
});

Trackings.addExtension('beforesubmit', function(form, resume) {
  if (!form.hasClass('im-going')) return resume(form);
  FB.login(function(session) {
    // more login and permission logic to set the cookie
    resume(form);
  });
});

This keeps the DOM bindings simple without shoving a lot of business logic into the initial event handler, but allows quite powerful modifications. For example in some contexts you may decide you need to block a submission: just don’t call resume() and you’ll block the filter chain.
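A minimal sketch of blocking, assuming a stripped-down pipeThrough helper and a requireConfirmation filter that I've invented for the example (neither is part of Faye or the Songkick code):

```javascript
// pipeThrough is a hypothetical, stripped-down helper for this illustration
function pipeThrough(filters, input, callback) {
  var list = filters.slice();

  var pipe = function(data) {
    var filter = list.shift();
    if (!filter) return callback(data);
    filter(data, pipe);
  };
  pipe(input);
}

var submitted = [];

var requireConfirmation = function(form, resume) {
  if (form.confirmed) resume(form);
  // otherwise resume() is never called, so the chain stops here
};

var submit = function(form) { submitted.push(form.id); };

pipeThrough([requireConfirmation], {id: 1, confirmed: true},  submit);
pipeThrough([requireConfirmation], {id: 2, confirmed: false}, submit);

console.log(submitted); // [ 1 ]
```

Only the confirmed form reaches the final callback; the other one is silently held at the filter that declined to resume.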

One caveat: this pattern needs a better name, since Extensible is far too vague and non-descriptive. This is basically the Observable module with the modification that the event listeners are allowed to have side effects on the event publisher. In Faye, Extensible is used in classes that support the extension API, but I’d recommend choosing something more domain-specific depending on how you need to tweak it.

Evented programming patterns: Object lifecycle

This post is part of a series on event-driven programming.

Earlier in this series I covered a very common pattern in event-driven programming: the Observable object. This technique lets one object notify many others when interesting things happen. JavaScript developers will be very familiar with this: it’s the same pattern that underlies the DOM event model.

A while ago I rewrote the JS.Class package loader and noticed a variation of this pattern emerge, which I’m going to call the object lifecycle. The typical use case is when some part of your code needs to execute once, as soon as some condition becomes true. In the package loader, this looks something like:

thePackage.when('loaded', function() {
  // Run code that relies on thePackage
});

This says: if thePackage is already loaded, then run this callback immediately. Otherwise, wait until thePackage is loaded and then run the callback. The implication is that the package will become loaded, only once, at some point in its life, and as soon as that happens we want to be notified. (I tend to use when for one-shot lifecycle events, and on for multi-fire events.) The implementation is quite similar to the Observable pattern, so you might want to revisit that before reading on.

Obviously, our lifecycle object is going to need to store lists of callbacks, indexed by event name as before. But in this case, if we know the event has already been triggered on that object, we can run the callback immediately and forget about it. When we trigger events, we also want to remove all the old pending callbacks after running them, since they don’t need to be called again.

LifeCycle = {
  when: function(eventType, listener, scope) {
    this._firedEvents = this._firedEvents || {};
    if (this._firedEvents.hasOwnProperty(eventType))
      return listener.call(scope);

    this._listeners = this._listeners || {};
    var list = this._listeners[eventType] = this._listeners[eventType] || [];
    list.push([listener, scope]);
  },

  trigger: function(eventType) {
    // Any arguments after the event name are passed on to the listeners
    var args = Array.prototype.slice.call(arguments, 1);

    this._firedEvents = this._firedEvents || {};

    if (this._firedEvents.hasOwnProperty(eventType)) return false;
    this._firedEvents[eventType] = true;

    if (!this._listeners) return true;
    var list = this._listeners[eventType];
    if (!list) return true;
    list.forEach(function(listener) {
      listener[0].apply(listener[1], args);
    });
    delete this._listeners[eventType];
    return true;
  }
};

Note how the trigger() method checks to see if the event has already been fired: we don’t want the same stage in the lifecycle to be triggered multiple times. It also removes the listeners from the object after calling them, and returns true or false to indicate whether the event fired. This makes it easy to tell whether some action that should only be done once has already happened; for example in my package system I do something like this:

JS.Package = new JS.Class({
  include: LifeCycle,

  // various methods

  load: function() {
    if (!this.trigger('request')) return;
    // perform download logic...
  }
});

This kills two birds with one stone: it lets other listeners know that the package has been requested and checks whether it’s already been requested, so we don’t try to download the package multiple times.
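To show the one-shot behaviour on its own, here is a short sketch using a compact copy of the module. The pkg object is a hypothetical stand-in for a package, created with Object.create() rather than JS.Class so the example has no dependencies:

```javascript
// Compact copy of the LifeCycle module so the sketch runs on its own
var LifeCycle = {
  when: function(eventType, listener, scope) {
    this._firedEvents = this._firedEvents || {};
    if (this._firedEvents.hasOwnProperty(eventType))
      return listener.call(scope);

    this._listeners = this._listeners || {};
    var list = this._listeners[eventType] = this._listeners[eventType] || [];
    list.push([listener, scope]);
  },

  trigger: function(eventType) {
    this._firedEvents = this._firedEvents || {};
    if (this._firedEvents.hasOwnProperty(eventType)) return false;
    this._firedEvents[eventType] = true;

    if (!this._listeners) return true;
    var list = this._listeners[eventType];
    if (!list) return true;
    list.forEach(function(listener) { listener[0].call(listener[1]); });
    delete this._listeners[eventType];
    return true;
  }
};

var pkg = Object.create(LifeCycle);   // hypothetical stand-in for a package
var log = [];

pkg.when('request', function() { log.push('early listener'); });

var first  = pkg.trigger('request');  // true: fires and runs the early listener
var second = pkg.trigger('request');  // false: already fired, nothing runs

// Registered after the event has fired, so it runs immediately
pkg.when('request', function() { log.push('late listener'); });

console.log(first, second, log);
// true false [ 'early listener', 'late listener' ]
```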

Naturally, one thing a package system has to deal with is dependencies. Dependencies are just prerequisites: you can’t load a package until all its dependencies are loaded. More precisely, a package is loaded once the browser has downloaded its source code, and it is complete once it is loaded and all its dependencies are complete. To fill in some more of the load() method, this is easily expressed as:

JS.Package.prototype.load = function() {
  if (!this.trigger('request')) return;

  when({complete: this._dependencies, load: [this]}, function() {
    this.trigger('complete');
  }, this);

  when({load: this._dependencies}, function() {
    loadFile(this._path, function() { this.trigger('load') }, this);
  }, this);
};

This reads quite naturally: if the package has already been requested, do nothing. When the dependencies are complete and this package is loaded, then this package is complete. When the dependencies are loaded, load this package and then trigger its load event. Note how the load event will trigger a complete event if there are no dependencies, and this will ripple down the tree and trigger dependent packages to load.

I’ve used when() above to express groups of prerequisites in a natural way, but we don’t have an implementation for that yet – we only have the when() method for individual objects. So let’s write one. This when() function will need to gather up the list of preconditions, keep a tally of how many have triggered, and when they’re all done we can fire our callback. The first step in the function converts preconditions, which maps event names to lists of objects, into a simple list of object-event pairs. That is, it turns {complete: [foo, bar], load: [this]} into [[foo, 'complete'], [bar, 'complete'], [this, 'load']].

var when = function(preconditions, listener, scope) {
  var eventList = [];
  for (var eventType in preconditions) {
    for (var i = 0, n = preconditions[eventType].length; i < n; i++) {
      var object = preconditions[eventType][i];
      eventList.push([object, eventType]);
    }
  }

  var pending = eventList.length;
  if (pending === 0) return listener.call(scope);

  for (var i = 0, n = pending; i < n; i++) {
    eventList[i][0].when(eventList[i][1], function() {
      pending -= 1;
      if (pending === 0) listener.call(scope);
    });
  }
};

If there are no pending events, we can just call the listener immediately. Otherwise, we set up listeners for all the events, and when each one fires (and remember: some of them may have fired already) we count down how many events we’re waiting for. When this reaches zero, we can carry on with the work we wanted to do.
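Here is a sketch of the countdown at work, with a trimmed-down LifeCycle included so it runs standalone. The objects a and b are hypothetical; b fires its event before we start waiting, to show that already-fired events are counted too:

```javascript
// Trimmed-down LifeCycle, just enough for this sketch
var LifeCycle = {
  when: function(eventType, listener, scope) {
    this._fired = this._fired || {};
    if (this._fired.hasOwnProperty(eventType)) return listener.call(scope);
    this._listeners = this._listeners || {};
    var list = this._listeners[eventType] = this._listeners[eventType] || [];
    list.push([listener, scope]);
  },
  trigger: function(eventType) {
    this._fired = this._fired || {};
    if (this._fired.hasOwnProperty(eventType)) return false;
    this._fired[eventType] = true;
    var list = (this._listeners || {})[eventType] || [];
    list.forEach(function(listener) { listener[0].call(listener[1]); });
    if (this._listeners) delete this._listeners[eventType];
    return true;
  }
};

var when = function(preconditions, listener, scope) {
  var eventList = [];
  for (var eventType in preconditions) {
    for (var i = 0, n = preconditions[eventType].length; i < n; i++) {
      eventList.push([preconditions[eventType][i], eventType]);
    }
  }

  var pending = eventList.length;
  if (pending === 0) return listener.call(scope);

  for (var j = 0, m = pending; j < m; j++) {
    eventList[j][0].when(eventList[j][1], function() {
      pending -= 1;
      if (pending === 0) listener.call(scope);
    });
  }
};

var a = Object.create(LifeCycle),
    b = Object.create(LifeCycle),
    fired = 0;

b.trigger('load');                    // fired before we start waiting

when({complete: [a], load: [a, b]}, function() { fired += 1; });

a.trigger('load');
var afterLoad = fired;                // 0: still waiting on a's complete event

a.trigger('complete');                // last precondition: the listener runs

console.log(afterLoad, fired); // 0 1
```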

This pattern is essentially a cross between Observable and Deferrable: we’re deferring an action, but the deferred items – the events – aren’t complex enough to merit their own objects so the implementation is closer to an observable object. The technique lends itself really well to expressing prerequisites in a natural way, even if the work you’re doing is not asynchronous.

I’ll have a couple more articles on event-driven programming next week, and you can catch me speaking at the London Ajax User Group on August 10th where I’ll be talking about Faye, event-driven code and testing.

Why Bayeux still matters

This article was prompted by a tweet from Micheil Smith:

Why are people still using cometd when we’re seeing websockets come into most modern browsers?

To recap, CometD is the reference client/server implementation of the Bayeux protocol, which defines a messaging protocol for web clients to publish and subscribe to message channels. This lets browsers send messages to each other, and lets server-side apps push messages to the browser. It’s the protocol that Faye uses behind the scenes.

Bayeux was originally designed with the browser environment in mind, considering what tools are available there: XMLHttpRequest, JSON and JSON-P, limited connections per host. Most importantly, it was designed as a technique to hack bidirectional messaging onto a request-response network transport, i.e. HTTP.

But it also, for better or worse, defines a messaging protocol involving topic channels that is independent of the network transport being used – it could even work entirely for in-process communication without going over the wire. I don’t know the history behind this, but it turns out to solve a core problem with server-to-client messaging: adequately identifying which clients to send a message to at the network level is tricky, and topic channels provide a nice abstraction that’s separate from any networking concerns.
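
As a rough illustration of how topic channels can work with no network at all, here is a minimal in-process sketch. ChannelHub is a hypothetical name; this is not how Faye or CometD are implemented, and it ignores Bayeux features such as wildcard channels.

```javascript
// Hypothetical in-process hub; illustrates topic channels only and is not
// Faye's or CometD's implementation.
function ChannelHub() {
  // channel name -> array of callbacks; no prototype, so any name is safe
  this._subscribers = Object.create(null);
}

ChannelHub.prototype.subscribe = function(channel, callback) {
  var list = this._subscribers[channel] = this._subscribers[channel] || [];
  list.push(callback);
};

ChannelHub.prototype.publish = function(channel, message) {
  var list = this._subscribers[channel] || [];
  // Copy the list so subscribing during delivery does not affect this publish
  list.slice().forEach(function(callback) { callback(message); });
};

// Usage: the publisher names a channel, never a connection.
var hub = new ChannelHub(), received = [];
hub.subscribe('/chat/alice', function(message) { received.push(message.text); });
hub.publish('/chat/alice', {text: 'hello'});
hub.publish('/chat/bob',   {text: 'not for alice'});
```

The publisher only ever refers to a channel name, which is the point: deciding which connections sit behind that name is somebody else's problem.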

So why do we still need Bayeux? Well first, the most obvious and boring problem: WebSockets aren’t everywhere. Chrome and Safari have them, Firefox will have them in 4.0, Internet Explorer 9 probably won’t have them (last I heard, correct me in the comments). There are projects like Socket.IO and web-socket-js that provide cross-platform socket-style networking, so the low-level problem of bidirectional networking can be worked around. Would be nice if we didn’t need them, but that’s life.

But suppose we lived in a world sans Microsoft and we could go ahead and use WebSockets for everything. Well, then you still have a problem. Several in fact. Server-side support for the protocol is still lacking, depending on which platform you use. In Ruby, you can only really do it with the small set of async servers, and even then it has to be hacked in. Obviously Node.js fares far better, although protocol support for WebSockets is still not in core and has to be done as a library, and there are already enough different implementations to demonstrate that there are different approaches that make sense in different situations. Add to this the fact that the protocol is not yet stable and you really don’t want to expose your app to this mess.

So what happens when WebSockets are deployed in all browsers and the protocol is stable? Well you’ve still got the problem of network dropouts to handle. The server will time out connections, the client’s network connection will cut out, and you don’t want to deal with reconnection/retry problems inside your pub/sub app.

And finally, as I mentioned before, most web app code in a system like Rails is typically so far from the business end of the network connection that storing a pool of connected clients and identifying them in a useful way so you can direct messages appropriately is non-trivial to say the least. One human user of your website will use many network connections as they navigate your site, and keeping track of which messages should be sent to each client-side WebSocket becomes a really hairy problem.

So this is where Faye, CometD and their ilk really shine: they provide a very high-level messaging service that makes it dead easy for your web app to publish messages, to the right users, without caring about network transports, disconnections, connection identity and the like. WebSockets are a great piece of infrastructure, and they probably exist at just about the right level of abstraction for what they do. But personally they’re not necessarily something I want to speak to directly when writing application code.

Faye 0.5: WebSockets, protocol extensions API and CometD integration

It’s been a few months since the last major Faye update, and in the interim the new release ended up getting so much feature creep that I’ve decided to skip a version number. That’s how much awesome is in the new release! 0.2 versions worth! It’s now available through npm, as well as Rubygems:

# for Node
sudo npm install faye

# for Ruby
sudo gem install faye

Anyway, what’s new? First up, to satisfy the buzzword junkies (I mean, make the network layer more efficient) the Faye client and server now support WebSockets. This is totally transparent to the user; Faye will just use it instead of XMLHttpRequest or JSON-P if your browser supports it. We’re currently only supporting the draft-75 version of the WebSocket protocol, and that’s what’s in Chrome 5 and Safari 5. We’ll add draft-76 support as new browsers are released.

There is a major API addition in the form of an extension system. This essentially lets you add middleware-style filters that can modify incoming and outgoing messages both on the server and on the client. You can run arbitrary logic on every message passing through the system – including the /meta/* messages that the Bayeux protocol uses – in order to implement extensions to the protocol. Common uses for this include user authentication and message acknowledgement, but really the possibilities are wide open. This should satisfy those of you who’ve been forking away on GitHub adding extra functionality to the core components for the last few months.

Extensions are fully documented in the server-side documentation, and you’ll probably benefit from skimming the Bayeux protocol spec (it’s only small) if you want to write extensions.
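
To give a flavour of what an extension looks like, here is a small sketch of an authentication filter. It assumes the incoming(message, callback) hook shape described in the extension documentation; the 'secret' token and the runIncoming helper are hypothetical, with runIncoming standing in for Faye's internal pipeline so the example runs on its own.

```javascript
// An extension is a plain object with optional incoming/outgoing hooks.
// Each hook must pass the (possibly modified) message on to `callback`.
var authExtension = {
  incoming: function(message, callback) {
    // Only police subscription requests; let everything else through
    if (message.channel !== '/meta/subscribe') return callback(message);

    var token = message.ext && message.ext.authToken;
    // 'secret' is a placeholder; a real app would look the token up
    if (token !== 'secret') message.error = 'Invalid subscription auth token';

    callback(message);
  }
};

// Hypothetical stand-in for Faye's pipeline, for illustration only:
// passes a message through each extension's incoming hook in turn.
function runIncoming(extensions, message, done) {
  var i = 0;
  (function next(msg) {
    if (i === extensions.length) return done(msg);
    var ext = extensions[i++];
    if (ext.incoming) ext.incoming(msg, next); else next(msg);
  })(message);
}

var results = [];
runIncoming([authExtension], {channel: '/meta/subscribe', subscription: '/foo'},
            function(message) { results.push(message.error); });
runIncoming([authExtension], {channel: '/foo', data: 'hi'},
            function(message) { results.push(message.error); });
// results[0] holds the error string; results[1] is undefined
```

In a real app you would hand the extension object to Faye rather than to a helper like runIncoming; check the documentation for the exact registration call on the server and client.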

The final major improvement is that Faye is now protocol-compatible with CometD, the reference implementation of Bayeux with a high-performance Java server. You can connect Faye clients to CometD servers and vice versa; this means you can use CometD as a drop-in replacement for Faye’s server if you find you need to scale out your Bayeux app. It even works with the new WebSocket support that just came out as part of CometD 2.0.

Apart from all that, the client now handles multiple subscriptions to the same channel and makes unsubscribing easy:

var subscription = client.subscribe('/foo', function() {
  // ...
});

// Later on...
subscription.cancel();

There’s a bunch more tweaks that make it all a little faster and more robust. In particular, the clients are now much better at reconnecting and re-establishing subscriptions if there’s a network disconnection or the server goes down. This actually means you can take down one server and replace it with another implementation without restarting any of the clients.

There are some backward-incompatible changes forced by the introduction of WebSocket support. You now set up a Node server like this, without calling Faye during a request:

var http = require('http'),
    faye = require('faye');

var bayeux = new faye.NodeAdapter({
  mount:    '/faye',
  timeout:  45
});

// Handle non-Bayeux requests
var server = http.createServer(function(request, response) {
  response.writeHead(200, {'Content-Type': 'text/plain'});
  response.write('Hello, non-Bayeux request');
  response.end();
});

bayeux.attach(server);
server.listen(8000);

The attach() method places filters in front of your request handlers to intercept Bayeux calls, so make sure you call attach() after adding your own request handlers. We had to do this because HTTP and WebSocket traffic each go through different event handlers in Node, and making the user forward these calls to Faye would have introduced far too much boilerplate for my liking.

The Ruby version now only supports Thin as the frontend webserver. You should have been using this anyway, since it’s much better at highly concurrent traffic than almost anything else in the Ruby world, but just make sure you specify Thin when racking up your application:

rackup -s thin -E production config.ru

Faye makes use of some extensions to Thin from the Cramp project, so thanks to Pratik for open-sourcing that work.

I think that just about covers it for now. Grab a download, and let me know how you get on. I’ve already heard from a few folks doing cool things with it, and I’d love to hear more!

JS.Class 2.1.5 supports Node, Narwhal and more

While there’s much work going on towards what will probably be JS.Class 3.0, the 2.1.x series is benefiting from some of the goodness being added upstream. I’ve just pushed out a new release that gets the package manager and all the libraries to work under CommonJS, specifically targeting Node.js and Narwhal for now.

I’ve had to make one tiny API change to avoid conflicting with the CommonJS API, so require() is now JS.require() and works just like it did before. To get your packages to work under CommonJS platforms, you don’t need to mess around with the exports object, you just need to remember this one rule:

If you want JS.Packages to find your object, do not declare it with var.

I’ll elaborate on this in a future post, but for now just remember that JS.Packages can only work with globally accessible objects, and using the var keyword (even outside a function) under CommonJS only makes the variable visible in the current file. If you stick to this rule and don’t use the exports object, you’ll have code that JS.Packages can run in any environment.

So, to get started using JS.Class on Node, just do what you’ve always done:

JSCLASS_PATH = './path/to/js.class';
require(JSCLASS_PATH + '/loader');

JS.require('JS.SortedSet', function() {
    var set = new JS.SortedSet([3,8,5,9]);
    require('sys').puts(set.count());
});

Note that the require() function is now called JS.require() in order to avoid conflicts with the CommonJS API. I thought about renaming it, since I don’t like the fact that putting it in the JS namespace makes it look like it can only load parts of JS.Class, but I honestly couldn’t think of a better name for it. Just remember you can load any library you like with it.

As usual, you can download JS.Class from its website.

Compiling the V8 JavaScript runtime under 64-bit Ubuntu

File under “I’m writing this for the benefit of my future self, and it may not work on your machine.” I recently upgraded my home machine to a 64-bit edition of Ubuntu 10.04 and had to do more than the usual dance to get Google’s blazing fast V8 JavaScript interpreter to compile. Here’s what I did.

First up, install the usual build tools you’d need to compile V8:

sudo aptitude install build-essential subversion scons

Then, as detailed on the Chromium bug tracker, install a bunch of support libraries:

sudo aptitude install ia32-libs lib32z1-dev lib32bz2-dev

You will also find that the build complains that it cannot find -lstdc++ unless you do the following (replace 6.0.13 with whatever version is installed on your machine):

sudo ln -s /usr/lib32/libstdc++.so.6.0.13 /usr/lib32/libstdc++.so

Finally, you can check out V8:

cd /usr/src
sudo svn checkout http://v8.googlecode.com/svn/trunk/ v8
cd v8

You’ll then need to stop the build being quite so whiny by editing v8/SConstruct as follows. Find the part of the file that looks roughly like this:

V8_EXTRA_FLAGS = {
  'gcc': {
    'all': {
      'WARNINGFLAGS': ['-Wall',
                       '-Werror',
                       '-W',
                       '-Wno-unused-parameter',
                       '-Wnon-virtual-dtor']

and comment out the '-Werror' line by placing a # at the beginning of it. Now, you should be ready to build:

sudo scons sample=shell
sudo ln -s /usr/src/v8/shell /usr/bin/v8

You should now be able to use V8 to run any JavaScript file you like. Failing that, just go and install Node, it’s much easier.

Terminus: control your browser from the command line

I’ve been saying for a while that I want to use Faye for automating JavaScript and integration testing, especially now that it has server-side clients. Well I took the first step in that direction this afternoon by hacking together Terminus, a distributed JavaScript console. You just install and run like so:

$ sudo gem install terminus
$ terminus
Terminus running at http://0.0.0.0:7004
Press CTRL-C to exit
>> 

Visiting the aforementioned http://0.0.0.0:7004 will give you a bookmarklet that you can drag up to your bookmarks bar. Running the bookmarklet while Terminus is running will connect the current page to your Terminus session, letting you run JavaScript on that page from the command line. And not just that page: every page you’ve connected will execute every line of script you type in. You may very well become drunk with power.

Anyway, have a play around and see if it’s useful. It’s still alpha quality and needs a fair few things adding, like per-page return values, connection reporting etc. It also needs some polish before you can drop the client script into your own apps and use it to drive tests. One step at a time.
