websocket-driver: an I/O-agnostic WebSocket module, or, why most protocol libraries aren’t

A couple of days ago I pushed the latest release of faye-websocket for Node and Ruby. The only user-facing change in version 0.5 is that the library now better supports the I/O conventions of each platform; on Node this means WebSocket objects are now duplex streams, so making an echo server is as simple as:

var http      = require('http'),
    WebSocket = require('faye-websocket');

var server = http.createServer();

server.on('upgrade', function(request, socket, body) {
  var ws = new WebSocket(request, socket, body);
  ws.pipe(ws);
});

server.listen(8000);

On Ruby, it means that Faye::WebSocket now supports the rack.hijack API for accessing the TCP socket, so you can now use it to handle WebSockets in apps served by Puma, Rainbows 4.5, Phusion Passenger 4.0 and other servers.

But there’s a much bigger change behind the scenes, which is that faye-websocket is now powered by an I/O agnostic WebSocket protocol module called websocket-driver, available for Node and Ruby. The entire protocol is encapsulated in that module such that all the user needs to do is supply some means of doing I/O. faye-websocket is now just a thin module that hooks websocket-driver up to various I/O systems, such as Rack and Node web servers or TCP/TLS sockets on the client side.

I started work on this a few weeks ago when the authors of Celluloid and Puma asked me if faye-websocket could be used to add WebSocket support to those systems. I said it could probably already do this, since Poltergeist and Terminus have been using the protocol classes with Ruby’s TCPServer for a while without too much effort. So I began extracting these classes into their own library, and wrote the beginnings of some documentation for them.

But as I got into explaining how to use this new library, I noticed how hard it was to use correctly. Loads of protocol details were leaking out of these classes and would have to be reimplemented by users. For example, here’s a pseudocode-ish outline of how the client would have to process data it received over TCP. If it looks complicated, that’s because it is complicated, but I’ll explain it soon enough.

class Client
  def initialize(url)
    @uri       = URI.parse(url)
    @parser    = Faye::WebSocket::HybiParser.new(url, :masking => true)
    @state     = :connecting
    @tcp       = tcp_connect(@uri.host, @uri.port || 80)
    @handshake = @parser.create_handshake

    @tcp.write(@handshake.request_data)
    loop { parse(@tcp.read) }
  end

  def parse(data)
    case @state
    when :connecting
      leftovers = @handshake.parse(data)
      return unless @handshake.complete?
      if @handshake.valid?
        @state = :open
        parse(leftovers)
        @queue.each { |msg| send(msg) } if @queue
      else
        @state = :closed
      end
    when :open, :closing
      @parser.parse(data)
    end
  end

  def send(message)
    case @state
    when :connecting
      @queue ||= []
      @queue << message
    when :open
      data = @parser.frame(message, :text)
      @tcp.write(data)
    end
  end
end

But using websocket-driver the equivalent implementation would be:

class Client
  attr_reader :url

  def initialize(url)
    @url    = url
    @uri    = URI.parse(url)
    @driver = WebSocket::Driver.client(self)
    @tcp    = tcp_connect(@uri.host, @uri.port || 80)

    @driver.start
    loop { parse(@tcp.read) }
  end

  def parse(data)
    @driver.parse(data)
  end

  def send(message)
    @driver.text(message)
  end

  def write(data)
    @tcp.write(data)
  end
end

So before, the client had to implement code to create a handshake request, split the input stream depending on whether it was currently parsing the HTTP handshake headers or WebSocket frames, switch state accordingly, and remember to parse the leftovers: it’s entirely possible to receive the handshake headers and some WebSocket frame data in the same chunk, and you can’t drop that frame data. Because of the design of the WebSocket wire format, dropping or misinterpreting even one byte of input changes the meaning of the rest of the stream, possibly leading to behaviour an attacker might exploit.

It also had to maintain state around sending messages out, since messages can only be sent after the handshake is complete. So if you tried to send a message while in the :connecting state, it would put the message in a queue and deliver it once the handshake was complete.

When we switch to websocket-driver, all those concerns go away. We treat the whole TCP input stream as one stream of data, because that’s what it is. We stream all incoming bytes to the driver and let it deal with managing state. It will emit events to tell us when interesting things happen, like the handshake completing or a message being received. When we want to send a message, we tell the driver to format it as a text frame. If the driver knows the handshake is not complete it will queue it and deliver it when it’s safe to do so. In the second example, we don’t even mention the concept of handshakes: the user doesn’t need to know anything about how the protocol works to use the driver correctly. The new Client class just hooks the driver up to a TCP socket and provides an API for sending messages.

The driver produces TCP output by calling the client’s write() method with the data we should send over the socket. When we call @driver.start, the driver calls client.write with a string containing handshake request headers. When we call @driver.text("Hello"), the driver will call client.write("\x81\x05Hello") (for unmasked frames), either immediately or after the handshake is complete.
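
The same flow works on Node, where websocket-driver exposes its I/O as streams. Here’s a minimal sketch of wiring the driver up to a TCP socket; the URL and echo payload are made up, and this assumes the driver’s Node API with its io and messages streams:

var websocket = require('websocket-driver'),
    net       = require('net');

var driver = websocket.client('ws://www.example.com/'),
    tcp    = net.connect(80, 'www.example.com');

// All protocol bytes flow through the driver's `io` duplex stream
tcp.pipe(driver.io).pipe(tcp);

tcp.on('connect', function() {
  driver.start();            // writes the handshake request via driver.io
});

driver.on('open', function() {
  driver.text('Hello');      // sent, or queued until the handshake completes
});

driver.messages.on('data', function(message) {
  console.log('received: ' + message);
});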

This final point highlights the core problem with a lot of protocol libraries. By taking a strictly object-oriented approach where all protocol state is encapsulated and objects send commands to one another, we’ve allowed the protocol library to control when output happens, not just what output happens. A protocol is not just some functions for parsing and formatting between TCP streams and domain messages, it’s a sequence of actions that must be performed in a certain order by two or more parties in order to reach a goal. A protocol library, if it wishes to help users deploy the protocol correctly and safely, should drive the user’s code by telling it when to do certain things, not just give the user a bag of parsing ingredients and ask them to glue them together in the right order.

The fact that other protocol libraries have no means of telling the user when to send certain bits of output means that they end up leaking a lot of protocol details into the user’s code. For example, WebSocket has various control frames aside from those for text and binary messages. If you receive a ‘ping’ frame, you must respond by sending a ‘pong’ frame containing the same payload. If you receive a ‘close’ frame, you should respond in kind and then close the connection. If you receive malformed data you should send a ‘close’ frame with an error code and then close the connection. So there are various situations when the parser should react, not by yielding the data to the user, but by telling the user to send certain types of responses. But the most-downloaded Ruby library for this (websocket) handles the latter case by yielding the data to the user and expecting them to do the right thing.
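
To make that concrete, here’s roughly the boilerplate such a library pushes onto its users, sketched in JavaScript against a hypothetical frame-level parser; none of these names are the websocket gem’s real API:

// Hypothetical parser that yields every frame to the application
parser.onFrame(function(frame) {
  switch (frame.opcode) {
    case 0x1:                                         // text
    case 0x2:                                         // binary
      deliverToApplication(frame.payload);
      break;
    case 0x9:                                         // ping: must reply with a pong
      socket.write(formatFrame(0xA, frame.payload));
      break;
    case 0x8:                                         // close: reply in kind, then hang up
      socket.write(formatFrame(0x8, frame.payload));
      socket.end();
      break;
  }
});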

I’ve tried reimplementing faye-websocket’s Client class on top of websocket and the amount of boilerplate required is huge if you want to produce a correct implementation. Here’s a laundry list of stuff you need to implement yourself:

  • Tracking whether you’re still parsing the handshake or are onto WebSocket frames, and carrying leftover bytes across that transition
  • Queuing outgoing messages until the handshake is complete, then flushing the queue
  • Replying to ‘ping’ frames with ‘pong’ frames carrying the same payload
  • Replying to ‘close’ frames in kind and then closing the TCP connection
  • Sending a ‘close’ frame with an error code when malformed data arrives, then closing the connection

So this protocol library not only leaks by making the user track the state of the connection and the state of the parser, but also makes them implement stuff the protocol should deal with. Almost all the above points are behaviours set down in the specification; the user must implement them this way or their deployment is buggy. Since the user has no meaningful control over how this stuff works, all this code is just boilerplate that requires significant knowledge to write correctly. In contrast, faye-websocket and websocket-driver have never emitted events on ping frames because the user has no choice over how to handle them, so why make them write code for that? In websocket-driver, all the above points (and this list is not exhaustive) are dealt with by the protocol library and this gives users a much better hope of deploying WebSockets correctly and safely.

I’m not saying the websocket library is broken, per se. I’m saying it doesn’t go far enough. In Ruby we have lots of different means of doing network I/O, and there are a few in Node if you consider HTTP servers and TCP/TLS sockets, though they all have similar interfaces. If you want to build a widely useful protocol library, you should solve as many problems as possible for the user so that they just need to bring some I/O and they’re pretty much done. Asking the user to rebuild half the protocol themselves is a recipe for bugs, security holes and wasted effort. We shouldn’t have to rebuild each protocol for every I/O stack we invent, so let’s stop.

Let the user tell you what they want to do, and then tell their code how and when to realize this over TCP. If you find yourself explaining the protocol mechanics when you’re documenting your library, it’s not simple enough yet. Refactor until I don’t need to read the RFC to deploy it properly.

I’m not helping

I just arrived home from the Realtime Conference in Lyon, France. It was a terrific event full of interesting people talking about a diverse range of topics in beautiful surroundings, and I’m hugely grateful to the organisers for inviting me. I could wax lyrical about some of the technical things I learned there but I’m going to focus on the topic of Adam Brault’s incredibly honest closing speech, and a topic I rarely discuss here: people.

RealtimeConf was just one in a series of events I’ve attended recently where I got to talk to people about their experience of community events and interpersonal relationships. I hope I can write something about community in general in the future, but this post is about me. I apologise in advance for the grotesque level of self-indulgence here but I need to put this on the record. I have not named anyone who informed this article through one-on-one conversation with me; I am tremendously thankful for their input and if you recognize yourself in this story and I have misrepresented you, please let me know and I will make corrections.

So. I don’t know of any way to say this without seeming boastful or pretentious, but: I have somehow become moderately well-known in the JavaScript and Ruby communities. What I mean by that is that it’s not infrequent that I meet people who ‘know me from the internet’, because of my open-source projects, my blog, conference talks, or my Twitter feed. Actually, particularly my Twitter feed. People tell me they like my rants, or ‘diatribes’ as one friend put it. And indeed I’ve met various great people, and through some slightly bizarre circumstances, made some in-real-life friends, because people find my verbose streams of incoherence on matters technological entertaining.

But on the other hand, I’ve been called out by several people for being too negative, too angry, for swearing too much, for disparaging others’ work without constructive feedback, for posting too frequently, and generally making people feel bad. And before anyone feels obliged to leap to my defence or tell me you can’t please everyone: these are all valid criticisms. I’m a nasty person sometimes. People see different sides of you, and have differing opinions of those sides, but the aforelisted behaviours are matters of observable fact. I do those things.

Okay, James, enough with the navel-gazing. Why does any of this matter?

The thing about being well-known is it creeps up on you. It’s very hard to tell, when you walk into a conference hall full of strangers, how many people know who you are. It still comes as a surprise to me when people have heard of me or know my work, and I still feel as anonymous walking into a venue as I did at the very first developer conf I went to in 2007. But the illusion of anonymity soon fades when people come and introduce themselves.

I’ve met a lot of new people recently. RealtimeConf was my third conf in as many weeks, I’m going to another next week, and at the beginning of March I did /dev/fort. And several of those people have had a reaction somewhere in the region of: “wow, you’re a nice person in real life”. Now I won’t claim that’s always the case either, but it’s certainly disappointing to find out that people’s expectation, based solely on following me on Twitter, is that I’ll be a sarcastic, negative troll who’s no fun to be around.

While it sucks to find out people are apprehensive about meeting me, the straw that broke the camel’s back is something another one of the speakers told me yesterday: they were nervous about giving their talk knowing I was in the audience. My reputation for mouthing off about the minutiae of software made someone worry their material would risk setting me off. It’s one thing to know I might not get to meet someone because I put them off, it’s another to know they’re having a worse time on stage on my account.

Public speaking is regularly listed as many people’s greatest fear. As Adam Brault said in his talk, “your heroes are all scared, too.” It’s so true. I don’t know anyone who isn’t at least nervous about speaking in public. I know some remarkable speakers who have almost paralysing fears about what will happen on stage. I’ve seen friends pretty much pass out from exhaustion after presenting. Me, I feel nauseous for about an hour beforehand, I have to pee every five minutes, I sweat like crazy and I’m scared I’ll have a panic attack on stage. Oh, and I stuttered as a child and it comes back when I’m not confident in what I’m saying. I don’t know why any of us choose to do it sometimes, and I have huge respect for anyone who even tries. I want the person on stage to know that, and not instead feel scared that I’m looking for ways to trip them up.

I’m ecstatic if I get one laugh during a whole talk. It’s the only way you know the audience is on your side.

Anyway, to get to the point. I make software and give it away and answer people’s questions about it because those are the best ways I have of helping people. I want people to expect that I’ll want to be helpful in real life, but that’s not the signal I send out sometimes. I’m sorry about that. Context matters, and I’m sure I’ll still indulge the people I know are comfortable with my snarkiness when I know I’m talking directly to them, just as good friends are allowed to insult each other without getting hurt. But I ought to remember I’ve not met most of the people who follow me, and some of them look up to me for my programming work. If you only know me from Twitter, you’re mostly seeing my angry side.

One of the biggest lessons I’ve taken from talking to people about community is that it pays to be up-front and explicit about what you expect from people and what they should expect from you, even when (or maybe especially when) those expectations seem obvious. In that spirit, I’m asking anyone who follows me online to tell me when I’m overstepping the line. Don’t hold back because I’m an ‘expert’ on something or because you think I’ll get defensive. Several people have done this in person in recent weeks and I was grateful to every one of them. I’m tired of being the angry programmer from the internet, so I have to change people’s expectations. Telling me when I’m getting it wrong helps that.

Hopefully next time I meet one of you, you won’t be so surprised.

Callbacks, promises and simplicity

Since publishing Callbacks are imperative, promises are functional a couple of days ago I’ve received a huge volume of feedback. It’s by far the most widely-read thing I’ve written and there have been far too many comments dotted about the web to cover them individually. But I’d like to highlight two articles written in response, both entitled ‘Broken promises’.

The first, by Drew Crawford, is an excellent and detailed technical analysis of the limitations of promises, in particular their applicability to resource-constrained situations. It’s derived from his experience trying to apply them to iOS programming, and failing. I encourage you to read the whole thing, but two important themes pop out for me.

First, the cost of modelling a solution using promises is increased memory usage. Callback-based programs (in JavaScript) store their state on the stack and in closures, since most of the data they use comes from function arguments and local variables. The data they use tends to be quite fragmented; you don’t typically build large object graphs while using callbacks, and this can help keep memory usage down. On the other hand, good promise-based programs use relationships between values, encoded in arrays and object references, to solve problems. Thus they tend to produce larger data structures, and this can be a problem if your solution won’t fit in memory on your target runtime. If you hit a limit, you end up having to introduce manual control flow, at which point callbacks become more appealing.

Second, promises let you delegate scheduling and timing to your tools, but sometimes that’s not what you want. Sometimes there are hard limits on how long something can take, or you want to bail out early for some reason, or you want to manually control the ordering of things. Again, in such situations, manual control flow is the right tool and promises lose their appeal.

I didn’t mean to give the impression that promises are the one true solution to async programming (I try to avoid thinking of ‘one true’ anything, so Drew’s casting of me as an ‘evangelist’ is a little bruising). There are certainly plenty of situations where they’re not appropriate, and Drew catalogues a number of those very well. My only intention was to add a few more things to my mental toolbox so I can dig them out when I spot an appropriate occasion.

The second response is from Mikeal Rogers, whom I quote in my original article. (There’s also a more recent version of his talk.) The bulk of this article concerns the community effect of Node’s design decision. Again, you should read his article rather than take my summary for granted.

Mikeal argues that the success of the Node ecosystem relies on how little the core platform provides. Instead of shipping with a huge standard library, Node has a bare minimum of building blocks you need to get started writing network apps. One of those building blocks is the conventions it sets up: the implicit contracts that mean all the modules in the ecosystem interoperate with each other. These include the Stream interface, the module system, and the function(error, result) {} callback style. Libraries that adhere to these interoperate with one another easily, and this is tremendously valuable.

I actually commend the Node team for their attitude on this stuff. I’ve been using Node, and Faye has been built on it, since v0.1. I was around before promises were taken out of core, and I can see that the volume of disagreement on them at the time meant they shouldn’t have been standardized. And bizarrely, despite its early experimental status, I’ve found Node to be remarkably stable. I’ve been doing Ruby since 2006 and Node since early 2010, and I can honestly say Node has broken my stuff less while going from v0.1 to v0.10 than Ruby and its libraries have even during minor/patch releases. Paying attention to compatibility saves your users a huge amount of time and it’s something I try to do myself.

I recognize that the Node team have their hands somewhat tied. Even if they wanted to go back to promises, doing so would break almost every module in npm, which would be a terrible waste of everyone’s time. So in light of this, please take the following as a history lesson rather than a plea to change Node itself.

Mikeal’s argument goes that minimizing core creates a market for new ideas in the library ecosystem, and we’ve seen that with various control flow libraries. This is certainly true, but I think there’s an interesting difference where control flow is concerned, when compared to other types of problem. In most other popular web languages, the problem solved by the function(error, result) {} convention is solved at the language level: results are done with return values, errors with exceptions. There is no market for alternatives because this is usually a mechanism that cannot be meaningfully changed by users.

But I would also argue that the market for solutions to control flow in Node is necessarily constrained by what core does. Let’s look at a different problem. Say I’m in the market for a hashtable implementation for JavaScript. I can pick any implementation I like so long as it does the job, because this problem is not constrained by the rest of the ecosystem. All that matters is that it faithfully implements a hashtable and performs reasonably well. If I don’t like one, I can pick up another with a different API, and all I need to change is my code that uses the hashtable.

Most problems are like this: you just want a library that performs some task, and using it does not affect the rest of your system. But control flow is not like this: any control flow library out there must interoperate with APIs that use callbacks, which in Node means basically every library out there. So people trying to solve this problem do not have free reign over how to solve it, they are constrained by the ecosystem. No-one in the Ruby community thinks of return values and exceptions as being particularly constraining, but because in Node this concern is a library rather than a language feature, people have the freedom to try and change it. They just can’t change it very much, because they must remain compatible with everything else out there.

And to me that’s the history lesson: when you’re designing a platform, you want to minimize the number of contracts people must agree to in order to interoperate. When people perceive they have the ability to change things, but they really can’t, a tension arises because alternative solutions won’t mesh well with the ecosystem. The more concepts require these contracts, the fewer ways users can experiment with alternative models.

Mikeal also goes into the relative ease of use of callbacks and my promise-based solutions. I disagree with most of this but that’s not really important. We all consider different things to be easy, have different sets of knowledge, and are working on different problems with different requirements. That’s fine, so long as you don’t make technical decisions by simply pandering to the most ignorant. While we all need to write code that others can use and maintain, I hope part of that process involves trying to increase our collective knowledge rather than settling for what we know right now. Short of a usability study, any assertions I can make here about ease of use would be meaningless.

But I want to finish up on a word that’s frequently interpreted to mean ‘easy’: ‘simple’. I will assert that my promise-based solutions are simpler than using callbacks, but not in the ease-of-use sense. I mean in the sense that Rich Hickey uses in his talk Simple Made Easy, which is to say ‘not complex’, not having unrelated things woven together. This is an objective, or at least observable and demonstrable, property of a program, that we can examine by seeing how much you must change a program to change its results.

Let’s revisit my two solutions from the previous article, one written with callbacks and one with promises. This program calls fs.stat() on a collection of paths, then uses the size of the first file for some task and uses the whole collection of stats for some unrelated task. Here are the two solutions:

var paths = ['file1.txt', 'file2.txt', 'file3.txt'],
    file1 = paths.shift();

async.parallel([
  function(callback) {
    fs.stat(file1, function(error, stat) {
      // use stat.size
      callback(error, stat);
    });
  },
  function(callback) {
    async.map(paths, fs.stat, callback);
  }
], function(error, results) {
  var stats = [results[0]].concat(results[1]);
  // use the stats
});

var fs_stat = promisify(fs.stat);

var paths = ['file1.txt', 'file2.txt', 'file3.txt'],
    statsPromises = list(paths.map(fs_stat));

statsPromises[0].then(function(stat) {
  // use stat.size
});

statsPromises.then(function(stats) {
  // use the stats
});

Now, I would say the first is uglier than the second, but this is neither objective nor particularly instructive. To see how much more complex the first solution is, we must observe what happens when we try to change the program. Let’s say we no longer want to do the task with the size of the first file. Then the promise solution simply involves removing the code for that task:

var fs_stat = promisify(fs.stat);

var paths = ['file1.txt', 'file2.txt', 'file3.txt'],
    statsPromises = list(paths.map(fs_stat));

statsPromises.then(function(stats) {
  // use the stats
});

It would go similarly if we had a set of unrelated operations waiting to complete rather than the same operation on a set of inputs. We would just remove that operation from a list of promises and we’d be done. Often, changing promise-based solutions is the same as changing synchronous ones: you just remove a line, or change a variable reference or array index, or modify a data structure. These are changes to your program’s data, not to its syntactic structure.
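
For example, deciding we want the size of the third file instead of the first is a change to an array index, not to the program’s structure:

statsPromises[2].then(function(stat) { /* use stat.size */ });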

Now let’s remove that task from the callback solution. We start by removing all the code that treats the first file as a special case:

var paths = ['file1.txt', 'file2.txt', 'file3.txt'];

async.parallel([
  function(callback) {
    async.map(paths, fs.stat, callback);
  }
], function(error, results) {
  var stats = results[0];
  // use the stats
});

We’ve removed the additional variable at the start, the special treatment of the first file, and the array-munging at the end. But there’s no point having an async.parallel() with one item in it, so let’s remove that too:

var paths = ['file1.txt', 'file2.txt', 'file3.txt'];

async.map(paths, fs.stat, function(error, stats) {
  // use the stats
});

So removing that one task was a tiny change to the promise solution, and a huge change to the callback solution: often, changing what you want a callback-based program to do involves changing its syntactic structure. The difference between the two approaches is that the promise solution keeps the notions of promises, collections and asynchrony and ordering separate from one another, whereas the callback-based solution conflates all these things. This is why the promise-based program requires much less change.

So while I think it’s fruitless to argue about how easy a task is, I think you can demonstrate fairly objectively how simple it is. And while I admire what Node has achieved in many areas around interoperability through simplicity, this is one area where I wish it had gone the other way. Fortunately, JavaScript is such that we have ways of routing around designs we don’t care for.

Thanks to Drew and Mikeal for taking the time to respond. I would welcome any further feedback about how to improve either of my above approaches.

Callbacks are imperative, promises are functional: Node’s biggest missed opportunity

The nature of promises is that they remain immune to changing circumstances.
Frank Underwood, ‘House of Cards’

You will often hear it said that JavaScript is a ‘functional’ programming language. It is described as such simply because functions are first-class values: many other features that define functional programming – immutable data, preference for recursion over looping, algebraic type systems, avoidance of side effects – are entirely absent. And while first-class functions are certainly useful, and enable users to program in functional style should they decide to, the notion that JS is functional often overlooks a core aspect of functional programming: programming with values.

‘Functional programming’ is something of a misnomer, in that it leads a lot of people to think of it as meaning ‘programming with functions’, as opposed to programming with objects. But if object-oriented programming treats everything as an object, functional programming treats everything as a value – not just functions, but everything. This of course includes obvious things like numbers, strings, lists, and other data, but also other things we OOP fans don’t typically think of as values: IO operations and other side effects, GUI event streams, null checks, even the notion of sequencing function calls. If you’ve ever heard the phrase ‘programmable semicolons’ you’ll know what I’m getting at.

At its best, functional programming is declarative. In imperative programming, we write sequences of instructions that tell the machine how to do what we want. In functional programming, we describe relationships between values that tell the machine what we want to compute, and the machine figures out the instruction sequences to make it happen.

If you’ve ever used Excel, you have done functional programming: you model a problem by describing how a graph of values are derived from one another. When new data is inserted, Excel figures out what effect it has on the graph and updates everything for you, without you having to write a sequence of instructions for doing so.

With this definition in place, I want to address what I consider to be the biggest design mistake committed by Node.js: the decision, made quite early in its life, to prefer callback-based APIs to promise-based ones.

Everybody uses [callbacks]. If you publish a module that returns promises, nobody’s going to care. Nobody’s ever going to use that module.

If I write my little library, and it goes and talks to Redis, and that’s the last thing it ever does, I can just pass the callback that was handed to me off to Redis. And when we do hit these problems like callback hell, I’ll tell you a secret: there’s also a coroutine hell and a monad hell and a hell for any abstraction you create if you use it enough.

For the 90% case we have this super super simple interface, so when you need to do one thing, you just get one little indent and then you’re done. And when you have a complicated use case you go and install async like the other 827 modules that depend on it in npm.

Mikeal Rogers, LXJS 2012

This quotation is from a recent(ish) talk by Mikeal Rogers, which covers various facets of Node’s design philosophy.

In light of Node’s stated design goal of making it easy for non-expert programmers to build fast concurrent network programs, I believe this attitude to be counterproductive. Promises make it easier to construct correct, maximally concurrent programs by making control-flow something for the runtime to figure out, rather than something the user has to explicitly implement.

Writing correct concurrent programs basically comes down to achieving as much concurrent work as you can while making sure operations still happen in the correct order. Although JavaScript is single-threaded, we still get race conditions due to asynchrony: any action that involves I/O can yield CPU time to other actions while it waits for callbacks. Multiple concurrent actions can access the same in-memory data, or carry out overlapping sequences of commands against a database or the DOM. As I hope to show in this article, promises provide a way to describe problems using interdependencies between values, like in Excel, so that your tools can correctly optimize the solution for you, instead of you having to figure out control flow for yourself.
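
For example, here’s a sketch of a lost update; the db API is hypothetical, but the interleaving is real in any single-threaded async environment:

// Both withdrawals may read the balance before either one writes it
// back, so one of the updates is silently lost.
function withdraw(amount) {
  db.getBalance(function(error, balance) {
    db.setBalance(balance - amount, function(error) {});
  });
}

withdraw(10);
withdraw(20);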

I hope to dispel the misunderstanding that promises are about having cleaner syntax for callback-based async work. They are about modelling your problem in a fundamentally different way; they go deeper than syntax and actually change the way you solve problems at a semantic level.

To begin with, I’d like to revisit an article I wrote a couple of years ago, on how promises are the monad of asynchronous programming. The core lesson there was that monads are a tool for helping you compose functions, i.e. building pipelines where the output of one function becomes the input to the next. This is achieved using structural relationships between values, and it’s values and their relationships that will again play an important role here.

I’m going to make use of Haskell type notation again to help illustrate things. In Haskell, the notation foo :: bar means “the value foo is of type bar“. The notation foo :: Bar -> Qux means “foo is a function that takes a value of type Bar and returns a value of type Qux“. If the exact types of the input/output are not important, we use single lowercase letters, foo :: a -> b. If foo takes many arguments we add more arrows, i.e. foo :: a -> b -> c means that foo takes two arguments of types a and b and returns something of type c.

Let’s look at a Node function, say, fs.readFile(). This takes a pathname as a String, and a callback, and does not return anything. The callback takes an Error (which might be null) and a Buffer containing the file contents, and also returns nothing. We can say the type of readFile is:

readFile :: String -> Callback -> ()

() is Haskell notation for the null type. The callback is itself another function, whose type signature is:

Callback :: Error -> Buffer -> ()

Putting the whole thing together, we can say that readFile takes a String and a function which is called with a Buffer:

readFile :: String -> (Error -> Buffer -> ()) -> ()

Now, let’s imagine Node used promises instead. In this situation, readFile would simply take a String and return a promise of a Buffer:

readFile :: String -> Promise Buffer

More generally, we can say that callback-based functions take some input and a callback that’s invoked with some output, and promised-based functions take some input and return a promise of some output:

callback :: a -> (Error -> b -> ()) -> ()
promise :: a -> Promise b
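
At the call site the difference looks like this; the promise-returning readFile is hypothetical here, since Node’s is callback-based:

// Callback style: the result only exists inside the callback
fs.readFile('file1.txt', function(error, buffer) { /* use buffer */ });

// Promise style: the result is a first-class value we can keep hold of
var buffer = readFile('file1.txt');   // buffer :: Promise Buffer
buffer.then(function(contents) { /* use contents */ });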

Those null values returned by callback-based functions are the root of why programming with callbacks is hard: callback-based functions do not return anything, and so are hard to compose. A function with no return value is executed only for its side effects – a function with no return value or side effects is simply a black hole. So programming with callbacks is inherently imperative, it is about sequencing the execution of side-effect-heavy procedures rather than mapping input to output by function application. It is about manual orchestration of control flow rather than solving problems through value relationships. It is this that makes writing correct concurrent programs difficult.

By contrast, promise-based functions always let you treat the result of the function as a value in a time-independent way. When you invoke a callback-based function, there is some time between you invoking the function and its callback being invoked during which there is no representation of the result anywhere in the program.

fs.readFile('file1.txt',
  // some time passes...
  function(error, buffer) {
    // the result now pops into existence
  }
);

Getting the result out of a callback- or event-based function basically means “being in the right place at the right time”. If you bind your event listener after the result event has been fired, or you don’t have code in the right place in a callback, then tough luck, you missed the result. This sort of thing plagues people writing HTTP servers in Node. If you don’t get your control flow right, your program breaks.

Promises, on the other hand, don’t care about time or ordering. You can attach listeners to a promise before or after it is resolved, and you will get the value out of it. Therefore, functions that return promises immediately give you a value to represent the result that you can use as first-class data, and pass to other functions. There is no waiting around for a callback or any possibility of missing an event. As long as you hold a reference to a promise, you can get its value out.

var Promise = require('rsvp').Promise;

var p1 = new Promise();
p1.then(console.log);
p1.resolve(42);

var p2 = new Promise();
p2.resolve(2013);
p2.then(console.log);

// prints:
// 42
// 2013

So while the method name then() implies something about sequencing operations – and indeed that is a side-effect of its job – you can really think of it as being called unwrap. A promise is a container for an as-yet-unknown value, and then‘s job is to extract the value out of the promise and give it to another function: it is the bind function from monads. The above code doesn’t say anything about when the value is available, or what order things happen in, it simply expresses some dependencies: in order to log a value, you must first know what it is. The ordering of the program emerges from this dependency information. This is a rather subtle distinction but we’ll see it more clearly when we discuss lazy promises toward the end of this article.
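
And because each call to then() returns a new promise for its callback’s result, these unwrap steps compose; this sketch assumes a Promises/A-style then():

var p = new Promise();

p.then(function(x) { return x * 2 })
 .then(console.log);

p.resolve(21);

// prints:
// 42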

Thus far, this has all been rather trivial; little functions that barely interact with one another. To see why promises are more powerful, let’s tackle something a bit trickier. Say we have some code that gets the mtimes of a bunch of files using fs.stat(). If this were synchronous, we’d just call paths.map(fs.stat), but since mapping with an async function is hard, we dig out the async module.

var async = require('async'),
    fs    = require('fs');

var paths = ['file1.txt', 'file2.txt', 'file3.txt'];

async.map(paths, fs.stat, function(error, results) {
  // use the results
});

(Yes, I know there are sync versions of the fs functions, but most types of I/O don’t have this option. Play along with me, here.)

That’s all well and good, until we decide we also want the size of file1 for an unrelated task. We could just stat it again:

var paths = ['file1.txt', 'file2.txt', 'file3.txt'];

async.map(paths, fs.stat, function(error, results) {
  // use the results
});

fs.stat(paths[0], function(error, stat) {
  // use stat.size
});

That works, but now we’re statting the file twice. That might be fine for local file operations, but if we were fetching some large files over https that’s going to be more of a problem. We decide we need to only hit the file once, so we revert to the previous version but handle the first file specially:

var paths = ['file1.txt', 'file2.txt', 'file3.txt'];

async.map(paths, fs.stat, function(error, results) {
  var size = results[0].size;
  // use size
  // use the results
});

This works, but now our size-related task is blocked on waiting for the whole list to complete. And if there’s an error with any item in the list, we won’t get a result for the first file at all. That’s no good, so we try another approach: we separate the first file from the rest of the list and handle it separately.

var paths = ['file1.txt', 'file2.txt', 'file3.txt'],
    file1 = paths.shift();

fs.stat(file1, function(error, stat) {
  // use stat.size
  async.map(paths, fs.stat, function(error, results) {
    results.unshift(stat);
    // use the results
  });
});

This also works, but now we’ve un-parallelized the program: it will take longer to run because we don’t start on the list of requests until the first one is complete. Previously, they all ran concurrently. We’ve also had to do some array manipulation to account for the fact we’re treating one file differently from the others.

Okay, one last stab at success. We know we want to get the stats for all the files, hitting each file only once, do some work on the first result if it succeeds, and if the whole list succeeds we want to do some work on that list. We take this knowledge of the dependencies in the problem and express it using async.

var paths = ['file1.txt', 'file2.txt', 'file3.txt'],
    file1 = paths.shift();

async.parallel([
  function(callback) {
    fs.stat(file1, function(error, stat) {
      // use stat.size
      callback(error, stat);
    });
  },
  function(callback) {
    async.map(paths, fs.stat, callback);
  }
], function(error, results) {
  var stats = [results[0]].concat(results[1]);
  // use the stats
});

This is now correct: each file is hit once, the work is all done in parallel, we can access the result for the first file independently of the others, and the dependent tasks execute as early as possible. Mission accomplished!

Well, not really. I think this is pretty ugly, and it certainly doesn’t scale nicely as the problem becomes more complicated. It was a lot of work to think about in order to make it correct, the design intention is not apparent so later maintenance is likely to break it, the follow-up tasks are mixed in with the strategy of how to do the required work, and we had to do some crufty array-munging to paper over the special case we introduced. Yuck!

All these problems stem from the fact that we’re using control flow as our primary means of solving the problem, instead of data dependencies. Instead of saying “in order for this task to run, I need this data”, and letting the runtime figure out how to optimize things, we’re explicitly telling the runtime what should be parallelized and what should be sequential, and this leads to very brittle solutions.

So how would promises improve things? Well, first off we need some filesystem functions that return promises instead of taking callbacks. But rather than write those by hand let’s meta-program something that can convert any function for us. For example, it should take a function of type

String -> (Error -> Stat -> ()) -> ()

and return one of type

String -> Promise Stat

Here’s one such function:

// promisify :: (a -> (Error -> b -> ()) -> ()) -> (a -> Promise b)
var promisify = function(fn, receiver) {
  return function() {
    var slice   = Array.prototype.slice,
        args    = slice.call(arguments, 0, fn.length - 1),
        promise = new Promise();
    
    args.push(function() {
      var results = slice.call(arguments),
          error   = results.shift();
      
      if (error) promise.reject(error);
      else promise.resolve.apply(promise, results);
    });
    
    fn.apply(receiver, args);
    return promise;
  };
};

(This is not completely general, but it will work for our purposes.)

We can now remodel our problem. All we’re basically doing is mapping a list of paths to a list of promises for stats:

var fs_stat = promisify(fs.stat);

var paths = ['file1.txt', 'file2.txt', 'file3.txt'];

// [String] -> [Promise Stat]
var statsPromises = paths.map(fs_stat);

This is already paying dividends: whereas with async.map() you have no data to work with until the whole list is done, with this list of promises you can just pick out the first one and do stuff with it:

statsPromises[0].then(function(stat) { /* use stat.size */ });

So by using promise values we’ve already solved most of the problem: we stat all the files concurrently and get independent access to not just the first file, but any file we choose, simply by picking bits out of the array. With our previous approaches we had to explicitly code for handling the first file in ways that don’t map trivially to changing your mind about which file you need, but with lists of promises it’s easy.

The missing piece is how to react when all the stat results are known. In our previous efforts we ended up with a list of Stat objects, but here we have a list of Promise Stat objects. We want to wait for all the promises to resolve, and then yield a list of all the stats. In other words, we want to turn a list of promises into a promise of a list.

Let’s do this by simply augmenting the list with promise methods, so that a list containing promises is itself a promise that resolves when all its elements are resolved.

// list :: [Promise a] -> Promise [a]
var list = function(promises) {
  var listPromise = new Promise();
  for (var k in listPromise) promises[k] = listPromise[k];
  
  var results = [], done = 0;
  
  promises.forEach(function(promise, i) {
    promise.then(function(result) {
      results[i] = result;
      done += 1;
      if (done === promises.length) promises.resolve(results);
    }, function(error) {
      promises.reject(error);
    });
  });
  
  if (promises.length === 0) promises.resolve(results);
  return promises;
};

(This function is similar to the jQuery.when() function, which takes a list of promises and returns a new promise that resolves when all the inputs resolve.)

We can now wait for all the results to come in just by wrapping our array in a promise:

list(statsPromises).then(function(stats) { /* use the stats */ });

So now our whole solution has been reduced to this:

var fs_stat = promisify(fs.stat);

var paths = ['file1.txt', 'file2.txt', 'file3.txt'],
    statsPromises = list(paths.map(fs_stat));

statsPromises[0].then(function(stat) {
  // use stat.size
});

statsPromises.then(function(stats) {
  // use the stats
});

This expression of the solution is considerably cleaner. By using some generic bits of glue (our promise helper functions), and pre-existing array methods, we’ve solved the problem in a way that’s correct, efficient, and very easy to change. We don’t need the async module’s specialized collection methods for this, we just take the orthogonal ideas of promises and arrays and combine them in a very powerful way.

Note in particular how this program does not say anything about things being parallel or sequential. It just says what we want to do, and what the task dependencies are, and the promise library does all the optimizing for us.

In fact, many things in the async collection module can be easily replaced with operations on lists of promises. We’ve already seen how map works; this code:

async.map(inputs, fn, function(error, results) {});

is equivalent to:

list(inputs.map(promisify(fn))).then(
    function(results) {},
    function(error) {}
);

async.each() is just async.map() where you’re executing the functions for their side effects and throwing the return values away; you can just use map() instead.
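
In other words, the each() equivalent is the map() expression with the per-item results ignored:

list(inputs.map(promisify(fn))).then(
    function()      { /* everything succeeded */ },
    function(error) { /* something failed */ }
);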

async.mapSeries() (and by the previous argument, async.eachSeries()) is equivalent to calling reduce() on a list of promises. That is, you take your list of inputs, and use reduce to produce a promise where each action depends on the one before it succeeding. Let’s take an example: implementing an equivalent of rm -rf based on fs.rmdir(). This code:

var dirs = ['a/b/c', 'a/b', 'a'];
async.mapSeries(dirs, fs.rmdir, function(error) {});

is equivalent to:

var dirs     = ['a/b/c', 'a/b', 'a'],
    fs_rmdir = promisify(fs.rmdir);

var rm_rf = dirs.reduce(function(promise, path) {
  return promise.then(function() { return fs_rmdir(path) });
}, unit());

rm_rf.then(
    function() {},
    function(error) {}
);

Where unit() is simply a function that produces an already-resolved promise to start the chain (if you know monads, this is the return function for promises):

// unit :: a -> Promise a
var unit = function(a) {
  var promise = new Promise();
  promise.resolve(a);
  return promise;
};

This reduce() approach takes each directory path in the list in turn, and uses promise.then() to make the action that deletes the path depend on the success of the previous step. This handles non-empty directories for you: if a previous promise is rejected due to any such error, the chain simply halts. Using value dependencies to force a certain order of execution is a core idea in how functional languages use monads to deal with side effects.
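
Unrolled, the reduce() above builds exactly this chain:

var rm_rf = unit()
    .then(function() { return fs_rmdir('a/b/c') })
    .then(function() { return fs_rmdir('a/b') })
    .then(function() { return fs_rmdir('a') });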

This final example is more verbose than the equivalent async code, but don’t let that deceive you. The key idea here is that we’re combining the separate ideas of promise values and list operations to compose programs, rather than relying on custom control flow libraries. As we saw earlier, the former approach results in programs that are easier to think about.

And they are easier to think about precisely because we’ve delegated part of our thought process to the machine. When using the async module, our thought process is:

  • A. The tasks in this program depend on each other like so,
  • B. Therefore the operations must be ordered like so,
  • C. Therefore let’s write code to express B.

Using graphs of dependent promises lets you skip step B altogether. You write code that expresses the task dependencies and let the computer deal with control flow. To put it another way, callbacks use explicit control flow to glue many small values together, whereas promises use explicit value relationships to glue many small bits of control flow together. Callbacks are imperative, promises are functional.

A discussion of this topic would not be complete without one final application of promises, and a core idea in functional programming: laziness. Haskell is a lazy language, which means that instead of treating your program as a script that it executes top-to-bottom, it starts at the expressions that define the program’s output – what it writes to stdio, databases, and so on – and works backwards. It looks at what expressions those final expressions depend on for their input, and walks this graph in reverse until it’s computed everything the program needs to produce its output. Things are only computed if they are needed for the program to do its work.

Many times, the best solution to a computer science problem comes from finding the right data structure to model it. And JavaScript has one problem very similar to what I just described: module loading. You only want to load modules your program actually needs, and you want to do this as efficiently as possible.

Before we had CommonJS and AMD that actually have a notion of dependencies, we had a handful of script loader libraries. They mostly worked much like our example above where you explicitly told the script loader which files could be downloaded in parallel and which had to be ordered a certain way. You basically had to spell out the download strategy, which is considerably harder to do both correctly and efficiently as opposed to simply describing the dependencies between scripts and letting the loader optimize things for you.

Let’s introduce the notion of a LazyPromise. This is a promise object that contains a function that does some possibly async work. The function is only invoked once someone calls then() on the promise: we only begin evaluating it once someone needs the result. It does this by overriding then() to kick off the work if it’s not already been started.

var Promise = require('rsvp').Promise,
    util    = require('util');

var LazyPromise = function(factory) {
  this._factory = factory;
  this._started = false;
};
util.inherits(LazyPromise, Promise);

LazyPromise.prototype.then = function() {
  if (!this._started) {
    this._started = true;
    var self = this;
    
    this._factory(function(error, result) {
      if (error) self.reject(error);
      else self.resolve(result);
    });
  }
  return Promise.prototype.then.apply(this, arguments);
};

For example, the following program does nothing: since we never ask for the result of the promise, no work is done:

var delayed = new LazyPromise(function(callback) {
  console.log('Started');
  setTimeout(function() {
    console.log('Done');
    callback(null, 42);
  }, 1000);
});

But if we add this line, then the program prints Started, waits for a second, then prints Done followed by 42:

delayed.then(console.log);

And since the work is only done once, calling then() multiple times yields the result each time without redoing the work:

delayed.then(console.log);
delayed.then(console.log);
delayed.then(console.log);

// prints:
// Started
// -- 1 second delay --
// Done
// 42
// 42
// 42

Using this very simple generic abstraction, we can build an optimizing module system in no time at all. Imagine we want to make a bunch of modules like this: each module is created with a name, a list of modules it depends on, and a factory that when executed with its dependencies passed in returns the module’s API. This is very similar to how AMD works.

var A = new Module('A', [], function() {
  return {
    logBase: function(x, y) {
      return Math.log(x) / Math.log(y);
    }
  };
});

var B = new Module('B', [A], function(a) {
  return {
    doMath: function(x, y) {
      return 'B result is: ' + a.logBase(x, y);
    }
  };
});

var C = new Module('C', [A], function(a) {
  return {
    doMath: function(x, y) {
      return 'C result is: ' + a.logBase(y, x);
    }
  };
});

var D = new Module('D', [B, C], function(b, c) {
  return {
    run: function(x, y) {
      console.log(b.doMath(x, y));
      console.log(c.doMath(x, y));
    }
  };
});

We have a diamond shape here: D depends on B and C, each of which depends on A. This means we can load A, then B and C in parallel, then when both those are done we can load D. But, we want our tools to figure this out for us rather than write that strategy out ourselves.

We can do this very easily by modeling a module as a LazyPromise subtype. Its factory simply asks for the values of its dependencies using our list promise helper from before, then creates the module with those dependencies after a timeout that simulates the latency of loading things asynchronously.

var DELAY = 1000;

var Module = function(name, deps, factory) {
  this._factory = function(callback) {
    list(deps).then(function(apis) {
      console.log('-- module LOAD: ' + name);
      setTimeout(function() {
        console.log('-- module done: ' + name);
        var api = factory.apply(this, apis);
        callback(null, api);
      }, DELAY);
    });
  };
};
util.inherits(Module, LazyPromise);

Because Module is a LazyPromise, simply defining the modules as above does not load any of them. We only start loading things when we try to use the modules:

D.then(function(d) { d.run(1000, 2) });

// prints:
// 
// -- module LOAD: A
// -- module done: A
// -- module LOAD: B
// -- module LOAD: C
// -- module done: B
// -- module done: C
// -- module LOAD: D
// -- module done: D
// B result is: 9.965784284662087
// C result is: 0.10034333188799373

As you can see, A is loaded first, then when it completes B and C begin downloading at the same time, and when both of them complete then D loads, just as we wanted. If you try just calling C.then(function() {}) you’ll see that only A and C load; modules that are not in the graph of the ones we need are not loaded.

So we’ve created a correct optimizing module loader with barely any code, simply by using a graph of lazy promises. We’ve taken the functional programming approach of using value relationships rather than explicit control flow to solve the problem, and it was much easier than if we’d written the control flow ourselves as the main element in the solution. You can give any acyclic dependency graph to this library and it will optimize the control flow for you.

This is the real power of promises. They are not just a way to avoid a pyramid of indentation at the syntax level. They give you an abstraction that lets you model problems at a higher level, and leave more work to your tools. And really, that’s what we should all be demanding from our software. If Node is really serious about making concurrent programming easy, they should really give promises a second look.

Announcing reStore, a personal remoteStorage server for Node

If you’ve been following my Vault project, you might have heard me talk about how I’m going to support saving your password settings while using the web version. There is currently no means of doing this baked into Vault itself; it relies on your saving your settings somewhere yourself, and doesn’t integrate with the command-line version.

Obviously, this needs fixing, and I’ve actually been working on fixing it since before the first stable Vault release came out. I began looking for some way of supporting this feature that didn’t involve me storing people’s data on my servers, for fairly obvious security and logistical reasons. Back in May last year I did a little digging and stumbled on the remoteStorage project, part of the Unhosted collection of web technologies. It looked like just what I needed to support Vault: it lets web applications delegate storage to a server under the user’s control. The user gives the application their username and server as an email address, the application does an OAuth handshake with the server, and can then use it to store the user’s data.

This has two important benefits. First, it gives users freedom of movement between software providers: several standard modules are being worked out for various common things people store, like contacts, photos and so on. Second, decoupling storage from software lets organisations use public web applications but keep their data off the public internet: as long as JavaScript in the browser can talk to a server visible on your private network, it just works. This is how I’m planning on making getvau.lt usable by teams and organisations.

So, in support of this aim, I’m releasing the remoteStorage server I’ve been working on for the last few months as open-source software. It’s called reStore, and you can get it from npm:

$ npm install restore

The documentation on GitHub should tell you everything you need to get started. Right now, I want to show you what it’s like to use from an application’s point of view. It’s really very simple to talk to; in fact, I have unreleased code for Vault that integrates it seamlessly into the web and command-line versions. This demo uses 5apps.com, another remoteStorage provider:

(Video: “Vault: syncing data with remoteStorage” by James Coglan, on Vimeo.)

An app starts off by asking you for your remoteStorage address. Say I give it me@local.dev. The application then queries that server using WebFinger to discover where the authorization and storage endpoints are:

$ curl -iX GET https://local.dev/.well-known/host-meta.json?resource=acct:me@local.dev
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json

{
  "links": [
    {
      "href": "https://local.dev/storage/me",
      "rel": "remotestorage",
      "type": "draft-dejong-remotestorage-00",
      "properties": {
        "auth-method": "http://tools.ietf.org/html/rfc6749#section-4.2",
        "auth-endpoint": "https://local.dev/oauth/me"
      }
    }
  ]
}

So the app now knows the root of my storage tree is at https://local.dev/storage/me, and it redirects me to https://local.dev/oauth/me to authorize it for the directories it wants to store things in. For reStore, if the app wants read/write access to my /vault directory, that looks like this:

(Screenshot: the reStore authorization page, listing the directories the app wants access to.)
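
The redirect that gets me to that page is a standard OAuth 2.0 implicit-grant request (that’s what the rfc6749#section-4.2 auth-method in the WebFinger response refers to). Hypothetically, for an app served from app.example.com, it would look something like this – the vault:rw scope format is my reading of the remoteStorage draft, so treat it as an assumption:

https://local.dev/oauth/me?client_id=app.example.com
    &redirect_uri=https%3A%2F%2Fapp.example.com%2Fcallback
    &response_type=token
    &scope=vault%3Arw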

The user enters their password if they want to grant access to the directories the app wants, and reStore then redirects to the app’s OAuth callback endpoint, passing a bearer token, say df6VCO3jTVZOCbHUMB2uLmmqL5M=. The app can now use this token to store any data it likes from the directory I authorized:

$ curl -iH 'Authorization: Bearer df6VCO3jTVZOCbHUMB2uLmmqL5M=' \
       -X PUT https://local.dev/storage/me/vault/foo \
       -H 'Content-Type: text/plain' -d 'Hello, world!'

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
ETag: 1361834896000
Last-Modified: Mon, 25 Feb 2013 23:28:16 GMT

$ curl -iH 'Authorization: Bearer df6VCO3jTVZOCbHUMB2uLmmqL5M=' \
       -X PUT https://local.dev/storage/me/vault/nested/bar \
       -H 'Content-Type: application/json' -d '{"hello":"world"}'

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
ETag: 1361834933000
Last-Modified: Mon, 25 Feb 2013 23:28:53 GMT

It can list the contents of authorized directories:

$ curl -iH 'Authorization: Bearer df6VCO3jTVZOCbHUMB2uLmmqL5M=' \
       -X GET https://local.dev/storage/me/vault/

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Content-Length: 54

{
  "nested/": 1361834933000,
  "foo": 1361834896000
}

$ curl -iH 'Authorization: Bearer df6VCO3jTVZOCbHUMB2uLmmqL5M=' \
       -X GET https://local.dev/storage/me/vault/nested/

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Content-Length: 26

{
  "bar": 1361834933000
}

And retrieve individual documents:

$ curl -iH 'Authorization: Bearer df6VCO3jTVZOCbHUMB2uLmmqL5M=' \
       -X GET https://local.dev/storage/me/vault/foo

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: text/plain
ETag: 1361834896000
Last-Modified: Mon, 25 Feb 2013 23:28:16 GMT
Content-Length: 13

Hello, world!

$ curl -iH 'Authorization: Bearer df6VCO3jTVZOCbHUMB2uLmmqL5M=' \
       -X GET https://local.dev/storage/me/vault/nested/bar

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
ETag: 1361834933000
Last-Modified: Mon, 25 Feb 2013 23:28:53 GMT
Content-Length: 17

{"hello":"world"}

When you retrieve a document you get its Content-Type, Last-Modified time and ETag, and you can then use these with If-Unmodified-Since when making PUT and DELETE requests. For example, here’s a DELETE request failing with a conflict:

$ curl -iH 'Authorization: Bearer df6VCO3jTVZOCbHUMB2uLmmqL5M=' \
       -X DELETE https://local.dev/storage/me/vault/nested/bar \
       -H 'If-Unmodified-Since: Sat, 25 Feb 2012 23:28:53 GMT'

HTTP/1.1 409 Conflict
Access-Control-Allow-Origin: *

Of course, if you don’t specify the version the document is simply deleted:

$ curl -iH 'Authorization: Bearer df6VCO3jTVZOCbHUMB2uLmmqL5M=' \
       -X DELETE https://local.dev/storage/me/vault/nested/bar

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
ETag: 1361834933000
Last-Modified: Mon, 25 Feb 2013 23:28:53 GMT

$ curl -iH 'Authorization: Bearer df6VCO3jTVZOCbHUMB2uLmmqL5M=' \
       -X GET https://local.dev/storage/me/vault/nested/bar

HTTP/1.1 404 Not Found
Access-Control-Allow-Origin: *

That’s really all there is to it – WebFinger, OAuth and GET/PUT/DELETE. Since this is the first release of reStore, it should go without saying that it is alpha software and you should expect it to eat your data – please take regular backups if you use it to store anything. I’m releasing it for evaluation while I work on rolling it out for my own use, and I’d really appreciate any feedback you have.

Tab-completion for your command-line apps

One of the usability improvements in the latest version of Vault is that it supports tab-completion for options and saved service names, on both bash and zsh. Turns out this is really easy to do, and I’d like more Node/Ruby/whatever command-line apps to support it, so here’s how.

(I cribbed almost everything I know about this from rbenv. Sometimes the hardest thing about shell scripting is knowing what to google for, so when in doubt: dig through the scripts of a program you use a lot.)

Bash and zsh have different tab-completion systems with a large array of features, and you can get quite fancy with them (read the completion scripts for git if you don’t believe me). But for basic use they let you do essentially the same thing: you register a function to be called when the user tries to tab-complete an argument to your program. Here’s what Vault’s scripts for this look like, first bash:

# completion.bash

_vault_complete() {
  COMPREPLY=()
  local word="${COMP_WORDS[COMP_CWORD]}"
  local completions="$(vault --cmplt "$word")"
  COMPREPLY=( $(compgen -W "$completions" -- "$word") )
}

complete -f -F _vault_complete vault

And zsh:

# completion.zsh

_vault_complete() {
  local word completions
  word="$1"
  completions="$(vault --cmplt "${word}")"
  reply=( "${(ps:\n:)completions}" )
}

compctl -f -K _vault_complete vault

The important hooks are these lines:

# bash
complete -f -F _vault_complete vault

# zsh
compctl -f -K _vault_complete vault

These do basically the same thing: they say that when the first word in the command-line is vault, call the function _vault_complete to perform completion (-F for bash, -K for zsh), and also allow filenames to be used as completions (the -f flag).

So that’s how you register completion functions, but how do they work? Well, first they get the current shell word: this is "${COMP_WORDS[COMP_CWORD]}" in bash, and "$1" in zsh. They pass this word to Vault, as vault --cmplt WORD. This is just a special argument to the vault executable that takes a partially-complete word and prints a list of possible completions of it to stdout, separated by newlines. You can implement this however you want; Vault reads your config file and reflects on its own command-line flags to generate completions. You don’t even have to filter the results for those that match the input word – the shell can do this for you, though filtering yourself may be worth it if it shrinks the work needed to generate the candidate list.
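
The --cmplt handler itself can be tiny. Here’s a rough sketch of the idea in Node – not Vault’s actual implementation, and the flag list and service names are made-up stand-ins:

// Print newline-separated completion candidates for a partial word.
var FLAGS = ['--config', '--delete', '--export', '--import', '--key'];

if (process.argv[2] === '--cmplt') {
  var word       = process.argv[3] || '',
      services   = ['gmail', 'twitter'],  // would really come from the config file
      candidates = FLAGS.concat(services);

  candidates.forEach(function(name) {
    if (name.indexOf(word) === 0) console.log(name);
  });
  process.exit(0);
}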

The output of vault --cmplt gets stored in the variable completions, and then a little post-processing takes place. In bash we do this:

COMPREPLY=( $(compgen -W "$completions" -- "$word") )

compgen is what filters your list of completions for those that actually match the input word, and bash expects the final completion result to be stored in the special variable COMPREPLY. Whatever ends up there is what will be used to complete the user’s input.

In zsh we have this:

reply=( "${(ps:\n:)completions}" )

This is saying, split the value of completions on newlines, and store the resulting list in reply, which is where zsh expects completions to end up.

So that’s all there is to doing basic completion. You’ll need to provide a way for the user to actually load these functions into their shell conveniently. Vault does this using a script that detects which shell you’re using and loads the right hooks:

# init.sh

if [ -n "$BASH_VERSION" ]; then
  root="$(dirname "${BASH_SOURCE[0]}")"
  source "$root/completion.bash"

elif [ -n "$ZSH_VERSION" ]; then
  root="$(dirname "$0")"
  source "$root/completion.zsh"
fi

This script, and the two completion scripts, live side-by-side in the Vault source tree. The final bit of glue is that Vault has a command, vault --initpath, which returns the full path to init.sh. Finding the path to an installed library can be tricky, so it’s easier to have the executable tell you where its scripts are; in Node this is easily done using the __dirname variable.
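
That flag can be as simple as printing a path relative to the executable’s own source file. A sketch of the idea, not Vault’s exact code:

// init.sh lives next to this file in the source tree, so __dirname is
// all we need to locate it, wherever the package is installed.
var path = require('path');

if (process.argv[2] === '--initpath') {
  console.log(path.resolve(__dirname, 'init.sh'));
  process.exit(0);
}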

This lets the user drop this in their profile to load your completion scripts:

which vault > /dev/null && . "$( vault --initpath )"

See? Couldn’t be easier. If you’re building a command-line program this is a really easy usability win that makes interacting with the program far more pleasant.

Vault 0.3: improved settings management and private-key-based passwords

I’m happy to announce a new release of Vault, my password manager. (For background on why I’m doing this project, see the original release announcement.)

Before you install it, you need to know that the encryption used to keep your settings file safe has changed and you need to migrate. Make a backup of your .vault file, then export your settings as plaintext:

$ vault --export vault.json

Then, install the update and import your settings again:

$ npm update -g vault
$ vault --import vault.json

Your settings should now work just as before. Now onto the new features. First, Vault now supports tab-completion for options and service names under bash and zsh. Just add this to your profile:

which vault > /dev/null && . "$( vault --initpath )"

Second, you can now delete your settings easily from the command line. The following new options are available:

  • --delete SERVICE, -x SERVICE: deletes the settings for the named service
  • --delete-globals: deletes your global settings (i.e. settings created with --config and no service name)
  • --clear, -X: deletes all your settings

Finally, you can now generate passwords using your SSH private key. Instead of taking a passphrase, this works by using your private key to sign the service name, and using the resulting bits as input to the generator. So, to generate your gmail password from your private key, run:

$ vault --key gmail

If you have multiple SSH keys you will be prompted for which one you want to use. You can save your selection like this:

$ vault --config --key

This will store the public part of the selected key in your .vault file so we can ask ssh-agent for it next time you need it.
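
To make the idea concrete, here’s a rough sketch of the signing step. This is not Vault’s actual code – the real thing asks ssh-agent to sign, so your key never leaves the agent – and it assumes an unencrypted PEM-encoded RSA key:

var crypto = require('crypto'),
    fs     = require('fs');

// Sign the service name with the private key; the signature bytes become
// the input to the password generator in place of a passphrase.
function keyBits(service, keyPath) {
  var privateKey = fs.readFileSync(keyPath, 'utf8'),
      signer     = crypto.createSign('RSA-SHA256');

  signer.update(service);
  return signer.sign(privateKey);
}

console.log(keyBits('gmail', process.env.HOME + '/.ssh/id_rsa'));

One subtlety: a scheme like this relies on the signature being deterministic for a given key and message, which is true of RSA with PKCS#1 v1.5 padding but not of DSA, whose signatures include random data.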

That about covers the new features. See GitHub for full documentation. The next release I’m working on at the moment involves adding a storage backend based on remoteStorage, for which I’ve been working on an open-source server. This will mean you’ll be able to use your saved settings on the web, not just on the command line, using a server under your control. I am planning on rolling this out at my company and getting it suitable for team use; there’s still a way to go on that, but I’m making good progress and have a clear plan for where we need to get to. If you’d like early beta access to this, please let me know.

Terminus 0.5: now with Capybara 2.0 support and remote debugging

You might remember from my previous post that Terminus is a Capybara driver that lets you run your integration tests on any web browser on any device. Well, in mid-November Capybara 2.0 came out, and since I was at the excellent RuPy conference at the time, my conference hack project became making Terminus compatible with the new release.

I almost finished it that weekend, but not quite, and as always once you’re home and back at work you lose focus on side projects. But, for my final release of 2012, I can happily announce that Terminus 0.5 is out, making it compatible with both Capybara 1.1 and 2.0. It’s mostly a compatibility update, but it adds a couple of new features. First, Capybara’s screenshot API is supported when running tests with PhantomJS:

page.save_screenshot('screenshot.png')

And, it supports the PhantomJS remote debugger. You can call this API:

page.driver.debugger

This will pause the test execution, and open the WebKit remote debugger in Chrome so you can interact with the PhantomJS runtime through the WebKit developer tools. When testing on other browsers it simply pauses execution so you can inspect the browser where the tests are running.

As usual, ping the GitHub project if you find bugs.

Happy new year!

Terminus 0.4: Capybara for real browsers

As I occasionally mention, the original reason I built Faye was so I could control web browsers with Ruby. The end result was Terminus, a Capybara driver that controls real browsers. Since the last release, several improvements in Faye – including the extracted WebSocket module, the removal of the Redis dependency, and overall performance gains – have made a number of improvements to Terminus possible. Since Faye’s 0.8 release, I’ve been working on Terminus on-and-off and can now finally release version 0.4.

Terminus is a driver designed to control any browser on any device. To that end, this release adds support for the headless PhantomJS browser, as well as Android and IE8. In combination with the performance improvements, this makes Terminus a great option for headless and mobile testing. The interesting thing about Android and IE is that they do not support the document.evaluate() method for querying the DOM using XPath, and Capybara gives XPath queries to the driver to execute. In order to support these browsers, I had to write an XPath library, and in order to get that done quickly I wrote a PEG parser compiler. So that’s now three separate side projects that have sprung out of Terminus – talk about yak shaving.

But the big change in 0.4 is speed: Terminus 0.4 runs the Capybara test suite 3 to 5 times faster than 0.3 did. It does this using some trickery from Jon Leighton’s excellent Poltergeist driver, which just got to 1.0. Here’s how Terminus usually talks to the browser: first, the browser connects to a running Terminus server using Faye, and sends ping messages to advertise its presence:

        +---------+
        | Browser |
        +---------+
             |
             | ping
             V
        +---------+
        | Server  |
        +---------+

When you start your tests, the Terminus library connects to the server, discovers which browsers exist, and sends instructions to them. The browser executes the instructions and sends the results back to the Terminus library via the server.

        +---------+
        | Browser |
        +---------+
            ^  |
   commands |  | results
            |  V
        +---------+           +-------+
        | Server  |<--------->| Tests |
        +---------+           +-------+

As you can guess, the overhead of two socket connections and a pub/sub messaging protocol makes this a little slow. This is where the Poltergeist trick comes in. If the browser supports WebSocket, the Terminus library boots a blocking WebSocket server in your test process and waits for the browser to connect to it. It can then use this socket to perform request/response with the browser – it sends a message over the socket and blocks until the browser sends a response. This turns out to be much faster than using Faye and running sleep() in a loop until a result arrives.

        +---------+
        | Browser |<--------------+
        +---------+               |
            ^  |                  | queries
   commands |  | results          |
            |  V                  V
        +---------+           +-------+
        | Server  |<--------->| Tests |
        +---------+           +-------+

The Faye connection is still used to advertise the browser’s existence and to bootstrap the connection, since it’s guaranteed to work whatever browser or network you’re on.

The cool thing about this is that Jon’s code reuses the Faye::WebSocket protocol parser, supporting both hixie-76 and hybi protocols, on a totally different I/O stack. Though Faye::WebSocket is written for EventMachine, I tried to keep the parser decoupled, but I had never actually used it elsewhere, so it’s really nice to see it work like this.

Anyway, if you’re curious about Terminus you can find out more on the website.

Faye 0.8.4: more efficient socket connections

Update: Version 0.8.5 was released shortly afterward to fix a URL parsing bug in this release.

I just released Faye 0.8.4, a drop-in replacement for previous releases. It includes various little fixes, including working around iOS’s new POST-caching bug, making sure JSON-P requests don’t exceed URL size limits, checking EventSource actually works to detect broken releases of Opera, and fixing relative URL resolution in Internet Explorer. But the biggest improvement is in how it negotiates which transport to use. TL;DR: it now makes half as many connections to the server to establish a WebSocket connection. Read on for more detail.

One responsibility of the Bayeux protocol, on which Faye is based, is figuring out which transport type to use between the client and the server. It does this using an upgrade process, as follows. To initiate a connection, the client uses a vanilla HTTP request to send a message on the /meta/handshake channel; in the browser this is done using XMLHttpRequest for same-origin requests and JSON-P otherwise. The server’s response to this message includes two important things: a randomly generated client ID, and a list of transport types supported by the server.

$ curl -X POST http://localhost/bayeux \
    -H 'Content-Type: application/json' \
    -d '{"channel": "/meta/handshake", "supportedConnectionTypes": ["long-polling", "websocket"], "version": "1.0"}'

{
  "channel":    "/meta/handshake",
  "successful": true,
  "version":    "1.0",
  "clientId":   "irta1b0nh93z90baok4b1gbcw3p1emmcj50",
  "supportedConnectionTypes":[
    "long-polling",
    "cross-origin-long-polling",
    "callback-polling",
    "websocket",
    "eventsource",
    "in-process"
  ]
}

Once it knows what the server supports, the client can pick a new transport. But, it can’t just take this list at face value: even if the server supports WebSockets and the client has a WebSocket object available, the intervening network, proxies and so on may break the connection. So, the client needs to begin trying connections to find out which transports actually work before upgrading from the vanilla HTTP transport it is using. This testing is asynchronous, so in the meantime the client continues to use the vanilla transport.

In practice this results in the following sequence of events:

  1. Select a vanilla transport, either long-polling (XHR) or callback-polling (JSON-P)
  2. Send the /meta/handshake message to the server
  3. Receive a response, store client ID and list of supported transports
  4. Begin testing WebSocket and EventSource connections in the background
  5. Begin sending publish and subscribe messages using the original transport
  6. Eventually step 4 completes and the transport is upgraded

This typically results in four connections to the server during set-up, in the best case where WebSocket works:

  1. First POST request sending handshake message
  2. Trial WebSocket connection during transport selection
  3. Second POST request to send subscriptions and begin polling
  4. Second WebSocket connection used to actually send messages

Faye 0.8.4 improves this in two ways. First, it begins testing all the transports it has available before it knows what the server supports. This means a WebSocket connection is opened even before the handshake message is sent over HTTP. Second, it caches the connection it uses to test the transport, and reuses it for sending messages. So, in the best case, by the time the server’s handshake response arrives we already have a WebSocket open and can begin using it, and set-up takes only two connections in total.
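
In outline the new strategy looks something like this – a simplified sketch with illustrative names, not Faye’s actual internals:

// Probe WebSocket in parallel with the HTTP handshake, and keep the
// successful trial socket for real traffic instead of opening a new one.
function connect(httpUrl, wsUrl) {
  var transport = {
    send: function(message) { /* POST the message to httpUrl via XHR */ }
  };

  var trial = new WebSocket(wsUrl);   // probe starts immediately

  trial.onopen = function() {
    // Upgrade: reuse the trial connection rather than opening a second one
    transport = {
      send: function(message) { trial.send(JSON.stringify(message)); }
    };
  };
  trial.onerror = function() { /* probe failed: keep using HTTP */ };

  // The handshake goes out over vanilla HTTP without waiting for the probe
  transport.send({channel: '/meta/handshake', version: '1.0',
                  supportedConnectionTypes: ['long-polling', 'websocket']});
}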

You may ask, since we’re proactively testing whether WebSocket works, why can’t we rely on that and send the initial handshake over a socket connection, taking the connection count down to one? Well, in the case where WebSockets don’t work – because either the server or an intervening proxy doesn’t support them – it can take a long time to find out that things are broken. In some cases you don’t get an error event from the WebSocket client until the TCP connection times out. This is what the CometD client does, and it renders the client unusable for the first 60 seconds because it tries to send the handshake over an unresponsive socket. So while in the best case you use one less connection, if there’s any problem the client degrades horribly. (It also doesn’t bother testing the connection first, it just assumes that the availability of WebSocket means everything’s fine, which further compounds its responsiveness problem.)

By using an upgrade strategy and testing transports in the background, the client always has a transport it can use to send and receive messages, with no interruption in service. The improved responsiveness when there are problems is easily worth that one extra request in the best case.

One final thing to mention about this release is that I’ve finally written up a guide to securing Faye and other socket-based applications, which covers authenticating both publish and subscribe access, and preventing CSRF and XSS attacks. I’ve decided that educating people about this is better than providing canned extensions, since different applications require different things. If you have experience with socket security and want to contribute, just send me a pull request.