UTF-8: it’s what strings are made of

Earlier today, a change was announced to Node.js that affects how it handles invalid UTF-8 in strings that are converted to buffers. Once again, I find myself checking over the UTF-8 validation code in websocket-driver, and once again I find I can never remember how to make sense of the regex that performs the validation. I copied it off a webpage once, and it took a while (and reimplementing UTF-8 myself) to fully understand what it does. If you write software that processes text, you’ll probably need to understand this too, so I figured I’d write it down.

The first thing you need to understand is that Unicode and UTF-8 are not the same thing. Unicode is a standard that aims to assign a fixed number to every character and glyph in all the world’s writing systems. For example, number 65, or U+0041 is the uppercase letter ‘A’, 90 or U+005A is the uppercase letter ‘Z’, and 32 or U+0020 is a space. U+02A4 is the character ‘ʤ’, U+046C is ‘Ѭ’, U+0BF5 is ‘௵’, and so on. In all, these numbers or ‘code points’ range up to U+10FFFF or 1,114,111.
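
To make this concrete, here’s a quick sketch in Ruby (the language I’ll use for code sketches throughout), where String#ord reports a character’s code point:

'A'.ord            # => 65, i.e. U+0041
'Ѭ'.ord.to_s(16)   # => "46c", i.e. U+046C
0x10FFFF           # => 1114111, the highest code point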

A Unicode string, a sequence of characters, is really a sequence of these numbers, from 0 to 1,114,111. How those numbers translate to the glyphs you see on the screen depends on which font is used to render the text. But when we send text over a TCP connection or save it to disk, we store it as a sequence of fixed-width bytes. An 8-bit byte can only take one of 256 values, so how are we supposed to represent 1,114,112 possible code points? That’s where encodings come in.

UTF-8 is one of many encodings of Unicode. An encoding defines a mapping between a sequence of bytes and a sequence of code points, and tells us how to convert between them. UTF-8 is an encoding commonly used on the web, and is mandated as the encoding of text messages in the WebSocket protocol.

So how does UTF-8 work? Well, the first thing to realise is we can’t just map all the code points onto bytes: most of them are too big. We can’t even do that for the code points 00 to FF, since that leaves us with no bytes to represent higher values. But we can do it for the range 00 to 7F (0 to 127) and leave the range 80 to FF to represent the other code points. The first 128 code points are represented literally using the least significant 7 bits of a single byte:

    U+0000 to U+007F:

    00000000  00  --  7F  01111111

This is what makes UTF-8 special: rather than use 3 bytes to represent each code point (1,114,111 is 21 bits), it uses a variable number of bytes, between 1 and 4. The first 128 code points are 1 byte each, and all the remaining code points are made of combinations of the remaining 128 bytes. This has two main benefits, although one applies mostly to programmers and English speakers. The first is that UTF-8 is backward-compatible with ASCII: all valid ASCII documents are valid UTF-8 documents, per the above mapping. The second, which is a consequence of this, is that we did not double or triple the volume of data required to transmit English text.

The 1-byte range gives us 7 bits to play with. To represent bigger numbers, we need more bytes, and UTF-8 defines the 2-byte range as being composed of pairs of the form 110xxxxx 10yyyyyy. The x and y bits are variable, giving us 11 bits to play with; this gets us up to U+07FF.

    U+0080 to U+07FF:

    11000010  C2  --  DF  11011111
    10000000  80  --  BF  10111111

That is, the code point U+0080 becomes the bytes C2 80 and code point U+07FF becomes DF BF. Note that it is an error to use more space than you need to represent a code point: C1 BF or 11000001 10111111 could translate to U+007F but we can represent this code point with only one byte, so C1 BF is not a legal byte sequence.
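
To see the bit-packing at work, here’s a small sketch; encode_2byte is a made-up helper, not a library method. It splits a code point’s 11 bits into 5 + 6 and adds the 110/10 markers:

# Pack a code point in the range U+0080..U+07FF into two bytes.
def encode_2byte(codepoint)
  raise RangeError unless (0x80..0x7FF).include?(codepoint)
  byte1 = 0xC0 | (codepoint >> 6)    # 110xxxxx: the top 5 bits
  byte2 = 0x80 | (codepoint & 0x3F)  # 10yyyyyy: the low 6 bits
  [byte1, byte2]
end

encode_2byte(0x00E9).map { |b| b.to_s(16) }  # => ["c3", "a9"], i.e. 'é'
encode_2byte(0x07FF).map { |b| b.to_s(16) }  # => ["df", "bf"]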

In general, multibyte code points are composed of a byte with the most significant bit set (i.e. bytes >= 80) followed by one or more bytes of the form 10xxxxxx. These trailing bytes can therefore range from 80 to BF. Bytes lower than 80 are used for 1-byte code points and it is an error to see them in a multibyte encoding. The value of the first byte tells us how many trailing bytes will follow it.
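
That rule can be written down directly; here’s a sketch with a hypothetical helper, using the leading-byte ranges we’ll derive over the rest of this section:

# How many bytes long is the sequence introduced by this leading byte?
def sequence_length(first_byte)
  case first_byte
  when 0x00..0x7F then 1
  when 0xC2..0xDF then 2
  when 0xE0..0xEF then 3
  when 0xF0..0xF4 then 4
  else nil  # 80-BF only ever trail; C0-C1 and F5-FF never appear at all
  end
end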

Moving on to 3-byte code points, those are of the form 1110xxxx 10yyyyyy 10zzzzzz, giving us 16 bits of data and letting us reach code point U+FFFF. However, we now run into an accident of history. Unicode was first described in the Unicode 88 paper, which states:

The idea of expanding the basis for character encoding from 8 to 16 bits is so sensible, indeed so obvious, that the mind initially recoils from it. There must be a catch to it…

Are 16 bits, providing at most 65,536 distinct codes, sufficient to encode all characters of all the world’s scripts? Since the definition of a “character” is itself part of the design of a text encoding scheme, the question is meaningless unless it is restated as: Is it possible to engineer a reasonable definition of “character” such that all the world’s scripts contain fewer than 65,536 of them?

The answer to this is Yes.

– Joseph D. Becker PhD, ‘Unicode 88’

Of course, it turns out that the answer to this is in fact No, as you may have guessed from the existence of 1,114,112 modern-day code points. During the design of UTF-16 – an encoding that began life as a fixed two-bytes-per-code-point scheme – it was realised that 16 bits were in fact insufficient to encode all known characters. So, the Unicode standard reserves a special range of code points that UTF-16 uses to encode values above FFFF. Such values are encoded using four bytes, that is, two ‘regular’ 16-bit units, where the first two bytes are in the range D8 00 to DB FF and the second two are in the range DC 00 to DF FF. Code points in the range U+D800 to U+DFFF are called surrogates, and UTF-16 uses surrogate pairs to represent large values. No characters will ever be assigned to these code points and no encoding should attempt to represent them.
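
The arithmetic behind surrogate pairs is simple: subtract 10000 hex, then split the remaining 20 bits into two 10-bit halves. A sketch, with a made-up helper name:

# Split a code point above U+FFFF into a UTF-16 surrogate pair.
def surrogate_pair(codepoint)
  v = codepoint - 0x10000        # leaves a 20-bit value
  [0xD800 | (v >> 10),           # high surrogate: D800..DBFF
   0xDC00 | (v & 0x3FF)]         # low surrogate:  DC00..DFFF
end

surrogate_pair(0x10FFFF).map { |u| u.to_s(16) }  # => ["dbff", "dfff"]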

So, for our 3-byte range we can actually only encode the code points U+0800 to U+D7FF and U+E000 to U+FFFF.

    U+0800 to U+D7FF:

    11100000  E0  --  ED  11101101
    10100000  A0  --  9F  10011111
    10000000  80  --  BF  10111111

    U+E000 to U+FFFF:

    11101110  EE  --  EF  11101111
    10000000  80  --  BF  10111111
    10000000  80  --  BF  10111111

And finally, we reach the 4-byte range, whose bytes look like 11110www 10xxxxxx 10yyyyyy 10zzzzzz, giving us 21 bits and letting us reach U+10FFFF. There are no gaps in this range, but we don’t need all possible byte values to cover the remaining code points, so we end up with:

    U+010000 to U+10FFFF:

    11110000  F0  --  F4  11110100
    10010000  90  --  8F  10001111
    10000000  80  --  BF  10111111
    10000000  80  --  BF  10111111

We’ve now covered all the valid sequences of bytes that can represent a single character in UTF-8. They are:

  • [00-7F]
  • [C2-DF] [80-BF]
  • E0 [A0-BF] [80-BF]
  • [E1-EC] [80-BF] [80-BF]
  • ED [80-9F] [80-BF]
  • [EE-EF] [80-BF] [80-BF]
  • F0 [90-BF] [80-BF] [80-BF]
  • [F1-F3] [80-BF] [80-BF] [80-BF]
  • F4 [80-8F] [80-BF] [80-BF]

These can be encoded as a regex, but remember that regexes work on characters, not bytes. In Node, we can convert a buffer (a byte array) into a string whose characters are the literal code points given by those bytes (i.e. code points 0 to 255) using buffer.toString('binary'), and then match that string against the regex to validate it.
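
For illustration, here’s a sketch of that validation in Ruby (keeping to one language for the sketches in this post); the same byte ranges translate directly into the JavaScript regex described above. The n flag makes the pattern match raw bytes, and x permits the comments:

VALID_UTF8 = /\A(?: [\x00-\x7F]                     # 1 byte
                  | [\xC2-\xDF][\x80-\xBF]          # 2 bytes
                  | \xE0[\xA0-\xBF][\x80-\xBF]      # 3 bytes, no overlongs...
                  | [\xE1-\xEC][\x80-\xBF]{2}
                  | \xED[\x80-\x9F][\x80-\xBF]      # ...and no surrogates
                  | [\xEE-\xEF][\x80-\xBF]{2}
                  | \xF0[\x90-\xBF][\x80-\xBF]{2}   # 4 bytes, no overlongs...
                  | [\xF1-\xF3][\x80-\xBF]{3}
                  | \xF4[\x80-\x8F][\x80-\xBF]{2}   # ...capped at U+10FFFF
                  )*\z/nx

def valid_utf8?(string)
  VALID_UTF8.match?(string.b)  # String#b: a copy with raw byte encoding
end

valid_utf8?("Z\xC3\xA9")       # => true  ('Zé')
valid_utf8?("\xED\xA0\x80")    # => false (encoded surrogate U+D800)
valid_utf8?("\xC1\xBF")        # => false (overlong encoding of U+007F)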

So, now that we understand UTF-8, we can understand what has changed in Node:

// Prior to these releases:
new Buffer('ab\ud800cd', 'utf8');
// <Buffer 61 62 ed a0 80 63 64>

// After this release:
new Buffer('ab\ud800cd', 'utf8');
// <Buffer 61 62 ef bf bd 63 64>

The character \ud800 is a surrogate, and is therefore not a valid character: no encoding should exist for it. However, JavaScript allows this string to exist without raising an error, so it was decided that converting such a string to a buffer should not raise an error either. Instead, the offending character is replaced with '\ufffd', the replacement character. Rather than letting your program send a superficially valid JS string to a peer that would reject it as invalid UTF-8, Node substitutes a character that is not a surrogate, preventing downstream errors. In general I’d advise against trying to guess what the programmer meant when they provide malformed input, but since Unicode defines this code point precisely to be “used to replace an incoming character whose value is unknown or unrepresentable in Unicode”, this seems like a reasonable choice.

Graphical thinking: optimisation through functional programming

A little over a year ago, I wrote an essay called Callbacks are Imperative, Promises are Functional, in which I argued that promises are not a control-flow device but rather a tool to describe relationships between dependent values, such that the promise library can figure out the control flow — the correct and optimal order of operations — for you. I gave the example of a module loader, where by modelling a set of modules as a graph of lazy promises, we can write code that describes the dependencies between modules and automatically have them downloaded in the correct order and with the maximum possible parallelisation. This echoes a quote from the book Learn You a Haskell for Great Good:

In purely functional programming you don’t tell the computer what to do as such but rather you tell it what stuff is.

Learn You a Haskell for Great Good

Writing the control flow for optimally downloading modules by hand is rather hard; better to invent a system where you can describe the shape of your domain and have the system solve the problem for you.

A few months ago, I wrote a tutorial on Make, having spent a year with it while writing JavaScript Testing Recipes. Though I didn’t realise it when writing my promises essay, there is a deep similarity between Make and my promise-based module loader: in both systems we work by defining dependencies among a set of artefacts, and then the platform figures out which things must be run, and in what order, when you ask for one of those artefacts. Make can even parallelise the tasks it runs when it spots that two tasks are independent and can therefore be run in any order — if it doesn’t matter which order you perform two tasks in, you can probably run them in parallel.
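
The same idea is easy to see in Rake, Ruby’s cousin of Make: you declare what each artefact depends on and Rake derives the order (rake -m will even run independent tasks in parallel). A sketch, with made-up file names:

# In a Rakefile: declare the dependency graph; Rake works out the order.
file 'build/app.js' => ['lib/core.js', 'lib/ui.js'] do |t|
  sh "cat #{t.prerequisites.join(' ')} > #{t.name}"
end

task :default => 'build/app.js'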

This week I was working on some Redis code, and I realised that this concept of solving problems by relating values and deriving the control flow from those relationships had given me an insight into solving a problem I’d been having with performing atomic changes to a Redis database. I don’t use any of the above concepts directly in the code, but I found that they gave me a new way to think about the problem and suddenly realise a solution. Let me try to explain.

Shortly before writing that promises article, I released reStore, a remoteStorage server written in Node with both filesystem- and Redis-based backends. remoteStorage is a protocol for storing arbitrary documents via HTTP; each user on a server has a directory tree and they can grant applications access to parts of that tree for the application to store user data. The application uses GET to retrieve documents by pathname, and receives the document’s content, plus its media type in a Content-Type header and its version in an ETag. It can also request folder listings using GET to a pathname ending in a trailing slash, which returns a list of the folder’s children (documents and other folders) with their current versions.

The app can save new documents using PUT, setting their type using Content-Type, and remove them using DELETE. Both these methods allow use of the If-Match header; the application can pass in the current version it has for a document, and the server will process the request only if that version is indeed current. Otherwise, the app makes a GET to retrieve the latest version, reapplies its changes and tries to save again; this prevents applications from clobbering one another’s changes.

Now, these semantics lend themselves naturally to using the filesystem as a backend, and indeed reStore has such a backend. But it also has a Redis backend, and I’ve recently been doing some work to make sure that when it saves and deletes documents, all changes to the Redis database are done atomically.

Most of the complexity in doing this comes from the fact that remoteStorage is not quite as low-level as the Unix filesystem. When you save a document, all its parent folders are created implicitly, and if you delete the only remaining item from a folder then that folder is deleted. So folders exist only if they contain documents, or other folders. Clients never manipulate folders directly, they only deal with saving documents.

Furthermore, all folders have an ETag version just like documents do. A folder’s version must change if any document within its tree is updated or removed. This is to facilitate syncing: a client can easily tell if its copy of a tree is up to date by comparing the version tag on the root folder, rather than checking every document within the tree.

So when a document is saved, we need to make sure that all its containing folders exist and update all of their versions. (We could store only the documents and figure out the folders when a GET is made, but it’s cheaper to generate the tree on write than on read.) For ease of implementation, the version of a document in reStore is simply the Unix timestamp of the last time it was changed (the remoteStorage spec previously required versions to be timestamps and it used Last-Modified rather than ETag). The version of a folder is the latest timestamp that exists on any of its children.

Here’s a naive version of a backend store for the requirements we’ve laid out. Although reStore is written in Node, I’ve written these examples in Ruby since it’s far easier to see what’s actually happening without all those callbacks getting in the way. It has one public method, put(), which handles requests to save documents. It takes a username, document path, content, media type, and an optional version string. It begins by splitting the pathname into its segments; for example, parse_path('/path/to/file.txt') returns ['/', 'path/', 'to/', 'file.txt']. It checks whether the given version is current — if no version is given, the request is always allowed, but if a version is given for a document that doesn’t exist, the request is forbidden since the client has stale data — then proceeds to update the tree. It uses SADD to make sure each item is listed as a child of its parent folder, and HSET to set the modified time on each parent folder to the current time. Finally, it checks whether the target file exists (we need to tell the client if the document was created or updated), then saves its content, media type, size and modified time using HMSET.

class Store
  VersionConflict = Class.new(StandardError)

  def initialize(redis_client)
    @redis = redis_client
  end

  def put(username, pathname, content, type, version = nil)
    query    = parse_path(pathname)
    filename = query.pop
    prefix   = "users:#{username}:data:"
    modified = (Time.now.utc.to_f * 1000).floor

    is_current = get_current_state(prefix + pathname, version)
    raise VersionConflict unless is_current

    query.each_with_index do |dir, i|
      key = prefix + query.slice(0, i + 1).join('')
      @redis.hset(key, 'modified', modified.to_s)
      @redis.sadd(key + ':children', query[i + 1] || filename)
    end

    exists = @redis.exists(prefix + pathname)

    @redis.hmset(prefix + pathname, :length,   content.size,
                                    :type,     type,
                                    :modified, modified,
                                    :content,  content)
    [!exists, modified]
  end

private

  def parse_path(pathname)
    pathname.scan(/[^\/]*(?:\/|$)/)[0..-2]
  end

  # Is the client's version current? Requests without a version are always
  # allowed; requests with a version only if it matches the stored mtime.
  def get_current_state(key, version)
    modified = @redis.hget(key, 'modified')
    return version.nil? if modified.nil?
    return true if version.nil?
    version.to_i == modified.to_i
  end
end

Here’s what this code does when we run the call:

store.put('jcoglan', '/books/jstr/preface.txt', 'Preface to JSTR', 'text/plain')

The Redis MONITOR command prints the following:

"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"hset" "users:jcoglan:data:/" "modified" "1402430750408"
"sadd" "users:jcoglan:data:/:children" "books/"
"hset" "users:jcoglan:data:/books/" "modified" "1402430750408"
"sadd" "users:jcoglan:data:/books/:children" "jstr/"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402430750408"
"sadd" "users:jcoglan:data:/books/jstr/:children" "preface.txt"
"exists" "users:jcoglan:data:/books/jstr/preface.txt"
"hmset" "users:jcoglan:data:/books/jstr/preface.txt" "length" "15" \
        "type" "text/plain" "modified" "1402430750408" "content" "Preface to JSTR"

So we can see it checking the version of the target document, updating all the parent folders and finally the document itself. However, there are two glaring holes in this implementation:

  • While we’re running these commands, another client might be doing work that becomes interleaved with ours. We might both check the current version, find that we’re up to date, and then both go ahead and update the document, clobbering one another’s changes. This is precisely the ‘interceding update’ problem that the version check is supposed to fix.
  • If our server process crashes part-way through this set of updates, the tree is left in an invalid state. Folders are said to have children that do not actually exist, or have modification times that don’t reflect the true state of things.

To make the system robust, we need to solve both of these problems, and this will partly involve making better use of the tools Redis gives us, and also re-framing the problem slightly to make it fit better into Redis’s constraints. The former technique is certainly useful but it’s the latter that I am particularly interested in here.

The first problem is actually quite easy to solve: since we’re making changes to the user’s entire directory tree, conditional on the state of one document, we will lock the user’s data while this is happening. For each update to have a consistent view of the world, updates must be applied in sequence, and the lock will make sure only one update executes at a time.

The documentation for SETNX provides a locking algorithm for us. You should read those docs to understand the algorithm; here’s an implementation in Ruby:

  LOCK_TIMEOUT = 10_000

  def lock(name)
    result = nil

    while result.nil?
      lock_key     = "locks:#{name}"
      current_time = (Time.now.utc.to_f * 1000).floor
      expiry       = current_time + LOCK_TIMEOUT

      if @redis.setnx(lock_key, expiry)
        next result = yield.tap { @redis.del(lock_key) }
      end

      timeout = @redis.get(lock_key)

      if timeout.nil? or current_time < timeout.to_i
        next sleep 0.1
      end

      old_value = @redis.getset(lock_key, expiry)

      if old_value == timeout.to_s
        result = yield.tap { @redis.del(lock_key) }
      else
        sleep 0.1
      end
    end

    result
  end

Wrapping this mechanism around our PUT logic lets us make sure that nothing about the user’s tree will change between us checking the version and making our changes.

  def put(username, pathname, content, type, version = nil)
    query    = parse_path(pathname)
    filename = query.pop
    prefix   = "users:#{username}:data:"

    lock username do
      modified = (Time.now.utc.to_f * 1000).floor

      is_current = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      query.each_with_index do |dir, i|
        key = prefix + query.slice(0, i + 1).join('')
        @redis.hset(key, 'modified', modified.to_s)
        @redis.sadd(key + ':children', query[i + 1] || filename)
      end

      exists = @redis.exists(prefix + pathname)

      @redis.hmset(prefix + pathname, :length,   content.size,
                                      :type,     type,
                                      :modified, modified,
                                      :content,  content)
      [!exists, modified]
    end
  end

The second problem is more subtle. We want to make all these changes to the database — a combination of HSET, SADD and HMSET commands — happen as one, so Redis will execute them as a unit. Fortunately, Redis has transactions that let us do exactly this. But there’s one problem: a Redis transaction works by sending the MULTI command, then all the commands you want to run together, then sending EXEC. Only then will all the commands be executed; you cannot have a command that depends on the result of another command, because you don’t get any responses until after all the commands have run.
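
To make that constraint concrete, here’s a sketch assuming the redis-rb client: commands issued inside a multi block are queued and return placeholder Future objects, whose values only become available once EXEC has run.

queued = nil

@redis.multi do
  @redis.set('example:key', '1')
  queued = @redis.get('example:key')  # a Redis::Future, not the string "1"
end

queued.value  # => "1", but only now that the block has sent EXEC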

In our code, there’s a pesky little EXISTS sitting between the writes to the parent folders and the document itself. This must happen before the final HMSET since we need to know if the document existed before that write, and there’s a temptation to put it as close to the HMSET as possible to make sure no other changes happen between the two.

However: notice that none of the writes depend on the result of the EXISTS call. They do depend on the result of the HGET that checks the version, since they are conditional on that, but everything after that depends only on information from the original method inputs. Because of this lack of dependencies, the writes can actually be done in any order and we will get the same result. As long as it happens at some point before the final HMSET, and within the lock so we know no other changes have happened, the EXISTS call can happen as early in the process as we like. We can push it before the parent folder updates:

  def put(username, pathname, content, type, version = nil)
    query    = parse_path(pathname)
    filename = query.pop
    prefix   = "users:#{username}:data:"

    lock username do
      modified = (Time.now.utc.to_f * 1000).floor

      is_current = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      exists = @redis.exists(prefix + pathname)

      query.each_with_index do |dir, i|
        key = prefix + query.slice(0, i + 1).join('')
        @redis.hset(key, 'modified', modified.to_s)
        @redis.sadd(key + ':children', query[i + 1] || filename)
      end

      @redis.hmset(prefix + pathname, :length,   content.size,
                                      :type,     type,
                                      :modified, modified,
                                      :content,  content)
      [!exists, modified]
    end
  end

Or, we can get rid of it entirely: the initial version check can tell us whether the document exists, since it will have no modified key if it’s absent. Let’s modify get_current_state() to return two values: a boolean for whether the version is current, and one for whether the document exists:

  def get_current_state(key, version)
    modified = @redis.hget(key, 'modified')
    return [version.nil?, false] if modified.nil?
    return [true, true] if version.nil?
    return [version.to_i == modified.to_i, true]
  end

and use this new value in our put() method:

  def put(username, pathname, content, type, version = nil)
    query    = parse_path(pathname)
    filename = query.pop
    prefix   = "users:#{username}:data:"

    lock username do
      modified = (Time.now.utc.to_f * 1000).floor

      is_current, exists = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      query.each_with_index do |dir, i|
        key = prefix + query.slice(0, i + 1).join('')
        @redis.hset(key, 'modified', modified.to_s)
        @redis.sadd(key + ':children', query[i + 1] || filename)
      end

      @redis.hmset(prefix + pathname, :length,   content.bytes.size,
                                      :type,     type,
                                      :modified, modified,
                                      :content,  content)
      [!exists, modified]
    end
  end

Now if we run our test again, we see this pattern of commands:

"setnx" "locks:jcoglan" "1402432911685"
"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"hset" "users:jcoglan:data:/" "modified" "1402432901692"
"sadd" "users:jcoglan:data:/:children" "books/"
"hset" "users:jcoglan:data:/books/" "modified" "1402432901692"
"sadd" "users:jcoglan:data:/books/:children" "jstr/"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402432901692"
"sadd" "users:jcoglan:data:/books/jstr/:children" "preface.txt"
"hmset" "users:jcoglan:data:/books/jstr/preface.txt" "length" "15" \
        "type" "text/plain" "modified" "1402432901692" "content" "Preface to JSTR"
"del" "locks:jcoglan"

Two important things have happened here. First, all the commands now run inside a locking block bounded by the SETNX and DEL commands. Second, and more importantly for making the writes atomic, this operation is now a read (HGET) followed by a sequence of writes (HSET, SADD, HMSET). The fact that all the writes are uninterrupted by any reads, and do not depend on one another, means we can wrap them in a MULTI transaction:

  def put(username, pathname, content, type, version = nil)
    query    = parse_path(pathname)
    filename = query.pop
    prefix   = "users:#{username}:data:"

    lock username do
      modified = (Time.now.utc.to_f * 1000).floor

      is_current, exists = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      @redis.multi do
        query.each_with_index do |dir, i|
          key = prefix + query.slice(0, i + 1).join('')
          @redis.hset(key, 'modified', modified.to_s)
          @redis.sadd(key + ':children', query[i + 1] || filename)
        end

        @redis.hmset(prefix + pathname, :length,   content.bytes.size,
                                        :type,     type,
                                        :modified, modified,
                                        :content,  content)
      end

      [!exists, modified]
    end
  end

This leaves us with our final command sequence:

"setnx" "locks:jcoglan" "1402433277043"
"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"multi"
"hset" "users:jcoglan:data:/" "modified" "1402433267046"
"sadd" "users:jcoglan:data:/:children" "books/"
"hset" "users:jcoglan:data:/books/" "modified" "1402433267046"
"sadd" "users:jcoglan:data:/books/:children" "jstr/"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402433267046"
"sadd" "users:jcoglan:data:/books/jstr/:children" "preface.txt"
"hmset" "users:jcoglan:data:/books/jstr/preface.txt" "length" "15" \
        "type" "text/plain" "modified" "1402433267046" "content" "Preface to JSTR"
"exec"
"del" "locks:jcoglan"

All the changes made by the put() method now happen in a single transaction. Redis will not run any of them until it receives EXEC, so if the process crashes while sending them, none of them runs. Either they all run, or none does.

This was kind of a silly example though: technically, we could have left that EXISTS call in the middle of the transaction. None of the writes depended on its result, so it could have stayed just before the final HMSET, with its result delivered at the end of the process. The EXISTS call turned out to be redundant anyway, but this example demonstrates a key idea: analysing the dependencies between operations, flattening that graph so that things happen as early as possible, disinterleaving reads and writes, and noticing when the order of commands does and does not matter, is a key strategy for atomising Redis operations. We got lucky with PUT, but DELETE is far more tricky.

DELETE looks a little like PUT run in reverse. We delete the target document, and remove it as a child from its parent folder. Then we go back up the tree, removing and updating things. If a folder has no children, we delete it. If it does have children, we set its version to the latest version of any of its remaining children, then move up to the next level. Here’s a first stab at an implementation, where we’ve added another return value to get_current_state(): we need to tell the client whether the deleted document existed and what its version was:

  def delete(username, pathname, version = nil)
    prefix   = "users:#{username}:data:"
    query    = parse_path(pathname)
    filename = query.pop
    parents  = (0...query.size).map { |i| query[0..i].join('') }.reverse

    lock username do
      is_current, exists, modified = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      @redis.del(prefix + pathname)
      @redis.srem(prefix + parents.first + ':children', filename)

      parents.each_with_index do |parent, i|
        children = @redis.smembers(prefix + parent + ':children')
        if children.empty?
          @redis.del(prefix + parent)
          if i + 1 < parents.size
            item = query[query.size - i - 1]
            @redis.srem(prefix + parents[i + 1] + ':children', item)
          end
        else
          mtimes = children.map do |child|
            @redis.hget(prefix + parent + child, 'modified')
          end
          @redis.hset(prefix + parent, 'modified', mtimes.map(&:to_i).max)
        end
      end

      [exists, modified]
    end
  end

private

  def get_current_state(key, version)
    modified = @redis.hget(key, 'modified')
    return [version.nil?, false, nil] if modified.nil?
    return [true, true, modified.to_i] if version.nil?
    return [version.to_i == modified.to_i, true, modified.to_i]
  end

As an example, say we run the following commands to add three documents and then delete them in reverse order:

store.put('jcoglan', '/books/jstr/preface.txt',          'Preface to JSTR',         'text/plain')
store.put('jcoglan', '/books/jstr/chapters/browser.txt', 'Browser Applications',    'text/plain')
store.put('jcoglan', '/books/jstr/chapters/cli.txt',     'Command-line Interfaces', 'text/plain')

store.delete('jcoglan', '/books/jstr/chapters/cli.txt')
store.delete('jcoglan', '/books/jstr/chapters/browser.txt')
store.delete('jcoglan', '/books/jstr/preface.txt')

Here’s the command sequence for the second delete(), which will implicitly delete its parent folder but leave the others intact. We see it deleting the document, then its parent, then updating the versions of the other folders above it, by checking the children of each folder in turn.

"setnx" "locks:jcoglan" "1402435177397"
"hget" "users:jcoglan:data:/books/jstr/chapters/browser.txt" "modified"
"del" "users:jcoglan:data:/books/jstr/chapters/browser.txt"
"srem" "users:jcoglan:data:/books/jstr/chapters/:children" "browser.txt"
"smembers" "users:jcoglan:data:/books/jstr/chapters/:children"
"del" "users:jcoglan:data:/books/jstr/chapters/"
"srem" "users:jcoglan:data:/books/jstr/:children" "chapters/"
"smembers" "users:jcoglan:data:/books/jstr/:children"
"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402435167365"
"smembers" "users:jcoglan:data:/books/:children"
"hget" "users:jcoglan:data:/books/jstr/" "modified"
"hset" "users:jcoglan:data:/books/" "modified" "1402435167365"
"smembers" "users:jcoglan:data:/:children"
"hget" "users:jcoglan:data:/books/" "modified"
"hset" "users:jcoglan:data:/" "modified" "1402435167365"
"del" "locks:jcoglan"

This problem seems far harder to fix: in order to delete each folder or update its version, we need to check its children, and in order to make sure the children are up to date, we need to update the next folder down the chain, all the way back to the target document. This is, of course, how we would implement it on a real filesystem, where one cannot delete non-empty directories.

But there is a way out. Think back to the PUT example: we made the writes atomic by figuring out as many of their parameters as possible in advance, allowing us to disinterleave the writes from anything they depended on, making it possible to run them in a single transaction. If you think about it for a minute, you’ll realise you can predict which folders will need to be deleted and updated before you start deleting anything at all, and we’ll use that trick to tease apart the reads and writes.

We can say, in advance, that if we delete a document then any folders that form a chain from its immediate parent that have only one child will need to be deleted: those folders exist entirely to contain the document we’re going to delete. So we can figure out the deletable folders in advance by walking up from the document until we find a folder with more than one child.

Similarly, we can figure out the versions of all the remaining folders after the deletion by reading all the relevant item versions in advance, that is, getting the versions of all the children of all the parents of the deleted document. Note this doesn’t involve reading the whole tree; only the immediate children of each folder need to be considered. From these lists, we can remove the version for the topmost deletable folder, then back-propagate to calculate the resulting versions of the other folders.

The code for this is significantly more verbose but I’ve tried to comment what’s going on at each point. You should be able to see how the previous approach of deleting an item, then checking how its parents need to be updated, then recursing up the tree, has been replaced with an algorithm that plans all the changes we need to make up front by doing that same process on data that we’ve already gathered into memory ahead of time.

  def delete(username, pathname, version = nil)
    prefix  = "users:#{username}:data:"
    query   = parse_path(pathname)
    parents = (0...(query.size - 1)).map { |i| query[0..i].join('') }.reverse

    # e.g. if query = ['/', 'books/', 'jstr/', 'preface.txt']
    #  then parents = ['/books/jstr/', '/books/', '/']

    lock username do
      is_current, exists, modified = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      # Find all the immediate child items of the list of parent folders; this
      # returns an Array<Array<String>>.

      children = parents.map do |parent|
        @redis.smembers(prefix + parent + ':children')
      end

      # Determine which folders will be empty by starting with the immediate
      # parent and finding folders with only one child.

      empty, index = [], 0

      while index < parents.size and children[index].size == 1
        empty << parents[index]
        index += 1
      end

      # The remaining folders are the non-empty ones, i.e. those after the index
      # we just counted up to. Their children are the corresponding lists in the
      # children array.

      remaining = parents[index..-1]
      children  = children[index..-1]

      # Construct an Array<Array<Fixnum>> of modified times corresponding to the
      # children array.

      mtimes = children.map.with_index do |child_list, i|
        child_list.map do |child|
          @redis.hget(prefix + remaining[i] + child, 'modified').to_i
        end
      end

      # An index where we'll store the computed versions for remaining folders.

      ancestors = {}

      if remaining.size > 0
        # Remove the uppermost empty folder as a child from its parent, and
        # delete its corresponding version from the mtimes list.

        item = query[query.size - empty.size - 1]
        @redis.srem(prefix + remaining.first + ':children', item)
        mtimes.first.delete_at(children.first.index(item))

        # Back-propagate the version updates to the other remaining parent
        # folders. We calculate the new version from each folder's remaining
        # children, update this version in the mtimes list then repeat at the
        # next level up.

        remaining.each_with_index do |dir, i|
          ancestors[dir] = mtimes[i].max

          if i + 1 < remaining.size
            item = query[query.size - empty.size - 2 - i]
            mtimes[i + 1][children[i + 1].index(item)] = ancestors[dir]
          end
        end

        # Save the updated versions to the database.

        ancestors.each do |dir, mtime|
          @redis.hset(prefix + dir, 'modified', mtime)
        end
      end

      # Delete all the folders we determined were empty, and their sets of
      # children.

      empty.each do |dir|
        @redis.del(prefix + dir)
        @redis.del(prefix + dir + ':children')
      end

      # And finally, delete the document itself.

      @redis.del(prefix + pathname)

      [exists, modified]
    end
  end

This code produces the following command sequence for deleting the second item in our test:

"setnx" "locks:jcoglan" "1402437739275"
"hget" "users:jcoglan:data:/books/jstr/chapters/browser.txt" "modified"
"smembers" "users:jcoglan:data:/books/jstr/chapters/:children"
"smembers" "users:jcoglan:data:/books/jstr/:children"
"smembers" "users:jcoglan:data:/books/:children"
"smembers" "users:jcoglan:data:/:children"
"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"hget" "users:jcoglan:data:/books/jstr/chapters/" "modified"
"hget" "users:jcoglan:data:/books/jstr/" "modified"
"hget" "users:jcoglan:data:/books/" "modified"
"srem" "users:jcoglan:data:/books/jstr/:children" "chapters/"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402437729226"
"hset" "users:jcoglan:data:/books/" "modified" "1402437729226"
"hset" "users:jcoglan:data:/" "modified" "1402437729226"
"del" "users:jcoglan:data:/books/jstr/chapters/"
"del" "users:jcoglan:data:/books/jstr/chapters/:children"
"del" "users:jcoglan:data:/books/jstr/chapters/browser.txt"
"del" "locks:jcoglan"

Before, we had calls to SMEMBERS and HGET interleaved with SREM, HSET and DEL. Now, however, we have completely disinterleaved the reads and writes; all the writes have been planned ahead of time and aren’t interleaved with the reads that determine what needs to be changed. This means we can wrap all the writes in a MULTI; I won’t reproduce the Ruby code for this (we wrap part of the delete() method in @redis.multi do ... end) but the commands look like this:

"setnx" "locks:jcoglan" "1402437849398"
"hget" "users:jcoglan:data:/books/jstr/chapters/browser.txt" "modified"
"smembers" "users:jcoglan:data:/books/jstr/chapters/:children"
"smembers" "users:jcoglan:data:/books/jstr/:children"
"smembers" "users:jcoglan:data:/books/:children"
"smembers" "users:jcoglan:data:/:children"
"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"hget" "users:jcoglan:data:/books/jstr/chapters/" "modified"
"hget" "users:jcoglan:data:/books/jstr/" "modified"
"hget" "users:jcoglan:data:/books/" "modified"
"multi"
"srem" "users:jcoglan:data:/books/jstr/:children" "chapters/"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402437839350"
"hset" "users:jcoglan:data:/books/" "modified" "1402437839350"
"hset" "users:jcoglan:data:/" "modified" "1402437839350"
"del" "users:jcoglan:data:/books/jstr/chapters/"
"del" "users:jcoglan:data:/books/jstr/chapters/:children"
"del" "users:jcoglan:data:/books/jstr/chapters/browser.txt"
"exec"
"del" "locks:jcoglan"

This is now a robust deletion procedure. The SETNX lock around the whole process means that while we’re making our reads to figure out what to do, we know the structure won’t change, so we get a consistent view of the data. And putting all the writes together in one transaction means that we cannot leave the database in an invalid state by accident. After every atomic change Redis makes, the data is left in a state that is meaningful to the application, never an intermediate one.

What we’ve done here has a deep similarity with optimising module loading or Make task execution: they all involve flattening a dependency graph. If you look at the first command trace for the delete() method, where reads and writes are interleaved, almost every command depends on the one before it. The HSET that updates a folder’s modified time depends on the HGETs that read its children’s modified times, which in turn depend on the SMEMBERS to find the children’s names, all of which depends on the list of children and their times being left up-to-date by the previous iteration. The only parallelisable commands in the whole list are the DEL/SREM pairs that delete items; the dependency graph of this trace is thirteen steps deep, including the initial version check.

          HGET current document version
                       |
     DEL document / SREM from parent's children
                       |
            SMEMBERS parent's children
                       |
      DEL folder / SREM from parent's children
                       |
            SMEMBERS parent's children
                       |
             HGET all child versions
                       |
               HSET parent version
                       |
            SMEMBERS parent's children
                       |
             HGET all child versions
                       |
               HSET parent version
                       |
            SMEMBERS parent's children
                       |
             HGET all child versions
                       |
               HSET parent version

By contrast, the final trace we ended up with is very shallow: all the writes are mutually independent, which is why we can put them in a transaction. All the HGETs are mutually independent, but they each depend on one of the SMEMBERS calls, though those calls are also mutually independent. Including the initial version check, this dependency graph is only four steps deep, compared to the thirteen of the previous trace.

          HGET current document version
                       |
   SMEMBERS the children of all parent folders
                       |
        HGET remaining children's versions
                       |
       DEL / SREM / HSET items as necessary

We’ve reshaped the graph so that more of its items are order-independent. For some problems, this lets us run tasks in parallel. For this task, it means operations can be grouped into transactions and made atomic.

Now you’re probably wondering what all this has to do with promises, or Make, or functional programming. Well, the interesting thing for me is what triggered me to do this work in the first place. I was actually working on the Redis backend for Faye, where each client has a set of channels it’s subscribed to (and each channel has a reverse index of clients subscribed to it, for routing messages), and a list of messages that are queued for delivery to that client. The engine periodically sweeps the database for expired sessions and removes all their subscriptions and message queues.

For a long time after I launched the Redis engine, there were problems with race conditions that would let expired sessions leak memory over time. For example, if we delete a client’s message queue before we delete its subscriptions, then there’s a window of time where a published message can still be routed to that client and added to its queue, after that queue has been deleted.

I found this problem really hard to solve for a long time — the Redis backend launched over 3 years ago — and it surprised me that all of a sudden the solution became obvious to me. Just like our PUT example above, the session expiry mechanism had a read right in the middle of it that looked up the client’s subscriptions so it could delete them all, by delegating to another function that handled unsubscribing a client from a single channel. Thinking about the problem as a dependency graph made me realise that the read could be pushed to the start of the process, leaving a bunch of mutually independent writes that could all be put in a transaction. Et voilà: atomic client expiry, one less race condition to worry about.
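
I won’t reproduce Faye’s actual code here, but the reshaped expiry routine has roughly this shape; the key names are illustrative, not Faye’s real schema:

# Hypothetical sketch: read the client's subscriptions up front, then
# issue all the mutually independent writes in one transaction.
def destroy_client(client_id)
  channels = @redis.smembers("clients:#{client_id}:channels")

  @redis.multi do
    channels.each do |channel|
      @redis.srem("channels:#{channel}", client_id)
    end
    @redis.del("clients:#{client_id}:channels")
    @redis.del("clients:#{client_id}:messages")
    @redis.srem('clients', client_id)
  end
end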

I had been stuck because I was thinking about the problem from an imperative programming mindset. In imperative programming, or more precisely, in systems with strict rather than lazy evaluation, we are forced to tell the computer to perform actions in a certain order. Sometimes, the order we decide upon is important — the program would break if its statements were reordered — and sometimes it is not — a set of statements can be evaluated in any order and the program would still work. That is, in an imperative program we often see code like this:

f()
g()

But we cannot tell whether it matters that f() is called first. This ambiguity arises because the program does not express the dependency relationships between the entities it manipulates. Those relationships are in the programmer’s head or on a whiteboard somewhere, and the programmer has done a bunch of computation outside of the program to determine what the program itself should look like. The program expresses the solution the programmer invented, rather than solving the problem itself, and crucial constraints on the design of the solution are omitted.

In a functional language like Haskell, where functions are lazy and have no side effects, the only way to force two expressions to run in a certain order is to make one depend on the other, either by taking its output as input:

g(f())

Or by making one expression conditional on the other:

if f() {
  g()
}

(These examples are generic pseudocode, not executable Haskell.)

In the above construction that simply calls f() and then g(), it’s not clear from the code itself whether f() can be called later or removed from the program entirely. The only reason it could be necessary to place it before g() is that it has some side effect that affects the result of g(). The only way to tell for sure is to run the program, possibly via a test harness, and check what its cumulative side effects are, for example checking the state it leaves a database in.

The latter two constructs express, via the syntax of the program, that g() depends on f() in ways that are far stronger than simply listing them one after the other. They make it clear that the result of the program will be very different if f() is removed; if its output is fed directly into g(), or used to decide whether to call g() at all, then we know it must affect g()’s output in some way. This dependency information is encoded in the program itself, and not enforced implicitly via a test suite or manual QA process.

Stuck in the imperative mindset, I was convinced the solution had to lie in the order I was running these Redis commands in, that if I could just find the right recipe and run commands at just the right times, the problem would be solved. But having spent the last year thinking about things like promises and Make, I suddenly saw the problem instead in terms of dependencies. Just as I have done in the examples above, I thought about the dependencies between values explicitly, then tried to see if I could rearrange that graph to bring all the writes together, and lo and behold the right answer just fell out.

The funny thing is, I’m not sure anyone would call my final code for DELETE a ‘functional’ program; it’s still very much stateful imperative code, full of side effects and mutable data structures. But I would argue that it’s more functional than the original DELETE code. In the original, the order of statements is very important, and there are few places where an explicit dependency is drawn between two operations by having values defined in terms of other values. Frequently, getting a correct value depends on some SREM or HSET side effect having been executed before its calculation, and that dependency is entirely encoded by the statements executing in a certain order rather than by relationships between values.

But in the second example, far more of the statements that are order-dependent are forced to be so by having values defined in terms of other values. The calculations for empty, remaining and mtimes depend explicitly on the value of children, thus forcing children to be calculated before them. Once we get to the write operations, those can be executed in any order and we can tell this because there is no write operation that uses the value of another write operation as its input. We can reorder the statements for the writes and the program will still work. Whereas, if we reorder the statements for the reads, we will get an ‘undefined variable’ error because a value they depend on is not available yet. We get direct feedback from the programming language that an operation makes no sense, rather than being able to freely reorder statements and having to check the database to see what went wrong. If we were writing this in a compiled language, this error could even be flagged by static analysis without running the program at all.

In other words, the program uses functional techniques to indicate and force the correct order of operations, rather than imperative ones, and makes it clear where dependencies exist and where they do not. Instead of using the database as the working space to incrementally figure out which changes to make, we’ve moved all that work into the program’s memory where we can derive values from one another and compute a single set of changes to make at the end.

Once you start thinking in terms of dependency graphs, you start seeing them everywhere. They’re a great tool for figuring out how you can optimise programs, and make them more explicit and resistant to people accidentally introducing sequencing bugs later. They encode the problem constraints directly in the program, and get you a step closer to letting the computer solve your problems for you.

Do I need DI?

This article is based on a talk of the same name, on the topic of dependency injection, which I gave at Eurucamp 2012. I thought I would write it up since it’s of some relevance to the current discussion about TDD, and I frequently find myself wanting to reiterate that talk’s argument. It is only related to current events insofar as I find them emblematic of most conversations about TDD; since the talk is nearly two years old it does not directly address the current debate but tries to illustrate a style of argument I wish were more common.

DHH’s keynote at RailsConf 2014 and his ensuing declaration that TDD is dead have got the Ruby community talking about testing again. My frustration with the majority of talks and writing on testing, TDD, agile development, software architecture and object-oriented design is that a great deal of it consists of generalised arguments, with very little in the way of concrete case studies. Despite the fact that I’m a perfectly capable programmer who has written automated tests for all sorts of different kinds of software, I’ve sat through any number of talks on these topics that went right over my head. The impression I’m left with is that talks on TDD often only make sense if the audience already understands and agrees with the speaker’s argument. They are often littered with appeals to authority figures and books that only land if the listener knows of the cited authority or has read the particular book. At worst, they are presented as ‘movements’, wherein all the old ways of thinking must be discarded in favour of the new one true way.

Now, grand general assertions make for entertaining and engaging talks, but they are unhelpful. Real-world software development is an exercise in design, and to design things based on general rules without reference to the context of the problem at hand is to not practice design properly. We need to learn to make decisions based on the properties of the problem we’re dealing with, not by appealing to advice handed down without any context.

This is why JavaScript Testing Recipes almost entirely concerns itself with the application of a few simple techniques to a wide array of real programs, including discussion of cases where those techniques are not applicable or are too costly due to the requirements of a particular problem. I wish more talks were like this, and it’s what I was hoping to do when I wrote this talk originally.

So, this article is not really about dependency injection per se, but about how you might decide when to use it. DI, much like TDD or any other programming support tool, is not de facto good or bad; it has pros and cons that determine when you might decide to use it. It is not even good or bad in the context of any defined problem: it is a design tool, and its usefulness varies according to the mental model of the person using it.

When we ask whether an object-oriented program is well-designed, we often use proxies for this question, like asking whether the code obeys the ‘SOLID principles’:

  • Single responsibility principle
  • Open/closed principle
  • Liskov substitution principle
  • Interface segregation principle
  • Dependency inversion principle

We might look at each class in the system, decide whether or not it obeys each of these principles, and if most of the classes in the system are SOLID then we declare the system to be well-designed. However, this approach encourages us to factor our code in a certain way without regard for what that code means, what it models, how that model is realised in code, and who it is intended to be used by. We should not treat these principles as normative rules, but as observations: programs that are easy to maintain tend to exhibit these properties more often than not, but they are not a barometer against which to accept or reject a design.

Across the fence from the people advocating for SOLID, and TDD, and various prescriptive architectural styles, are the people who want to tell you off for following these rules, assessing your designs by whether you’re practicing ‘true’ OO, or becoming an architecture astronaut, reinventing Java’s morass of AbstractBeanListenerAdapterFactory classes.

I don’t find the rhetoric of either of these camps helpful: both offer decontextualised normative advice based on statistical observations of systems they happen to have worked on or heard of. They don’t know the system you’re working on.

Your job is not to keep Alan Kay happy. Your job is to keep shipping useful product at a sustainable pace, and it’s on you to choose practices that help you do that.

So let’s talk about DI, and when to use it. To begin with, what is dependency injection? Take a look at the following code for a putative GitHub API client:

class GitHub::Client
  def get_user(name)
    u = URI.parse("https://api.github.com/users/#{name}")
    http = Net::HTTP.new(u.host, u.port)
    http.use_ssl = true
    response = http.request_get(u.path)
    if response.code == '200'
      data = JSON.parse(response.body)
      GitHub::User.new(data)
    else
      raise GitHub::NotFound
    end
  end
end

In order to get the account data for a GitHub user, this class needs to make an HTTP request to api.github.com. It does this by creating an instance of Net::HTTP, using that to make a request, checking the request is successful, parsing the JSON data that comes back, and constructing a user object with that data.

An advocate for TDD and DI might quibble with the design of this class: it’s too hard to test, because there’s no way, via the class’s interface, to intercept its HTTP interactions with an external server – something we don’t want to depend on during our tests. A common response to this in Ruby is that you can stub out Net::HTTP.new to return a mock object, but this is often problematic: stubbing out singleton methods does not just affect the component you’re testing, it affects any other component of the system or the test framework that relies on this functionality. In fact, DHH’s pet example of bad DI uses the Time class, which is particularly problematic to stub out since testing frameworks often need to do time computations to enforce timeouts and record how long your tests took to run. Globally stubbing out Date is so problematic in JavaScript that JSTR dedicates a six-page section to dealing with dates and times.
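
To illustrate the hazard, here’s a sketch using RSpec doubles:

# Replacing Net::HTTP.new swaps the constructor for every caller in the
# process, not just for the object under test.
fake_http = double('http', request_get: double(code: '200', body: '{}'))
allow(Net::HTTP).to receive(:new).and_return(fake_http)

# From here on, anything that calls Net::HTTP.new during this test
# (another library, or the test framework itself) receives fake_http too.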

Dependency injection is one technique for solving this problem: instead of the class reaching out into the global namespace and calling a singleton method to fabricate an HTTP client itself, the class is given an HTTP client as an argument to its constructor. That client offers a high-level API that makes the contract between the class and its dependency clearer. Say we want the GitHub::Client class to only deal with knowing which paths to find data at, and how to map the response data to domain objects. We could push all the concerns of parsing URIs, making requests, checking response codes and parsing the body – that is, all the concerns not specifically related to GitHub – down into a high-level HTTP client, and pass that into the GitHub::Client constructor.

class GitHub::Client
  def initialize(http_client)
    @http = http_client
  end

  def get_user(name)
    data = @http.get("/users/#{name}").parsed_body
    GitHub::User.new(data)
  end
end

This makes the class easier to test: rather than globally stubbing out Net::HTTP.new and mocking its messy API, we can pass in a fake client during testing and make simple mock expectations about what should be called.

So dependency injection means passing in any collaborators an object needs as arguments to its constructor or methods. At its most basic, this means removing any calls to constructors and other singleton methods and passing in the required objects instead. But this is often accompanied by refactoring the objects involved to make their interactions simpler and clearer.

So that’s what DI is, but what is it for? Many advocates will tell you that it exists to make testing easier, and indeed it does often accomplish that. But testability is not usually an end-user requirement for a component; being testable is often a proxy for being usable. A class is easy to test if it’s easy to set up and use, without requiring a complex environment of dependencies in order for the class to work properly. Applying dependency injection solely to achieve better tests is a bad move: design is all about dealing with constraints, and we must take all the applicable constraints into consideration when designing a class’s API. The patterns we use must emerge in response to the program’s requirements, rather than being dogmatically imposed in the name of ‘best practices’.

This is best illustrated by a concrete example. I maintain a library called Faye, a pub/sub messaging system for the web. It uses a JSON-based protocol called Bayeux between the client and the server; the client begins by sending a handshake message to the server:

{
  "channel":  "/meta/handshake",
  "version":  "1.0",
  "supportedConnectionTypes": ["long-polling", "callback-polling", "websocket"]
}

and the server replies with a message containing an ID for the client, and a list of transport types the server supports:

{
  "channel":    "/meta/handshake",
  "successful": true,
  "clientId":   "cpym7ufcmkebx4nnki5loe36f",
  "version":    "1.0",
  "supportedConnectionTypes": ["long-polling", "callback-polling", "websocket"]
}

Once it has a client ID, the client can tell the server it wants to subscribe to a channel by sending this message:

{
  "channel":      "/meta/subscribe",
  "clientId":     "cpym7ufcmkebx4nnki5loe36f",
  "subscription": "/foo"
}

and the server acknowledges the subscription:

{
  "channel":    "/meta/subscribe",
  "clientId":   "cpym7ufcmkebx4nnki5loe36f",
  "successful": true
}

After the client has subscribed to any channels it is interested in, it sends a ‘connect’ message that tells the server it wants to poll for new messages:

{
  "channel":  "/meta/connect",
  "clientId": "cpym7ufcmkebx4nnki5loe36f"
}

When the server receives a message on any channel the client is subscribed to, it sends a response to the connect message with the new message attached. The client then sends another connect message to poll for more messages. (When using a socket-based connection, new messages can be pushed to the client immediately, without a pending connect request; the connect messages then act as a keep-alive heartbeat mechanism.)

So Faye is fundamentally a client-server system rather than a true peer-to-peer one. The architecture as we understand it so far is very simple:

                          +--------+
                          | Client |
                          +---+----+
                              |
                              V
                          +--------+
                          | Server |
                          +--------+

However there is more to it than that. Notice the supportedConnectionTypes field in the handshake messages: the client and server allow these messages to be sent and received via a number of different transport mechanisms, including XMLHttpRequest, JSON-P, EventSource and WebSocket. So we can add another layer of indirection to our architecture:

                          +--------+
                          | Client |
                          +---+----+
                              |
                              V
    +-----------+-------------+----------------+-------+
    | WebSocket | EventSource | XMLHttpRequest | JSONP |
    +-----------+-------------+----------------+-------+
                              |
                              V
                          +--------+
                          | Server |
                          +--------+

We now have a question: the Server class allows for multiple implementations of the same concept – sending messages over an HTTP connection – to co-exist, rather than having one mechanism hard-coded. Above, we allowed our GitHub::Client class to take an HTTP client as a parameter, letting us change which HTTP client we wanted to use. Surely we have another opportunity to do the same thing here, constructing servers with different kinds of connection handlers, including the possibility of a fake connection handler to make testing easier?

xhr_server         = Server.new(XMLHttpRequestHandler.new)
jsonp_server       = Server.new(JSONPHandler.new)
eventsource_server = Server.new(EventSourceHandler.new)
websocket_server   = Server.new(WebSocketHandler.new)

If we’re doing DI by the book, this seems like the right approach: to separate the concern of dealing with the abstract protocol embodied in the JSON messages from the transport and serialization mechanism that delivers those messages. But it doesn’t meet the constraints of the problem: the server has to allow for clients using any of these transport mechanisms. The whole point of having transport negotiation is that different clients on different networks will have different capabilities, and the central server needs to accommodate them all. So, although there is a possibility for building classes that all implement an abstract connection handling API in different ways (and this is indeed how the WebSocket portion of the codebase deals with all the different WebSocket protocol versions – see the websocket-driver gem), the server must use all of these connection handlers rather than accepting one of them as a parameter.

What about the client side? The Client class uses only one of the four transport classes at a time, to mediate its communication with the server. The client depends on having a transport mechanism available, but the choice of which transport to use can be deferred until runtime. This seems like a more promising avenue for applying DI. So, should we make its API look like this?

var url    = 'http://www.example.com/faye',
    client = new Client(new WebSocketTransport(url));

Again, the answer is no. Allowing Client to accept a transport implementation as a constructor parameter gives the system that creates the client (whether that’s a user directly writing Faye client code or part of an application that makes use of Faye) a free choice over which transport to use. But, the choice of transport in this situation is not a free choice; it’s determined by the capabilities of the host browser runtime, the transports that the server says it supports, and which transports we can detect as actually working over the user’s network connection to the server – even if the client and server claim to support WebSocket, an intermediate proxy can stop WebSocket from working.

It is actually part of the client’s responsibility to choose which transport to use, based on automated analysis of these constraints at runtime. The same application, running on different browsers on different networks, will require different transports to be used, even throughout the lifetime of a single client instance. If the Client constructor took a transport as an argument, then the user of the Client class would have to conduct all this detective work themselves. Hiding this work is a core problem the client is designed to solve, so it should not accept a transport object via injection. Even though it would improve testability, it would result in the wrong abstraction being presented to the user.
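This is, in fact, how the real client presents itself: you hand it nothing but the endpoint URL, and all the transport detective work stays inside.

// the constructor takes just a URL; transport selection happens internally,
// and can change over the client's lifetime as network conditions change
var client = new Faye.Client('http://www.example.com/faye');

client.subscribe('/foo', function(message) {
  // messages arrive here whichever transport won the negotiation
});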

Finally, let’s look at the internal structure of the Server class. It’s actually composed of multiple layers, and until version 0.6 those layers looked something like this:

    +--------------+
    |  RackServer  |    --> * WebSocket, XHR, JSON-P
    +-------+------+
            |   message
            V   objects
    +---------------+
    | BayeuxHandler |   --> * Bayeux message protocol
    +---------------+       * Connections / subscriptions
                            * Message queues

The RackServer deals with the front-end HTTP handling logic, processing all the different transport types and extracting JSON messages from them. Those JSON messages are then handed off to the BayeuxHandler, which contains all the transport-independent protocol logic. It processes the contents of the various JSON messages I showed you above, stores client IDs and their subscriptions and message queues, and routes incoming messages to all the subscribed clients. Now, some of the concerns of this class can be factored out:

    +--------------+
    |  RackServer  |    --> * WebSocket, XHR, JSON-P
    +-------+------+
            |   message
            V   objects
    +---------------+
    | BayeuxHandler |   --> * Bayeux message protocol
    +---------------+     +-------------------------------+
                          | * Connections / subscriptions |
                          | * Message queues              |
                          +-------------------------------+
                                        STATE

The BayeuxHandler class actually deals with two things: implementing the protocol as described by the JSON messages, and storing the state of the system: which clients are active, which channels they are subscribed to, and which messages are queued for delivery to which clients. There are many potential ways of implementing this state storage without changing the details of the protocol, and so in version 0.6 this concern was extracted into an Engine class:

    +--------------+
    |  RackServer  |    --> * WebSocket, XHR, JSON-P
    +-------+------+
            |   message
            V   objects
    +---------------+
    | BayeuxHandler |   --> * Bayeux message protocol
    +---------------+
            |
            V
    +--------------+
    |    Engine    |    --> * Subscriptions
    +--------------+        * Message queues

There are two engine implementations available: in-memory storage, and a Redis database using Redis’s pub/sub mechanism for IPC.

                      +--------------+
                      |  RackServer  |
                      +-------+------+
                              |
                              V
                      +---------------+
                      | BayeuxHandler |
                      +---------------+
                              |
                    +---------+---------+
                    |                   |
            +--------------+     +-------------+
            | MemoryEngine |     | RedisEngine |
            +--------------+     +-------------+

Finally, we have an honest candidate for dependency injection: whether the stack uses the in-memory or Redis engine makes absolutely no difference to the rest of the stack. It’s a contained implementation detail; given any object that implements the Engine API correctly, the BayeuxHandler and all of the components that sit above it will not be able to tell the difference. The choice of engine is a free choice that the user can make entirely on their own terms, without being bound by environmental constraints as we saw with client-side transport negotiation.

However, we don’t have multiple choices at the BayeuxHandler layer of the stack: there is only one Bayeux protocol, it’s an open standard, and there aren’t multiple competing implementations of this component. It’s just in-process computation that takes messages extracted by the RackServer, validates them, determines their meaning and delegates any state changes to the engine.

So, BayeuxHandler can be parameterized on which engine object it uses, but the Server will always construct a BayeuxHandler (as well as any transport-handling objects it needs). A highly simplified version of RackServer that only deals with HTTP POST would look like this, taking an engine as input and handing it down to the BayeuxHandler:

require 'json'
require 'rack'

class RackServer
  def initialize(engine)
    @handler = BayeuxHandler.new(engine)
  end

  def call(env)
    request  = Rack::Request.new(env)
    message  = JSON.parse(request.params['message'])
    response = @handler.process(message)
    [
      200,
      {'Content-Type' => 'application/json'},
      [JSON.dump(response)]
    ]
  end
end

A user would then start up the server like so:

server = RackServer.new(RedisEngine.new)

thin = Rack::Handler.get('thin')
thin.run(server, :Port => 80)

Now, in this scenario we get good tests – ones that make mock expectations about one object’s contract with another without stubbing globally visible bindings – as a side effect of the code fulfilling its design requirements, and of it being easy to use at the right level of abstraction for the problem it solves. Here are a couple of tests that assert that the server tells the engine to do the right things, and that the server relays information generated by the engine back to the client.

require './rack_server'
require 'rack/test'

describe RackServer do
  include Rack::Test::Methods

  let(:engine) { double('engine') }
  let(:app)    { RackServer.new(engine) }

  describe 'handshake' do
    let(:message) { {
      'channel' => '/meta/handshake',
      'version' => '1.0',
      'supportedConnectionTypes' => ['long-polling']
    } }

    it 'tells the engine to create a new client session' do
      expect(engine).to receive(:create_client).and_return 'new_client_id'
      post '/bayeux', :message => JSON.dump(message)
    end

    it 'returns the new client ID in the response' do
      allow(engine).to receive(:create_client).and_return 'new-client-id'
      post '/bayeux', :message => JSON.dump(message)
      expect(JSON.parse(last_response.body)).to include('clientId' => 'new-client-id')
    end
  end
end

Even when we do decide to use dependency injection, we face some trade-offs. By making the external interface for constructing an object more complicated, we gain some flexibility but lose some convenience. For example, a convenient way to read files looks like this:

File.read('path/to/file.txt')

while a flexible way to read files might look like this:

FileInputStream fis = new FileInputStream("path/to/file.txt");
InputStreamReader isr = new InputStreamReader(fis, "UTF-8");
BufferedReader br = new BufferedReader(isr);
// ...

Yes, I have picked extreme examples. The important point is that you can build the former API on top of the latter, but not the other way around: you can wrap flexible building blocks in a convenience API – just look at jQuery or any UI toolkit – but going in the other direction is much harder, if not impossible. Suppose you want to use the file I/O code independently of the string encoding code? File.read(path) does not expose those building blocks, so you’re going to need to find them somewhere else.

When I was at Songkick, we built the clients for our backend services much like the GitHub example above: each client is instantiated with an HTTP client adapter, letting us easily switch which HTTP library we used (yes, we had to do this in production and it took an afternoon), as well as letting us pass in a fake for testing. But for calling these clients from a controller, we wrapped them in a convenience module that constructed canonical instances for us:

module Services
  def self.github
    @github ||= GitHub::Client.new(HttpClient.new('https://api.github.com'))
  end

  # and so on for other services
end

So controller code just made a call like Services.github.get_user('jcoglan'), which is pretty much as convenient as ActiveRecord::Base.find().

To summarise, there is certainly a place for DI, or for any architectural technique, but you must let the requirements of the problem – not just testing concerns – drive the patterns you use. In the case of DI, I reach for it when:

  • There are multiple implementations of an abstract API that a caller might use
  • The code’s client has a free choice over which implementation to use, rather than environmental factors dictating this choice
  • The component needs to offer a plugin API, as in the case of Faye’s engine system or the protocol handlers in websocket-driver

Beyond DI, I would like to see much more of our design discussion focus on just that: design, in context, with case studies, without deferring to generalisation. Design patterns must be derived from the program’s requirements, not imposed in an attempt to shoe-horn the program into a predefined shape. Ultimately, rather than focussing on testing for its own sake, we must focus on usability. Testability will soon follow.

Form an orderly queue!

About once a month, I seem to stumble across an article that states that one reason JavaScript is easy to use is that, because it is single-threaded, you don’t get the synchronisation problems you see in multi-threaded environments. You can throw away everything you know about locks, mutexes, semaphores, and so on and you don’t ever need to worry about two bits of your program running concurrently and causing consistency errors in shared data structures.

In a very pedantic technical sense, JavaScript code is ‘thread-safe’. There is only one thread, so of course you don’t need to worry about multiple threads co-operating safely. However, the same kinds of problems that threads cause also plague JavaScript programs, and they will bite you unless you know how to avoid them.

But first, let’s state what we mean by ‘thread-safe’. In JavaScript, only one block of synchronous statements is ever running at a time. If you see code like this, you can guarantee that no other part of the program will be executed in between the statements and expressions in this block:

var myData = {hello: 'world'};
assertEqual(2 + 2, 4);
myData.anotherProperty = ['a', 'b', 'c'];
delete myData.hello;
callAnotherFunction('with', {some: ['args']});

Anywhere you see a sequence of synchronous statements, those statements will run one after another, and no other JavaScript will run concurrently with them. However, as soon as you make an asynchronous call, you create a pause in execution. Think of the indent in your callback function as the current block taking a breath, during which any other part of the program can run.

    var url = 'http://localhost:4567/';
    request(url, function(error, response, body) {
      var data = JSON.parse(body);
/*  ^^
    ||
    ++---- Here be dragons! */
    });

The var url and var data lines are separated by a callback boundary; other parts of the program will run while you’re waiting for the request to finish, and you’ve no idea how long this gap will be, especially in environments like mobile browsers where latency can be both high and extremely variable. Indeed, when I’m browsing the web on my phone, I frequently see JavaScript apps behaving weirdly because the increased latency exposes scheduling bugs and race conditions that the author did not anticipate.

Asynchrony makes it harder to guarantee which order bits of your program will run in, and leads to some of the same issues as threads: the unpredictable ordering of operations can put the program into weird states.

Let’s consider an example. Suppose you have a web server whose only function is to allow you to read and write an in-memory object. A GET request retrieves the current version of the object, and a PUT request replaces it with a new version.

var express = require('express'),
    app     = express();

var document = {};

app.use(express.json());

app.get('/', function(request, response) {
  response.json(document);
});

app.put('/', function(request, response) {
  document = request.body;
  response.send('OK');
});

app.listen(4567);

Let’s write some functions to interact with the server, to load and save the document that’s kept there.

var request = require('request');

var url = 'http://localhost:4567/';

var load = function(callback) {
  request.get(url, {json: true}, function(error, response, body) {
    callback(error, body);
  });
};

var save = function(document, callback) {
  var body    = JSON.stringify(document),
      headers = {'Content-Type': 'application/json'};

  request.put(url, {body: body, headers: headers}, callback);
};

If we make a document, save it, then make a change to it, save it again, and so on, this will fire off PUT requests to the server, one for each save() call. If we wait a while for those requests to complete, and then load() the document from the server, we can see what the result of the save() calls was.

var document = {one: 1};
save(document);
document.two = 2;
save(document);
document.three = 3;
save(document);

setTimeout(function() { load(console.log) }, 1000);

This code looks synchronous, and on my machine this program routinely prints {one: 1, two: 2, three: 3}. This shows that the server received the PUT requests in the same order that we sent them; it has kept the very last update with all the fields we added.

However, in the real world, things are rarely this simple. Everything I/O-related in JavaScript is asynchronous; those save() calls don’t block while waiting for the PUT request to finish, they just initiate the request and let the program keep chugging along. Although we initiate the requests in the correct order, variable latency, or the whims of a load balancer on the server side, might mean they arrive at the application server out of order. We can simulate this by adding a random delay to the save() function, so each PUT request is deferred by up to 300ms.

var save = function(document, callback) {
  var body    = JSON.stringify(document),
      headers = {'Content-Type': 'application/json'},
      latency = 300 * Math.random();

  setTimeout(function() {
    request.put(url, {body: body, headers: headers}, callback);
  }, latency);
};

When we run the program with this change, it sometimes prints {one: 1}, sometimes {one: 1, two: 2} and sometimes {one: 1, two: 2, three: 3}. The randomised latency means it’s not predictable which request will arrive at the server last and end up being the value the server keeps at the end of the process.

So, the problem is that we have code that looks synchronous, but is actually firing off async messages to a remote server in the background. We want to save the document each time we make a change to it, but make sure those save requests are processed by the server in the right order, avoiding the race conditions that latency has introduced.

We could do this by using the save() function’s callback parameter to make sure we don’t start a new save() call until the previous one is finished (remembering to handle errors along the way):

save(document, function(error) {
  if (error) return handleError(error);
  document.two = 2;
  save(document, function(error) {
    if (error) return handleError(error);
    document.three = 3;
    save(document, function(error) {
      if (error) return handleError(error);
      load(console.log);
    });
  });
});

Alternatively, we can use the Async library to clean up our callback pyramid, but basically do the same thing as the above example:

var async = require('async');

async.series([
  function(cb) {
    save(document, cb);
  }, function(cb) {
    document.two = 2;
    save(document, cb);
  }, function(cb) {
    document.three = 3;
    save(document, cb);
  }
], function(error) {
  if (error) return handleError(error);
  load(console.log);
});

However, this approach only works if the changes and save() calls are all in the same place in the codebase; if multiple independent modules are changing the document and calling save(), you can’t use callbacks to co-ordinate things because the modules don’t talk directly to one another. This situation happens frequently in modular user interface code, where two UI components both hold a reference to a data model, which is saved to the server whenever a change is made to it.

In this situation, the model itself must enforce that all save() calls arrive at the server in the same order they were issued, as would happen if save() were a blocking operation. A blocking save() function enforces that only one save() happens at a time by waiting until the request finishes before it returns. An async save() function needs to do the same thing: it must allow multiple parts of the codebase to call it, but it must make sure each request has finished before the next one is initiated. Only one PUT can be in flight at a time.

Since this is JavaScript, we can’t make save() block by putting a while (requestIsNotDone) loop in it; that would block the single-threaded event loop and stop the request from ever completing. Nor is it any good to use a flag that makes further calls to save() fail while one is in flight: that makes for messy code and drops half your save commands on the floor.
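For illustration, here’s a sketch of that flag approach – guardedSave is a made-up name, and this is exactly the code we don’t want:

// ordering is safe, but any save() issued while a request is in flight
// is simply refused and its data never reaches the server
var saving = false;

var guardedSave = function(document, callback) {
  if (saving) return callback(new Error('save already in progress'));
  saving = true;

  save(document, function(error) {
    saving = false;
    callback(error);
  });
};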

Fortunately, the problem of ‘make sure a series of async commands are executed in order’ can be elegantly solved with a queue. Instead of executing each command immediately, we push the command into a queue that executes things one at a time. Async has an API for doing exactly this; async.queue() takes a worker function and a concurrency level, where the worker function takes a value and a callback, does some work and calls the callback when it’s done.

var queue = async.queue(function(document, callback) {
  save(document, callback);
}, 1);

The concurrency level sets how many queued tasks can be processed concurrently (async operations can run concurrently even though we only have one thread); setting it to 1 means our queue will process documents one at a time. The above expression reduces to:

var queue = async.queue(save, 1);

If we push the updated documents into the queue, rather than calling save() directly, this makes sure the PUT requests are sent one at a time, and the program finishes by printing {one: 1, two: 2, three: 3} every time.

var document = {one: 1};
queue.push(document);
document.two = 2;
queue.push(document);
document.three = 3;
queue.push(document);

setTimeout(function() { load(console.log) }, 1000);
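Incidentally, there’s no magic in async.queue() for the concurrency-1 case; a hand-rolled version of the same idea is only a few lines. This sketch (the names are mine, not Async’s) processes one task at a time and starts the next only when the previous one’s callback has fired:

var serialQueue = function(worker) {
  var tasks = [], busy = false;

  var next = function() {
    if (busy || tasks.length === 0) return;
    busy = true;

    // run the oldest task; only its completion unlocks the queue
    worker(tasks.shift(), function() {
      busy = false;
      next();
    });
  };

  return {
    push: function(task) {
      tasks.push(task);
      next();
    }
  };
};

// equivalent, for our purposes, to async.queue(save, 1)
var queue = serialQueue(save);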

Although this document-saving example is contrived, I have used the same technique in real programs to make sure changes to files happen in a predictable order and don’t clobber one another. In Vault it’s used to prevent exactly the race we saw above when talking to the filesystem or a remote server, and in reStore I use it to make sure only one operation is performed on each user’s tree at a time, so you can’t, say, try to write a document while its parent directory is being deleted by a previous request. reStore’s Redis backend uses a SETNX-based lock to achieve the same thing, and Faye uses this too to make sure only one garbage collection sweep is rounding up expired session data at a time.
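The lock version of that idea looks something like the following rough sketch with the node redis client – this is not reStore’s actual code, just the shape of a SETNX lock: whoever manages to create the key holds the lock, and everyone else backs off and retries.

var redis = require('redis').createClient();

var withLock = function(name, task) {
  // SETNX only succeeds for one caller: the others get 0 and retry
  redis.setnx('lock:' + name, Date.now(), function(error, acquired) {
    if (acquired === 1) {
      task(function() { redis.del('lock:' + name) }); // do the work, release
    } else {
      setTimeout(function() { withLock(name, task) }, 100);
    }
  });
};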

So, while JavaScript might be thread-safe, it’s certainly not free from race conditions, and since most JS programs talk to remote systems asynchronously over volatile network connections, it’s actually really easy to introduce race conditions without realising it if you only test on your office wifi network. While it’s certainly nice to ignore concurrency while writing blocking code, it helps to have some queue and lock techniques in store for when you go async.

Building JavaScript projects with Make

As a long-time Ruby and JavaScript user, I’ve seen my share of build tools. Rake, Jake, Cake, Grunt, Gulp, the Rails asset pipeline… I’ve even invented one or two of my own. I’ve always wondered why every language ecosystem feels the need to invent its own build tools, and I’ve often felt like they get in my way. Too often, Rake tasks are just wrappers around existing executables like rspec or cucumber, or require custom glue code to hook a tool into the build system — witness the explosion of grunt-* packages on npm. I find Grunt particularly problematic; its configuration is verbose and indirect, and its plugins bake in assumptions that you cannot change. For example, grunt-contrib-handlebars currently depends on handlebars~1.1.2, when the current version of Handlebars is 1.3.0. It seems strange to have your build tools choose which version of your libraries you must use.

Most of the tasks we need to build our projects can be run using existing executables in the shell. CoffeeScript, Uglify, Browserify, testing tools, PhantomJS, all have perfectly good executables already and it seems silly to require extra tooling and plugins just so I can glue them together using Grunt. In the Node community we talk a lot about ‘the Unix way’, but the current crop of build tools don’t seem Unixy to me: they’re highly coupled, obfuscatory and verbose, wrapping particular versions of executables in hundreds of lines of custom integration code and configuration.

Fortunately, Unix has had a perfectly good build tool for decades, and it’s called Make. Though widely perceived as being only for C projects, Make is a general-purpose build tool that can be used on any kind of project. I’m currently using it to build my book, generating EPUB, MOBI and PDF from AsciiDoc and checking all the code in the book is successfully tested before any files are generated. (I’ve put my build process on GitHub if you’re interested.) While writing the book, I decided to use Make for the book’s example projects themselves, and found it to be a very quick way to get all the usual JavaScript build steps set up.

In this article I’m going to build a project that uses CoffeeScript source code and Handlebars templates, and is tested using jstest on PhantomJS. To start with, let’s cover the project’s source files. The contents of these files aren’t too important; the important thing is what we need to do with them to make them ready to run. There are a couple of .coffee files in the lib directory, just a small Backbone model and view:

# lib/concert.coffee

Concert = Backbone.Model.extend()

window.Concert = Concert

# lib/concert_view.coffee

ConcertView = Backbone.View.extend
  initialize: ->
    @render()
    @model.on "change", => @render()

  render: ->
    html = Handlebars.templates.concert(@model.attributes)
    @$el.html(html)

window.ConcertView = ConcertView

(You could use CommonJS modules instead of window globals and build the project with Browserify; hopefully, having read this article, it will be obvious how to add that step to your project.)

And, we have a template for displaying a little information about a concert:

<!-- templates/concert.handlebars -->

<div class="concert">
  <h2 class="artist">{{artist}}</h2>
  <h3 class="venue">{{venueName}}, {{cityName}}, {{country}}</h3>
</div>

There are also a couple of test suites in spec/*.coffee that test the template and view from above:

# spec/concert_template_spec.coffee

JS.Test.describe "templates.concert()", ->
  @before ->
    @concert =
      artist:    "Boredoms",
      venueName: "The Forum",
      cityName:  "Kentish Town",
      country:   "UK"

    @html = $(Handlebars.templates.concert(@concert))

  @it "renders the artist name", ->
    @assertEqual "Boredoms", @html.find(".artist").text()

  @it "renders the venue details", ->
    @assertEqual "The Forum, Kentish Town, UK", @html.find(".venue").text()
# spec/concert_view_spec.coffee

JS.Test.describe "ConcertView", ->
  @before ->
    @fixture = $(".fixture").html('<div class="concert"></div>')

    @concert = new Concert
      artist:    "Boredoms",
      venueName: "The Forum",
      cityName:  "Kentish Town",
      country:   "UK"

    new ConcertView(el: @fixture.find(".concert"), model: @concert)

  @it "renders the artist name", ->
    @assertEqual "Boredoms", @fixture.find(".artist").text()

  @it "updates the artist name if it changes", ->
    @concert.set "artist", "Low"
    @assertEqual "Low", @fixture.find(".artist").text()

Now, to get from these files to working code, we need to do a few things:

  • Compile all the CoffeeScript to JavaScript
  • Compile all the Handlebars templates to a JS file
  • Combine the app’s libraries and compiled source code into a single file
  • Minify the bundled app code using Uglify
  • Run the tests after making sure all the files are up to date

Make is a great tool for managing these tasks, as they are essentially relationships between files. Make does not have any baked-in assumptions about what type of project you have or what languages you’re using, it’s simply a tool for organising sets of shell commands. It has a very simple model: you describe which files each build file in your project depends on, and how to regenerate build files if they are out of date. For example, if file a.txt is generated by concatenating b.txt and c.txt, then we would write this rule in our Makefile:

a.txt: b.txt c.txt
	cat b.txt c.txt > a.txt

The first line says that a.txt (the target) is generated from b.txt and c.txt (the dependencies). a.txt will only be rebuilt if one of its dependencies was changed since a.txt was last changed; Make always tries to skip unnecessary work by checking the last-modified times of files. The second line (the recipe) says how to regenerate the target if it’s out of date, which in this case is a simple matter of piping cat into the target file. Recipe lines must begin with a tab, not spaces; I deal with this by adding the following to my .vimrc:

autocmd filetype make setlocal noexpandtab

Let’s start by installing the dependencies for this project. Add the following to package.json and run npm install:

{
  "dependencies": {
    "backbone":      "~1.1.0",
    "underscore":    "~1.5.0"
  },

  "devDependencies": {
    "coffee-script": "~1.7.0",
    "handlebars":    "~1.3.0",
    "jstest":        "~1.0.0",
    "uglify-js":     "~2.4.0"
  }
}

This installs all the build tools we’ll need, and all of them have executables that npm places in node_modules/.bin.

Let’s write a rule for building our Handlebars templates. We want to compile all the templates — that’s templates/*.handlebars — into the single file build/templates.js. Here’s a rule for this:

PATH  := node_modules/.bin:$(PATH)
SHELL := /bin/bash

build/templates.js: templates/*.handlebars
	mkdir -p $(dir $@)
	handlebars templates/*.handlebars > $@

The first line adds the executables from npm to the Unix $PATH variable so that we can refer to, say, handlebars by its name without typing out its full path. (Installing programs ‘globally’ just means installing them into a directory that is usually listed in $PATH by default.) The first line of the recipe uses mkdir to make sure the directory we’re compiling the templates into exists; $@ is a special Make variable that contains the pathname of the target we’re trying to build, and the dir function takes the directory part of that pathname.

The rule duplicates the names of the source and target files, and we often use variables to remove this duplication:

PATH  := node_modules/.bin:$(PATH)
SHELL := /bin/bash

template_source := templates/*.handlebars
template_js     := build/templates.js

$(template_js): $(template_source)
	mkdir -p $(dir $@)
	handlebars $(template_source) > $@

With this Makefile, running make in the shell will generate build/templates.js, or do nothing if that file is already up to date.

$ touch templates/concert.handlebars 

$ make
mkdir -p build/
handlebars templates/*.handlebars > build/templates.js

$ make
make: `build/templates.js' is up to date.

Next up, we need to compile our CoffeeScript. We want to say that every file lib/foo.coffee generates a corresponding file build/lib/foo.js, and likewise every file spec/foo_spec.coffee generates build/spec/foo_spec.js. In Make, we can use the wildcard function to find all the names of the CoffeeScript files in lib and spec, and generate lists of JavaScript files from those names using pattern substitution. In Make, the expression $(files:%.coffee=build/%.js) means for every filename in the list files, replace %.coffee with build/%.js, for example replace lib/foo.coffee with build/lib/foo.js. We also use a pattern-based rule to describe how to compile any CoffeeScript file to its JavaScript counterpart. Here are the rules:

source_files := $(wildcard lib/*.coffee)
build_files  := $(source_files:%.coffee=build/%.js)

spec_coffee  := $(wildcard spec/*.coffee)
spec_js      := $(spec_coffee:%.coffee=build/%.js)

build/%.js: %.coffee
	coffee -co $(dir $@) $<

We need to generate the names of all the generated JavaScript files because later targets will depend on them. If a target simply depended on build/*.js but we’d not built those files yet, the build wouldn’t work correctly. With this configuration, Make sets the variables to these values:

source_files := lib/concert.coffee lib/concert_view.coffee
build_files  := build/lib/concert.js build/lib/concert_view.js
spec_coffee  := spec/concert_template_spec.coffee spec/concert_view_spec.coffee
spec_js      := build/spec/concert_template_spec.js build/spec/concert_view_spec.js

So, Make now knows the names of all the generated files before they exist. The pattern rule for CoffeeScript states that any file build/foo/bar.js is generated from foo/bar.coffee, and uses coffee -co to compile each file into a given output directory. We use $@ as before to get the name of the current target, and $< gives the name of the first dependency of the current target. These variables are essential when dealing with pattern-based rules like this.

Pattern rules are invoked if Make has not been told explicitly how to build a particular file. If you run make build/lib/concert.js you’ll see that it generates the named file from the pattern rule.
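For example, with the files above:

$ make build/lib/concert.js
coffee -co build/lib/ lib/concert.coffee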

Now that we’ve compiled all our code, we can concatenate and minify it. We want to take all the files in build_files, and template_js, and any third party libraries we need, and use Uglify to compress them. A rule for this is straightforward; note how app_bundle depends on the upstream files so that if any of them change, Make knows it needs to rebuild app_bundle.

app_bundle := build/app.js

libraries  := vendor/jquery.js \
              node_modules/handlebars/dist/handlebars.runtime.js \
              node_modules/underscore/underscore.js \
              node_modules/backbone/backbone.js

$(app_bundle): $(libraries) $(build_files) $(template_js)
	uglifyjs -cmo $@ $^

Here we’ve used another of Make’s automatic variables: $^ is a list of all the dependencies, separated with spaces. It’s handy when all your recipe does is combine all the dependencies into an aggregate file.

It’s customary to make a target called all that depends on all your project’s compiled files, and make this the first rule in the Makefile so that running make will run this rule. The all rule is also what’s called ‘phony’: all is not the name of an actual file, it’s just the name of a task we want to run, and so Make should not look for a file called all and check its last-modified time before proceeding. Targets are marked as phony by making them dependencies of the special .PHONY target.

.PHONY: all

all: $(app_bundle)

And finally, we need a task to run our tests. Let’s create a little web page for doing that:

<!-- test.html -->

<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <title>jstest</title>
  </head>
  <body>

    <div class="fixture"></div>

    <script src="./build/app.js"></script>

    <script src="./node_modules/jstest/jstest.js"></script>
    <script src="./build/spec/concert_template_spec.js"></script>
    <script src="./build/spec/concert_view_spec.js"></script>

    <script>
      JS.Test.autorun()
    </script>

  </body>
</html>

and a PhantomJS script for launching this page and displaying the results:

// phantom.js

var JS = require("./node_modules/jstest/jstest")

var reporter = new JS.Test.Reporters.Headless({})
reporter.open("test.html")

and to top it all off, a Make task that runs the tests after making sure all the files are up to date (this task is also phony since it does not generate any files):

test: $(app_bundle) $(spec_js)
	phantomjs phantom.js

It’s also customary to add a phony task called clean that deletes any generated files from the project, putting it back in its ‘clean’ state:

clean:
	rm -rf build

So, the whole finished Makefile looks like this, containing instructions for compiling all the source code, building a single app bundle, and running the tests:

PATH  := node_modules/.bin:$(PATH)
SHELL := /bin/bash

source_files    := $(wildcard lib/*.coffee)
build_files     := $(source_files:%.coffee=build/%.js)
template_source := templates/*.handlebars
template_js     := build/templates.js
app_bundle      := build/app.js
spec_coffee     := $(wildcard spec/*.coffee)
spec_js         := $(spec_coffee:%.coffee=build/%.js)

libraries       := vendor/jquery.js \
                   node_modules/handlebars/dist/handlebars.runtime.js \
                   node_modules/underscore/underscore.js \
                   node_modules/backbone/backbone.js

.PHONY: all clean test

all: $(app_bundle)

build/%.js: %.coffee
	coffee -co $(dir $@) $<

$(template_js): $(template_source)
	mkdir -p $(dir $@)
	handlebars $(template_source) > $@

$(app_bundle): $(libraries) $(build_files) $(template_js)
	uglifyjs -cmo $@ $^

test: $(app_bundle) $(spec_js)
	phantomjs phantom.js

clean:
	rm -rf build

If you run make test, you’ll see all the compiled files are generated on the first run, but not regenerated afterward. They will only be regenerated if the source files change and you run a task that depends on them. Expressing the dependencies between files lets Make save you a lot of time otherwise spent waiting for things to recompile unnecessarily.

$ make test 
coffee -co build/lib/ lib/concert.coffee
coffee -co build/lib/ lib/concert_view.coffee
mkdir -p build/
handlebars templates/*.handlebars > build/templates.js
uglifyjs -cmo build/app.js vendor/jquery.js node_modules/handlebars/dist/handlebars.runtime.js \
                           node_modules/underscore/underscore.js node_modules/backbone/backbone.js \
                           build/lib/concert.js build/lib/concert_view.js build/templates.js
coffee -co build/spec/ spec/concert_template_spec.coffee
coffee -co build/spec/ spec/concert_view_spec.coffee
phantomjs phantom.js
Loaded suite: templates.concert(), ConcertView

....

Finished in 0.015 seconds
4 tests, 4 assertions, 0 failures, 0 errors

We’ve got rather a lot done with very little configuration, and with no need for plugins to glue the programs we want to use into Make. We can use whatever programs we want; Make will happily execute whatever we tell it to, without us needing to write any glue code between Make and the compiler tools themselves. That means fewer things to install, audit and keep up to date, and more time getting on with your project.

You can add whatever build steps you like to these recipes, so long as you follow the pattern of describing relationships between files. You can add in other testing tools like JSHint, and even make them into dependencies of other tasks so that downstream tasks won’t run unless your tests pass, as sketched below. That’s how I build my book: the dependencies are set up so that the book won’t build unless all the example tests pass first.
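For instance, a sketch of a JSHint gate – assuming you’ve added jshint to the project’s devDependencies so its executable is on the $PATH we set up earlier – only needs a phony lint target that test depends on:

.PHONY: lint

lint: $(build_files)
	jshint $(build_files)

# test won't run the suite until the compiled sources pass the linter
test: lint $(app_bundle) $(spec_js)
	phantomjs phantom.js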

Having all these steps automated is a big time-saver, and setting them up so quickly, without needing to install any tools beyond what comes with Unix, means you’ve no excuse not to get your project organised. It also encourages you to write the functionality you need as generic scripts, rather than hiding it away inside plugins that only work with a particular build system. You save time and the whole community benefits. Not bad for a build tool from the eighties.

Announcing Faye 1.0

It’s been over four years in the making, but I’m very happy to announce the release of Faye 1.0. The Faye project – both the pub/sub messaging tools and the WebSocket/Redis modules that have been extracted over time – has reached a level of stability where I think it’s grown out of the ‘beta’ 0.x phase of development. Judging by the volume of bug reports and questions I get, the codebase has settled into something stable, usable, and free of commonly encountered faults.

The 1.0 release does not introduce many new major features, but rather builds on the stability of the 0.8 release and adds some much-needed polish in a few areas. Let’s start with the big changes you need to be aware of.

First, several methods have been removed. The most significant of these is the server’s listen() method. This was really just a little sugar over starting an HTTP/HTTPS server and attaching Faye to it, and over time the number of different options people needed – from configuring the SSL cipher suite to picking a different Ruby webserver – made it clear this one method should not try to cover all use cases. Instead, users should set up their HTTP server however they want, and bind Faye to it. The documentation covers how to do this.

On the client side, getClientId() and getState() have been removed. They were only ever there to support tests, they were never documented, and I have constantly discouraged their use. If you’re going to miss them and don’t know how to adapt your application, please ask the mailing list.

At the protocol level, the server no longer forwards the clientId field of messages to subscribers. I have discouraged use of this value as a security credential, and it was never intended as such, but to remove the risk for people who are using it this way I have made sure it does not get shared between clients.

Those are all the breaking changes. There are a few places where existing APIs still work but have been augmented with modern equivalents.

Objects that represent publication/subscription and respond to the callback()/errback() API are now Promises, and callback/errback are internally implemented on top of then(). This is a spec-compatible promise implementation and should interoperate with any other promise library you’re using.
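For example, both styles now work on the object returned by subscribe() (the channel name here is arbitrary):

var subscription = client.subscribe('/messages', function(message) {
  // handle published messages
});

// the old API still works...
subscription.callback(function() { console.log('subscribed!') });

// ...and so does treating the subscription as a promise
subscription.then(function() { console.log('subscribed!') });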

Objects like the server/adapter that respond to bind() for handling events now implement the EventEmitter API. bind() still works, it’s just sugar for on()/addListener().
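So, for example, these two lines are now interchangeable (‘disconnect’ is one of the events the server has always published):

// bind() and on() now do the same thing
server.bind('disconnect', function(clientId) { /* ... */ });
server.on('disconnect',   function(clientId) { /* ... */ });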

Elsewhere there is plenty of polish and modernisation. At the transport level, custom header support has been added to the cross-origin-long-polling and server-side websocket transports. When using a client outside the browser, all transports now support cookies and share a cookie jar scoped to the client. The transports have all been refactored and are now more reliable at detecting and recovering from network errors and from client devices/applications going to sleep.

In the Client class, we’ve added a ca option to the constructor; like elsewhere in the core Node APIs this lets you pass in an array of trusted server certificates if you need to go beyond the certificates trusted by OpenSSL. Clients can also be instantiated with URI objects (from Node’s url.parse() or Ruby’s URI.parse()) instead of URL strings, and they are treated as objects in all internal URL-handling code. You can also use protocol-relative URIs in the browser, no problem.
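As a sketch, a Node client talking to a server whose certificate is signed by an in-house CA might now be set up like this (the URL and file path are made up):

var fs   = require('fs'),
    faye = require('faye');

// pass the CA certificates the client should trust beyond the defaults
var client = new faye.Client('https://bayeux.internal.example.com/', {
  ca: [fs.readFileSync('certs/internal-ca.pem')]
});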

In Ruby, some major changes mean that we now support JRuby. For example, Yajl has been replaced with MultiJson so you can choose the JSON engine that’s right for your project. And you no longer need to use Thin; Faye::RackAdapter supports the rack.hijack streaming API supported by Puma, Rainbows 4.5 and Passenger 4.0, so you have more choices for which server to use (this was a major motivator for the removal of the Thin-specific listen() method).

Beyond all these little bits of polish, we come to the one real new feature of 1.0. Until now, Faye has not let you access HTTP request data in server-side extensions, since the aim was to hide all the messy transport details. But over the years it’s become apparent that some people want to use various HTTP auth mechanisms and other features in their extensions, so Faye 1.0 makes this available, with the caveat that you use it at your own risk: for certain HTTP features, you must read the updated security docs, particularly the material on CSRF protection.

In short, if your extension methods take three parameters, the second one will now be a request object (http.IncomingMessage on Node, Rack::Request on Ruby). This gives you access to the path, headers and body of the request, which you can use to filter messages.

server.addExtension({
  incoming: function(message, request, callback) {
    if (request.headers.authorization !== 'open sesame') {
      message.error = '403::Access denied';
    }
    callback(message);
  }
});

2-parameter extensions still work exactly the same as they always did.

That just about covers the main changes in the 1.0 release. As always, file bugs on GitHub and ask questions on the mailing list.

On a personal note, I want to thank everyone who has built stuff, asked questions, reported bugs, fixed code, and donated their time to this project. It has been and continues to be really rewarding to work on, and has taken me all over the place talking at confs and meeting people who want to hear about it. It was particularly gratifying to see Faye powering the GOV.UK exhibit at the 2013 Design Awards, at one of my favourite London museums. I’m proud to have helped a lot of projects build on my work but there’s nothing quite like seeing something you helped build in the real world, especially somewhere as dear to me as the Design Museum.

Thank you, and I hope you keep putting Faye to good use.

Running RSpec tests from the browser

As a fun demo of the flexibility of jstest, I thought I’d show how you can use it to run tests that aren’t even JavaScript. This is a silly example but it actually demonstrates the power of the framework and I’ve used these capabilities to solve real testing problems when running JavaScript on unusual platforms.

jstest has a plugin system for changing the output format: all the different output formats and runner adapters are objects that implement a standard reporter interface, and the test runner invokes the interface’s methods to notify the reporter of what’s going on so it can produce output. The event objects passed to these methods are self-contained JavaScript data structures that can be easily serialized as JSON, and indeed this JSON stream is one of the built-in output formats.

The nice thing about the JSON format is that it makes it easy to send reporter data over a wire, parse it, and hand it to another reporter running in a different process, so you can print browser results in a terminal and vice versa. This is how the PhantomJS integration works: the JSON reporter writes to the browser console, and PhantomJS picks this data up and reformats it using one of the text-based output formats in the terminal.

The docs for the JSON reporter show an example of running a server-side test suite using the browser UI, by making the server process emit JSON, sending this JSON over a WebSocket and handing it to the browser reporter. The nice thing about this is the WebSocket and browser don’t care where the JSON came from – any process that emits jstest-compatible JSON will do. So, we can use this system to run Ruby tests!

To begin with, let’s write a little RSpec test:

# spec/ruby_spec.rb

describe Array do
  before do
    @strings = %w[foo bar]
  end

  it 'returns a new array by mapping the elements through the block' do
    @strings.map(&:upcase).should == %w[FOO BAR]
  end
end

Now we just need to make RSpec emit JSON, which we can do using a custom formatter:

$ rspec -r ./spec/json_formatter -f JsonFormatter ./spec
{"jstest":["startSuite",{"children":[],"size":1,"eventId":0,"timestamp":1372708952811}]}
{"jstest":["startContext",{"fullName":"Array","shortName":"Array","context":[],"children":["map"],"eventId":1,"timestamp":1372708952811}]}
{"jstest":["startContext",{"fullName":"Array map","shortName":"map","context":["Array"],"children":[],"eventId":2,"timestamp":1372708952812}]}
{"jstest":["startTest",{"fullName":"Array map returns a new array by mapping the elements through the block","shortName":"returns a new array by mapping the elements through the block","context":["Array","map"],"eventId":3,"timestamp":1372708952812}]}
{"jstest":["update",{"passed":true,"tests":1,"assertions":1,"failures":0,"errors":0,"eventId":4,"timestamp":1372708952812}]}
{"jstest":["endTest",{"fullName":"Array map returns a new array by mapping the elements through the block","shortName":"returns a new array by mapping the elements through the block","context":["Array","map"],"eventId":5,"timestamp":1372708952812}]}
{"jstest":["endContext",{"fullName":"Array map","shortName":"map","context":["Array"],"children":[],"eventId":6,"timestamp":1372708952812}]}
{"jstest":["endSuite",{"passed":true,"tests":1,"assertions":1,"failures":0,"errors":0,"eventId":7,"timestamp":1372708952812}]}

Next, we need a server: specifically, a WebSocket server that will trigger a test run each time a connection is made. When we open a WebSocket to ws://localhost:8888/ with a test query parameter naming the tests to run (a JSON-encoded array, as the browser code below will send), the server should run this command and pipe the output into the WebSocket, sending each line of output as a separate message:

$ rspec -r ./spec/json_formatter -f JsonFormatter ./spec -e map

This is easily accomplished using the faye-websocket and split modules from npm:

// server.js

var child     = require('child_process'),
    http      = require('http'),
    url       = require('url'),
    split     = require('split'),
    WebSocket = require('faye-websocket')

var bin  = 'rspec',
    argv = ['-r', './spec/json_formatter', '-f', 'JsonFormatter', './spec']

var server = http.createServer()

server.on('upgrade', function(request, socket, body) {
  var ws = new WebSocket(request, socket, body),

      params  = url.parse(request.url, true).query,
      tests   = JSON.parse(params.test),

      options = tests.reduce(function(o, t) { return o.concat(['-e', t]) }, []),
      proc    = child.spawn(bin, argv.concat(options))

  proc.stdout.pipe(split()).pipe(ws)
})

server.listen(8888)

And finally, we need a web page that will open a socket, and channel the messages into the jstest browser reporter. We have a special class for this: JS.Test.Reporters.JSON.Reader takes lines of JSON output, parses them and dispatches the data to a reporter, making sure the messages are replayed in the right order.

By using JS.Test.Runner to get the current run options, we can tell which tests the user has selected to run, and send their names to the server that will pass these names on to rspec.

<!-- browser.html -->

<!doctype html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>RSpec in the browser</title>
  </head>
  <body>

    <script type="text/javascript" src="./node_modules/jstest/jstest.js"></script>

    <script type="text/javascript">
      var options = new JS.Test.Runner().getOptions(),

          R       = JS.Test.Reporters,
          browser = new R.Browser(options),
          reader  = new R.JSON.Reader(browser)

      var test = encodeURIComponent(JSON.stringify(options.test)),
          ws   = new WebSocket('ws://localhost:8888/?test=' + test)

      ws.onmessage = function(event) {
        reader.read(event.data)
      }
    </script>

  </body>
</html>

If you start the server and open the web page, you should see the results in the browser!

[Screenshot: the RSpec suite’s results rendered by the jstest browser reporter]

Clicking the green arrows next to the tests reloads the page with that test selected, so we can use this to run a subset of our Ruby tests.

As I said, this is a silly example but it shows the power of the jstest reporting API. You can use this approach in reverse to send browser test results to the terminal and do other useful things.

For example, a while ago I was working on a Spotify application. It’s quite hard to make Spotify reload the page without completely shutting it down and restarting it. I wanted to drive the tests quickly from the terminal, so I made a little script to help me do this. I made a page in my app that opened a Faye connection to a server, and when it received a certain message it would reload the page using window.location and re-run my tests. The tests used a custom reporter to send updates over another Faye channel. My script would send the reload command, then listen for progress updates and channel them into one of jstest’s text reporters. This only took about a page of code and hugely improved my productivity when building my app.

This demonstrates the power of having a simple serializable data format for test events, and reporters that work on any platform out of the box.

jstest 1.0: the cross-platform JavaScript test framework finally released as a standalone package

After hacking away on it for months, I’m happy to announce the 1.0 release of jstest as a standalone package. jstest has long been part of the jsclass library but over time it’s become the module I use the most: I test all my other JavaScript projects with it, not just those based on jsclass.

Using it as heavily as I do, I’ve noticed how its user interface can be troublesome. To begin with, you need to install a bunch of apparently unrelated stuff from jsclass in order to use it, you have to learn how the jsclass module system works, and plenty more besides. I wanted to make cross-platform JavaScript testing available as a simple, single-file package that you can drop into any project and have it just work, no matter what platform you’re running on. That’s what the new jstest package achieves.

It bundles up everything you need to run JavaScript tests into a single file that you can download from the website or from npm. The website contains full documentation for how to use and extend it, presenting it as a library in its own right rather than some obscure part of jsclass. I hope the new packaging and improved docs make it much easier to get started and use this library.

But, this is not simply a marketing or refactoring exercise. jstest has always been about running on as many JS platforms as possible, and this release is no different. But until now it’s been hard to integrate into other workflows: there was no way of changing the output format, or integrating with different test runners, and the platforms it did support – TestSwarm, and PhantomJS via a clunky JSON interface – were hard-coded. That all changes in 1.0 thanks to the addition of reporter plugins: an API for adding new output formats to the framework. It already includes a ton of useful output formats that work on all supported platforms, and adapters for many new test runners including Buster.JS, Karma, Teabag, Testem, Testling CI and TestSwarm.

The text output formats are built in such a way that they run on any platform, including in the browser, meaning you can now do things like use any of the output formats for PhantomJS browser tests or send test output over a socket and reformat it somewhere else. The reporters include several standard formats like TAP, TAP-Y/J and JUnit XML for integration with other reporting systems. There’s even an API you can use to make sure your own reporters work everywhere seamlessly.

This release also involves a new release of jsclass. Version 4.0 is really a maintenance release that supports this work, but notably turns all the library modules into CommonJS-compatible modules, so we’re finally free of our pre-Node legacy of managing dependencies via global variables. As before, they still work out of the box on all supported platforms, but if you’re using CommonJS they’ll be better behaved on your platform.

Finally, getting this release done frees me up to work on my testing book. While I emphatically don’t want the book to be about any particular tool, having a framework that just works out of the box is crucial for me to keep the book ‘usable’. Getting other frameworks to work cross-platform typically involves fiddling with a bunch of hacky plugins or googling to find out how other people managed it, and it’s important to me that the tools that ship with the book just work, without the reader having to figure out how some random plugin has changed since the book was published. I want to write about how to approach testing and architecture problems in general, rather than about any particular framework, so having tools that get out of the way is a big part of making the book useful and not wasting its readers’ time messing with project config.

So, as usual: download it, and let me know what you think.

And sign up for the book!

websocket-driver: an I/O-agnostic WebSocket module, or, why most protocol libraries aren’t

A couple of days ago I pushed the latest release of faye-websocket for Node and Ruby. The only user-facing change in version 0.5 is that the library now better supports the I/O conventions of each platform; on Node this means WebSocket objects are now duplex streams so making an echo server is as simple as:

var http      = require('http'),
    WebSocket = require('faye-websocket');

var server = http.createServer();

server.on('upgrade', function(request, socket, body) {
  var ws = new WebSocket(request, socket, body);
  ws.pipe(ws);
});

server.listen(8000);

On Ruby, it means that Faye::WebSocket now supports the rack.hijack API for accessing the TCP socket, which means you can now use it to handle WebSockets in apps served by Puma, Rainbows! 4.5, Phusion Passenger 4.0, and other servers.

But there’s a much bigger change behind the scenes, which is that faye-websocket is now powered by an I/O agnostic WebSocket protocol module called websocket-driver, available for Node and Ruby. The entire protocol is encapsulated in that module such that all the user needs to do is supply some means of doing I/O. faye-websocket is now just a thin module that hooks websocket-driver up to various I/O systems, such as Rack and Node web servers or TCP/TLS sockets on the client side.

I started work on this a few weeks ago when the authors of Celluloid and Puma asked me if faye-websocket could be used to add WebSocket support to those systems. I said it could probably already do this, since Poltergeist and Terminus have been using the protocol classes with Ruby’s TCPServer for a while without too much effort. So I began extracting these classes into their own library, and wrote the beginnings of some documentation for them.

But as I got into explaining how to use this new library, I noticed how hard it was to use correctly. Loads of protocol details were leaking out of these classes and would have to be reimplemented by users. For example, here’s a pseudocode-ish outline of how the client would have to process data it received over TCP. If it looks complicated, that’s because it is complicated, but I’ll explain it soon enough.

class Client
  def initialize(url)
    @uri       = URI.parse(url)
    @parser    = Faye::WebSocket::HybiParser.new(url, :masking => true)
    @state     = :connecting
    @tcp       = tcp_connect(@uri.host, @uri.port || 80)
    @handshake = @parser.create_handshake

    @tcp.write(@handshake.request_data)
    loop { parse(@tcp.read) }
  end

  def parse(data)
    case @state
    when :connecting
      leftovers = @handshake.parse(data)
      return unless @handshake.complete?
      if @handshake.valid?
        @state = :open
        parse(leftovers)
        @queue.each { |msg| send(msg) } if @queue
      else
        @state = :closed
      end
    when :open, :closing
      @parser.parse(data)
    end
  end

  def send(message)
    case @state
    when :connecting
      @queue ||= []
      @queue << message
    when :open
      data = @parser.frame(message, :text)
      @tcp.write(data)
    end
  end
end

But using websocket-driver the equivalent implementation would be:

class Client
  attr_reader :url

  def initialize(url)
    @url    = url
    @uri    = URI.parse(url)
    @driver = WebSocket::Driver.client(self)
    @tcp    = tcp_connect(@uri.host, @uri.port || 80)

    @driver.start
    loop { parse(@tcp.read) }
  end

  def parse(data)
    @driver.parse(data)
  end

  def send(message)
    @driver.text(message)
  end

  def write(data)
    @tcp.write(data)
  end
end

So before, the client had to implement code to create a handshake request, split the input stream depending on whether it was currently parsing the HTTP handshake headers or a WebSocket frame, switch state accordingly, and remember to parse the leftovers; it’s entirely possible to receive the handshake headers and some WebSocket frame data in the same chunk, and you can’t drop that frame data. Because of the design of the WebSocket wire format, dropping or misinterpreting even one byte of input changes the meaning of the rest of the stream, possibly leading to behaviour an attacker might exploit.

It also had to maintain state around sending messages out, since messages can only be sent after the handshake is complete. So if you tried to send a message while in the :connecting state, it would put the message in a queue and deliver it once the handshake was complete.

When we switch to websocket-driver, all those concerns go away. We treat the whole TCP input stream as one stream of data, because that’s what it is. We stream all incoming bytes to the driver and let it deal with managing state. It will emit events to tell us when interesting things happen, like the handshake completing or a message being received. When we want to send a message, we tell the driver to format it as a text frame; if the driver knows the handshake is not complete, it will queue the message and deliver it when it’s safe to do so. In the second example, we don’t even mention the concept of handshakes: the user doesn’t need to know anything about how the protocol works to use the driver correctly. The new Client class just hooks the driver up to a TCP socket and provides an API for sending messages.

The driver produces TCP output by calling the client’s write() method with the data we should send over the socket. When we call @driver.start, the driver calls client.write with a string containing handshake request headers. When we call @driver.text("Hello"), the driver will call client.write("\x81\x05Hello") (for unmasked frames), either immediately or after the handshake is complete.
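
To make this concrete, here’s a minimal sketch using the Node version of the module, where the driver exposes the same idea through an io duplex stream rather than a write() method on your client object; the host and message here are made up:

var net       = require('net'),
    websocket = require('websocket-driver');

var driver = websocket.client('ws://www.example.com/'),
    tcp    = net.connect(80, 'www.example.com');

// driver.io emits the bytes we must write to the socket, and consumes the
// bytes we read from it, so wiring the two together is a pair of pipes
tcp.pipe(driver.io).pipe(tcp);

tcp.on('connect', function() { driver.start() });

driver.on('open', function() { driver.text('Hello') });

driver.on('message', function(event) {
  console.log('received: ' + event.data);
});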

This final point highlights the core problem with a lot of protocol libraries. By taking a strictly object-oriented approach where all protocol state is encapsulated and objects send commands to one another, we’ve allowed the protocol library to control when output happens, not just what output happens. A protocol is not just some functions for parsing and formatting between TCP streams and domain messages, it’s a sequence of actions that must be performed in a certain order by two or more parties in order to reach a goal. A protocol library, if it wishes to help users deploy the protocol correctly and safely, should drive the user’s code by telling it when to do certain things, not just give the user a bag of parsing ingredients and ask them to glue them together in the right order.

The fact that other protocol libraries have no means of telling the user when to send certain bits of output means that they end up leaking a lot of protocol details into the user’s code. For example, WebSocket has various control frames aside from those for text and binary messages. If you receive a ‘ping’ frame, you must respond by sending a ‘pong’ frame containing the same payload. If you receive a ‘close’ frame, you should respond in kind and then close the connection. If you receive malformed data you should send a ‘close’ frame with an error code and then close the connection. So there are various situations when the parser should react, not by yielding the data to the user, but by telling the user to send certain types of responses. But the most-downloaded Ruby library for this (websocket) handles the latter case by yielding the data to the user and expecting them to do the right thing.

I’ve tried reimplementing faye-websocket’s Client class on top of websocket and the amount of boilerplate required is huge if you want to produce a correct implementation: tracking the handshake and connection state yourself, answering ping frames with matching pongs, echoing close frames and then closing the socket, sending an error close frame when you receive malformed data, and more besides.

So this protocol library not only leaks by making the user track the state of the connection and the state of the parser, but also makes them implement stuff the protocol should deal with. Almost all the above points are behaviours set down in the specification; the user must implement them this way or their deployment is buggy. Since the user has no meaningful control over how this stuff works, all this code is just boilerplate that requires significant knowledge to write correctly. In contrast, faye-websocket and websocket-driver have never emitted events on ping frames because the user has no choice over how to handle them, so why make them write code for that? In websocket-driver, all the above points (and this list is not exhaustive) are dealt with by the protocol library and this gives users a much better hope of deploying WebSockets correctly and safely.

I’m not saying the websocket library is broken, per se. I’m saying it doesn’t go far enough. In Ruby we have lots of different means of doing network I/O, and there are a few in Node if you consider HTTP servers and TCP/TLS sockets, though they all have similar interfaces. If you want to build a widely useful protocol library, you should solve as many problems as possible for the user so that they just need to bring some I/O and they’re pretty much done. Asking the user to rebuild half the protocol themselves is a recipe for bugs, security holes and wasted effort. We shouldn’t have to rebuild each protocol for every I/O stack we invent, so let’s stop.

Let the user tell you what they want to do, and then tell their code how and when to realize this over TCP. If you find yourself explaining the protocol mechanics when you’re documenting your library, it’s not simple enough yet. Refactor until I don’t need to read the RFC to deploy it properly.

Callbacks, promises and simplicity

Since publishing ‘Callbacks are imperative, promises are functional’ a couple of days ago I’ve received a huge volume of feedback. It’s by far the most widely-read thing I’ve written and there have been far too many comments dotted about the web to cover them individually. But I’d like to highlight two articles written in response, both entitled ‘Broken promises’.

The first, by Drew Crawford, is an excellent and detailed technical analysis of the limitations of promises, in particular their applicability to resource-constrained situations. It’s derived from his experience trying to apply them to iOS programming, and failing. I encourage you to read the whole thing, but two important themes pop out for me.

First, the cost of modelling a solution using promises is increased memory usage. Callback-based programs (in JavaScript) store their state on the stack and in closures, since most of the data they use comes from function arguments and local variables. That data tends to be quite fragmented; you don’t typically build large object graphs while using callbacks, and this can help keep memory usage down. On the other hand, good promise-based programs use relationships between values, encoded in arrays and object references, to solve problems. Thus they tend to produce larger data structures, and this can be a problem if your solution won’t fit in memory on your target runtime. If you hit a limit, you end up having to introduce manual control flow, at which point callbacks become more appealing.

Second, promises let you delegate scheduling and timing to your tools, but sometimes that’s not what you want. Sometimes there are hard limits on how long something can take, or you want to bail out early for some reason, or you want to manually control the ordering of things. Again, in such situations, manual control flow is the right tool and promises lose their appeal.

I didn’t mean to give the impression that promises are the one true solution to async programming (I try to avoid thinking of ‘one true’ anything, so Drew’s casting of me as an ‘evangelist’ is a little bruising). There are certainly plenty of situations where they’re not appropriate, and Drew catalogues a number of those very well. My only intention was to add a few more things to my mental toolbox so I can dig them out when I spot an appropriate occasion.

The second response is from Mikeal Rogers, whom I quote in my original article. (There’s also a more recent version of his talk.) The bulk of this article concerns the community effect of Node’s design decision. Again, you should read his article rather than take my summary for granted.

Mikeal argues that the success of the Node ecosystem relies on how little the core platform provides. Instead of shipping with a huge standard library, Node has a bare minimum of building blocks you need to get started writing network apps. One of those building blocks is the conventions it sets up: the implicit contracts that mean all the modules in the ecosystem interoperate with each other. These include the Stream interface, the module system, and the function(error, result) {} callback style. Libraries that adhere to these interoperate with one another easily, and this is tremendously valuable.
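
That callback convention is worth spelling out, since everything else in the ecosystem builds on it: an async function takes a callback as its final argument, and the callback receives any error first and the result second. A familiar example:

var fs = require('fs');

fs.readFile('/etc/hosts', 'utf8', function(error, data) {
  // error is null on success, and data is undefined on failure
  if (error) return console.error('read failed: ' + error.message);
  console.log(data);
});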

I actually commend the Node team for their attitude on this stuff. I’ve been using Node, and Faye has been built on it, since v0.1. I was around before promises were taken out of core, and I can see that the volume of disagreement on them at the time meant they shouldn’t have been standardized. And bizarrely, despite its early experimental status, I’ve found Node to be remarkably stable. I’ve been doing Ruby since 2006 and Node since early 2010, and I can honestly say Node has broken my stuff less while going from v0.1 to v0.10 than Ruby and its libraries have even during minor/patch releases. Paying attention to compatibility saves your users a huge amount of time and it’s something I try to do myself.

I recognize that the Node team have their hands somewhat tied. Even if they wanted to go back to promises, doing so would break almost every module in npm, which would be a terrible waste of everyone’s time. So in light of this, please take the following as a history lesson rather than a plea to change Node itself.

Mikeal’s argument goes that minimizing core creates a market for new ideas in the library ecosystem, and we’ve seen that with various control flow libraries. This is certainly true, but I think there’s an interesting difference where control flow is concerned, when compared to other types of problem. In most other popular web languages, the problem solved by the function(error, result) {} convention is solved at the language level: results are done with return values, errors with exceptions. There is no market for alternatives because this is usually a mechanism that cannot be meaningfully changed by users.

But I would also argue that the market for solutions to control flow in Node is necessarily constrained by what core does. Let’s look at a different problem. Say I’m in the market for a hashtable implementation for JavaScript. I can pick any implementation I like so long as it does the job, because this problem is not constrained by the rest of the ecosystem. All that matters is that it faithfully implements a hashtable and performs reasonably well. If I don’t like one, I can pick up another with a different API, and all I need to change is my code that uses the hashtable.

Most problems are like this: you just want a library that performs some task, and using it does not affect the rest of your system. But control flow is not like this: any control flow library out there must interoperate with APIs that use callbacks, which in Node means basically every library out there. So people trying to solve this problem do not have free rein over how to solve it; they are constrained by the ecosystem. No-one in the Ruby community thinks of return values and exceptions as being particularly constraining, but because in Node this concern is a library rather than a language feature, people have the freedom to try and change it. They just can’t change it very much, because they must remain compatible with everything else out there.

And to me that’s the history lesson: when you’re designing a platform, you want to minimize the number of contracts people must agree to in order to interoperate. When people perceive they have the ability to change things, but they really can’t, a tension arises because alternative solutions won’t mesh well with the ecosystem. The more concepts require these contracts, the fewer ways users can experiment with alternative models.

Mikeal also goes into the relative ease of use of callbacks and my promise-based solutions. I disagree with most of this but that’s not really important. We all consider different things to be easy, have different sets of knowledge, and are working on different problems with different requirements. That’s fine, so long as you don’t make technical decisions by simply pandering to the most ignorant. While we all need to write code that others can use and maintain, I hope part of that process involves trying to increase our collective knowledge rather than settling for what we know right now. Short of a usability study, any assertions I can make here about ease of use would be meaningless.

But I want to finish up on a word that’s frequently interpreted to mean ‘easy’: ‘simple’. I will assert that my promise-based solutions are simpler than using callbacks, but not in the ease-of-use sense. I mean in the sense that Rich Hickey uses in his talk Simple Made Easy, which is to say ‘not complex’, not having unrelated things woven together. This is an objective, or at least observable and demonstrable, property of a program, that we can examine by seeing how much you must change a program to change its results.

Let’s revisit my two solutions from the previous article, one written with callbacks and one with promises. This program calls fs.stat() on a collection of paths, then uses the size of the first file for some task and uses the whole collection of stats for some unrelated task. Here are the two solutions:

var paths = ['file1.txt', 'file2.txt', 'file3.txt'],
    file1 = paths.shift();

async.parallel([
  function(callback) {
    fs.stat(file1, function(error, stat) {
      // use stat.size
      callback(error, stat);
    });
  },
  function(callback) {
    async.map(paths, fs.stat, callback);
  }
], function(error, results) {
  var stats = [results[0]].concat(results[1]);
  // use the stats
});

var fs_stat = promisify(fs.stat);

var paths = ['file1.txt', 'file2.txt', 'file3.txt'],
    statsPromises = list(paths.map(fs_stat));

statsPromises[0].then(function(stat) {
  // use stat.size
});

statsPromises.then(function(stats) {
  // use the stats
});
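
The promisify() and list() helpers come from the original article; as a rough reference, sketches built on today’s built-in Promise might look like this (the arity trick in promisify assumes the wrapped function declares all of its parameters, and the indexing behaviour of list is what the example above relies on):

// Wraps a Node-style async function so it returns a promise. Extra
// arguments (like the index Array#map passes) are trimmed based on the
// function's declared arity, so paths.map(fs_stat) works as intended.
var promisify = function(fn, receiver) {
  return function() {
    var args = Array.prototype.slice.call(arguments, 0, fn.length - 1);
    return new Promise(function(resolve, reject) {
      fn.apply(receiver, args.concat(function(error, result) {
        if (error) reject(error);
        else resolve(result);
      }));
    });
  };
};

// Turns an array of promises into a promise for the whole array, while
// keeping each element addressable as a promise in its own right.
var list = function(promises) {
  var all = Promise.all(promises);
  promises.forEach(function(promise, i) { all[i] = promise });
  return all;
};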

Now, I would say the first is uglier than the second, but this is neither objective nor particularly instructive. To see how much more complex the first solution is, we must observe what happens when we try to change the program. Let’s say we no longer want to do the task with the size of the first file. Then the promise solution simply involves removing the code for that task:

var fs_stat = promisify(fs.stat);

var paths = ['file1.txt', 'file2.txt', 'file3.txt'],
    statsPromises = list(paths.map(fs_stat));

statsPromises.then(function(stats) {
  // use the stats
});

It would go similarly if we had a set of unrelated operations waiting to complete rather than the same operation on a set of inputs. We would just remove that operation from a list of promises and we’d be done. Often, changing promise-based solutions is the same as changing synchronous ones: you just remove a line, or change a variable reference or array index, or modify a data structure. These are changes to your program’s data, not to its syntactic structure.

Now let’s remove that task from the callback solution. We start by removing all the code that treats the first file as a special case:

var paths = ['file1.txt', 'file2.txt', 'file3.txt'];

async.parallel([
  function(callback) {
    async.map(paths, fs.stat, callback);
  }
], function(error, results) {
  var stats = results[0];
  // use the stats
});

We’ve removed the additional variable at the start, the special treatment of the first file, and the array-munging at the end. But there’s no point having an async.parallel() with one item in it, so let’s remove that too:

var paths = ['file1.txt', 'file2.txt', 'file3.txt'];

async.map(paths, fs.stat, function(error, stats) {
  // use the stats
});

So removing that one task was a tiny change to the promise solution, and a huge change to the callback solution: often, changing what you want a callback-based program to do involves changing its syntactic structure. The difference between the two approaches is that the promise solution keeps the notions of promises, collections, asynchrony and ordering separate from one another, whereas the callback-based solution conflates all these things. This is why the promise-based program requires much less change.

So while I think it’s fruitless to argue about how easy a task is, I think you can demonstrate fairly objectively how simple it is. And while I admire what Node has achieved in many areas around interoperability through simplicity, this is one area where I wish it had gone the other way. Fortunately, JavaScript is such that we have ways of routing around designs we don’t care for.

Thanks to Drew and Mikeal for taking the time to respond. I would welcome any further feedback about how to improve either of my above approaches.