Graphical thinking: optimisation through functional programming

A little over a year ago, I wrote an essay called Callbacks are Imperative, Promises are Functional, in which I argued that promises are not a control-flow device but rather a tool to describe relationships between dependent values, such that the promise library can figure out the control flow – the correct and optimal order of operations – for you. I gave the example of a module loader, where by modelling a set of modules as a graph of lazy promises, we can write code that describes the dependencies between modules and automatically have them downloaded in the correct order and with the maximum possible parallelisation. This echoes a quote from the book Learn You a Haskell for Great Good:

In purely functional programming you don’t tell the computer what to do as such but rather you tell it what stuff is.

Learn You a Haskell for Great Good

Writing the control flow for optimally downloading modules by hand is rather hard; better to invent a system where you can describe the shape of your domain and have the system solve the problem for you.

A few months ago, I wrote a tutorial on Make, having spent a year with it while writing JavaScript Testing Recipes. Though I didn’t realise it when writing my promises essay, there is a deep similarity between Make and my promise-based module loader: in both systems we work by defining dependencies among a set of artefacts, and then the platform figures out which things must be run and in what order when you ask for one of those artefacts. Make can even parallelise the tasks it runs when it spots that two tasks are independent and can therefore be run in any order – if it doesn’t matter which order you perform two tasks in, you can probably run them in parallel.
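Rake, Ruby’s Make-alike, makes for a compact illustration of this model. In the following sketch (the task names are invented for illustration), we only declare what depends on what, and Rake derives the execution order:

task :compile_css
task :compile_js

# The two compile tasks are mutually independent, so their relative order
# doesn't matter -- multitask is free to run them in parallel.
multitask :assets => [:compile_css, :compile_js]

# `rake deploy` runs both compile tasks, then :assets, then :deploy.
task :deploy => [:assets]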

This week I was working on some Redis code, and I realised that this concept of solving problems by relating values and deriving the control flow from those relationships had given me an insight into solving a problem I’d been having with performing atomic changes to a Redis database. I don’t use any of the above concepts directly in the code, but I found that they gave me a new way to think about the problem and suddenly realise a solution. Let me try to explain.

Shortly before writing that promises article, I released reStore, a remoteStorage server written in Node with both filesystem- and Redis-based backends. remoteStorage is a protocol for storing arbitrary documents via HTTP; each user on a server has a directory tree and they can grant applications access to parts of that tree for the application to store user data. The application uses GET to retrieve documents by pathname, and receives the document’s content, plus its media type in a Content-Type header and its version in an ETag. It can also request folder listings using GET to a pathname ending in a trailing slash, which returns a list of the folder’s children (documents and other folders) with their current versions.

The app can save new documents using PUT, setting their type using Content-Type, and remove them using DELETE. Both these methods allow use of the If-Match header; the application can pass in the current version it has for a document, and the server will process the request only if that version is indeed current. Otherwise, the app makes a GET to retrieve the latest version, reapplies its changes and tries to save again; this prevents applications from clobbering one another’s changes.
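To make this concrete, here is roughly what such an exchange looks like (the /storage mount point, header values and bodies are illustrative, not quoted from the spec):

GET /storage/jcoglan/books/jstr/preface.txt HTTP/1.1

HTTP/1.1 200 OK
Content-Type: text/plain
ETag: 1402430750408

Preface to JSTR

PUT /storage/jcoglan/books/jstr/preface.txt HTTP/1.1
Content-Type: text/plain
If-Match: 1402430750408

A revised preface

If another client has updated the document in the meantime, the version in If-Match is no longer current and the server rejects the PUT with 412 Precondition Failed.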

Now, these semantics lend themselves naturally to using the filesystem as a backend, and indeed reStore has such a backend. But it also has a Redis backend, and I’ve recently been doing some work to make sure that when it saves and deletes documents, all changes to the Redis database are done atomically.

Most of the complexity in doing this comes from the fact that remoteStorage is not quite as low-level as the Unix filesystem. When you save a document, all its parent folders are created implicitly, and if you delete the only remaining item from a folder then that folder is deleted too. So folders exist only if they contain documents or other folders. Clients never manipulate folders directly; they only deal with saving documents.

Furthermore, all folders have an ETag version just like documents do. A folder’s version must change if any document within its tree is updated or removed. This is to facilitate syncing: a client can easily tell if its copy of a tree is up to date by comparing the version tag on the root folder, rather than checking every document within the tree.

So when a document is saved, we need to make sure that all its containing folders exist and update all of their versions. (We could store only the documents and figure out the folders when a GET is made, but it’s cheaper to generate the tree on write than on read.) For ease of implementation, the version of a document in reStore is simply the Unix timestamp of the last time it was changed (the remoteStorage spec previously required versions to be timestamps and it used Last-Modified rather than ETag). The version of a folder is the latest timestamp that exists on any of its children.
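Concretely, borrowing values like those in the command traces below:

  # A document's version is the millisecond Unix timestamp of its last write:
  modified = (Time.now.utc.to_f * 1000).floor      # => e.g. 1402430750408

  # A folder's version is just the maximum of its children's versions:
  child_versions = [1402430750408, 1402430723301]  # illustrative values
  child_versions.max                               # => 1402430750408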

Here’s a naive version of a backend store for the requirements we’ve laid out. Although reStore is written in Node, I’ve written these examples in Ruby since it’s far easier to see what’s actually happening without all those callbacks getting in the way. It has one public method, put(), which handles requests to save documents. It takes a username, a document path, the content and content type, and an optional version string. It begins by splitting the pathname into its segments, for example parse_path('/path/to/file.txt') returns ['/', 'path/', 'to/', 'file.txt']. It checks whether the given version is current – if no version is given, the request is always allowed, but if a version is given for a document that doesn’t exist the request is forbidden, since the client has stale data – then proceeds to update the tree. It uses SADD to make sure each item is listed as a child of its parent folder, and HSET to set the modified time on each parent folder to the current time. Finally, it checks whether the target file exists (we need to tell the client if the document was created or updated), then saves its content, media type, size and modified time using HMSET.

class Store
  VersionConflict = Class.new(StandardError)

  def initialize(redis_client)
    @redis = redis_client
  end

  def put(username, pathname, content, type, version = nil)
    query    = parse_path(pathname)
    filename = query.pop
    prefix   = "users:#{username}:data:"
    modified = (Time.now.utc.to_f * 1000).floor

    is_current = get_current_state(prefix + pathname, version)
    raise VersionConflict unless is_current

    query.each_with_index do |dir, i|
      key = prefix + query.slice(0, i + 1).join('')
      @redis.hset(key, 'modified', modified.to_s)
      @redis.sadd(key + ':children', query[i + 1] || filename)
    end

    exists = @redis.exists(prefix + pathname)

    @redis.hmset(prefix + pathname, :length,   content.size,
                                    :type,     type,
                                    :modified, modified,
                                    :content,  content)
    [!exists, modified]
  end

private

  def parse_path(pathname)
    pathname.scan(/[^\/]*(?:\/|$)/)[0..-2]
  end

  # Returns true if the request's version precondition holds: a request with
  # no version is unconditional, and a version given for a document that
  # doesn't exist means the client has stale data.
  def get_current_state(key, version)
    modified = @redis.hget(key, 'modified')
    return version.nil? if modified.nil?
    return true if version.nil?
    version.to_i == modified.to_i
  end
end

Here’s what this code does when we run the call:

store.put('jcoglan', '/books/jstr/preface.txt', 'Preface to JSTR', 'text/plain')

The Redis MONITOR command prints the following:

"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"hset" "users:jcoglan:data:/" "modified" "1402430750408"
"sadd" "users:jcoglan:data:/:children" "books/"
"hset" "users:jcoglan:data:/books/" "modified" "1402430750408"
"sadd" "users:jcoglan:data:/books/:children" "jstr/"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402430750408"
"sadd" "users:jcoglan:data:/books/jstr/:children" "preface.txt"
"exists" "users:jcoglan:data:/books/jstr/preface.txt"
"hmset" "users:jcoglan:data:/books/jstr/preface.txt" "length" "15" \
        "type" "text/plain" "modified" "1402430750408" "content" "Preface to JSTR"

So we can see it checking the version of the target document, updating all the parent folders and finally the document itself. However, there are two glaring holes in this implementation:

  • While we’re running these commands, another client might be doing work that becomes interleaved with ours. We might both check the current version, find that we’re up to date, and then both go ahead and update the document, clobbering one another’s changes. This is precisely the ‘interceding update’ problem that the version check is supposed to fix.
  • If our server process crashes part-way through this set of updates, the tree is left in an invalid state. Folders are said to have children that do not actually exist, or have modification times that don’t reflect the true state of things.

To make the system robust, we need to solve both of these problems, and this will partly involve making better use of the tools Redis gives us, and also re-framing the problem slightly to make it fit better into Redis’s constraints. The former technique is certainly useful but it’s the latter that I am particularly interested in here.

The first problem is actually quite easy to solve: since we’re making changes to the user’s entire directory tree, conditional on the state of one document, we will lock the user’s data while this is happening. For each update to have a consistent view of the world, updates must be applied in sequence, and the lock makes sure only one update executes at a time.

The documentation for SETNX provides a locking algorithm for us. You should read those docs to understand the algorithm; here’s an implementation in Ruby:

  LOCK_TIMEOUT = 10_000

  def lock(name)
    result = nil

    while result.nil?
      lock_key     = "locks:#{name}"
      current_time = (Time.now.utc.to_f * 1000).floor
      expiry       = current_time + LOCK_TIMEOUT

      # If nobody holds the lock, claim it, run the critical section, then
      # release the lock.
      if @redis.setnx(lock_key, expiry)
        next result = yield.tap { @redis.del(lock_key) }
      end

      # Someone else holds the lock; check whether it has expired.
      timeout = @redis.get(lock_key)

      if timeout.nil? or current_time < timeout.to_i
        next sleep 0.1
      end

      # The lock has expired; try to claim it with GETSET. If the value we
      # displaced is the expired timeout we read above, the lock is ours.
      # Otherwise another client beat us to it, and we go back to waiting.
      old_value = @redis.getset(lock_key, expiry)

      if old_value == timeout.to_s
        result = yield.tap { @redis.del(lock_key) }
      else
        sleep 0.1
      end
    end

    result
  end

Wrapping this mechanism around our PUT logic lets us make sure that nothing about the user’s tree will change between us checking the version and making our changes.

  def put(username, pathname, content, type, version = nil)
    query    = parse_path(pathname)
    filename = query.pop
    prefix   = "users:#{username}:data:"

    lock username do
      modified = (Time.now.utc.to_f * 1000).floor

      is_current = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      query.each_with_index do |dir, i|
        key = prefix + query.slice(0, i + 1).join('')
        @redis.hset(key, 'modified', modified.to_s)
        @redis.sadd(key + ':children', query[i + 1] || filename)
      end

      exists = @redis.exists(prefix + pathname)

      @redis.hmset(prefix + pathname, :length,   content.size,
                                      :type,     type,
                                      :modified, modified,
                                      :content,  content)
      [!exists, modified]
    end
  end

The second problem is more subtle. We want all these changes to the database – a combination of HSET, SADD and HMSET commands – to happen as one, so Redis will execute them as a unit. Fortunately, Redis has transactions that let us do exactly this. But there’s one minor problem: a Redis transaction works by sending the MULTI command, then all the commands you want to run together, then sending EXEC. Only then will all the commands be executed; you cannot have a command that depends on the result of another command, because you don’t get any responses until after all the commands have run.
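To see what this means in Ruby, here’s a small sketch using the redis gem’s multi block (the key name is illustrative): commands issued inside the block return placeholder futures rather than values, so there is nothing to branch on until EXEC has run.

  future = nil

  replies = @redis.multi do
    future = @redis.get('example-key')  # a Redis::Future, not the value
    # We can't make decisions based on the reply here: Redis hasn't
    # actually executed the GET yet, only queued it.
  end

  replies.first  # the GET reply, available only after EXEC
  future.value   # the same reply, filled in once EXEC completes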

In our code, there’s a pesky little EXISTS sitting between the writes to the parent folders and the document itself. This must happen before the final HMSET since we need to know if the document existed before that write, and there’s a temptation to put it as close to the HMSET as possible to make sure no other changes happen between the two.

However: notice that none of the writes depend on the result of the EXISTS call. They do depend on the result of the HGET that checks the version, since they are conditional on that, but everything after that depends only on information from the original method inputs. Because of this lack of dependencies, the writes can actually be done in any order and we will get the same result. As long as it happens at some point before the final HMSET, and within the lock so we know no other changes have happened, the EXISTS call can happen as early in the process as we like. We can push it before the parent folder updates:

  def put(username, pathname, content, type, version = nil)
    query    = parse_path(pathname)
    filename = query.pop
    prefix   = "users:#{username}:data:"

    lock username do
      modified = (Time.now.utc.to_f * 1000).floor

      is_current = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      exists = @redis.exists(prefix + pathname)

      query.each_with_index do |dir, i|
        key = prefix + query.slice(0, i + 1).join('')
        @redis.hset(key, 'modified', modified.to_s)
        @redis.sadd(key + ':children', query[i + 1] || filename)
      end

      @redis.hmset(prefix + pathname, :length,   content.size,
                                      :type,     type,
                                      :modified, modified,
                                      :content,  content)
      [!exists, modified]
    end
  end

Or, we can get rid of it entirely: the initial version check can tell us whether the document exists, since it will have no modified key if it’s absent. Let’s modify get_current_state() to return two values: a boolean for whether the version is current, and one for whether the document exists:

  def get_current_state(key, version)
    modified = @redis.hget(key, 'modified')
    return [version.nil?, false] if modified.nil?
    return [true, true] if version.nil?
    return [version.to_i == modified.to_i, true]
  end

and use this new value in our put() method:

  def put(username, pathname, content, type, version = nil)
    query    = parse_path(pathname)
    filename = query.pop
    prefix   = "users:#{username}:data:"

    lock username do
      modified = (Time.now.utc.to_f * 1000).floor

      is_current, exists = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      query.each_with_index do |dir, i|
        key = prefix + query.slice(0, i + 1).join('')
        @redis.hset(key, 'modified', modified.to_s)
        @redis.sadd(key + ':children', query[i + 1] || filename)
      end

      @redis.hmset(prefix + pathname, :length,   content.bytes.size,
                                      :type,     type,
                                      :modified, modified,
                                      :content,  content)
      [!exists, modified]
    end
  end

Now if we run our test again, we see this pattern of commands:

"setnx" "locks:jcoglan" "1402432911685"
"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"hset" "users:jcoglan:data:/" "modified" "1402432901692"
"sadd" "users:jcoglan:data:/:children" "books/"
"hset" "users:jcoglan:data:/books/" "modified" "1402432901692"
"sadd" "users:jcoglan:data:/books/:children" "jstr/"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402432901692"
"sadd" "users:jcoglan:data:/books/jstr/:children" "preface.txt"
"hmset" "users:jcoglan:data:/books/jstr/preface.txt" "length" "15" \
        "type" "text/plain" "modified" "1402432901692" "content" "Preface to JSTR"
"del" "locks:jcoglan"

Two important things have happened here: first, all the commands have happened inside a locking block bounded by the SETNX and DEL commands. Second, and more importantly for making the writes atomic: notice that this operation is now a read (HGET) followed by a sequence of writes (HSET, SADD, HMSET). The fact that all the writes are uninterrupted by any reads, and do not depend on one another, means we can wrap them in a MULTI transaction:

  def put(username, pathname, content, type, version = nil)
    query    = parse_path(pathname)
    filename = query.pop
    prefix   = "users:#{username}:data:"

    lock username do
      modified = (Time.now.utc.to_f * 1000).floor

      is_current, exists = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      @redis.multi do
        query.each_with_index do |dir, i|
          key = prefix + query.slice(0, i + 1).join('')
          @redis.hset(key, 'modified', modified.to_s)
          @redis.sadd(key + ':children', query[i + 1] || filename)
        end

        @redis.hmset(prefix + pathname, :length,   content.bytes.size,
                                        :type,     type,
                                        :modified, modified,
                                        :content,  content)
      end

      [!exists, modified]
    end
  end

This leaves us with our final command sequence:

"setnx" "locks:jcoglan" "1402433277043"
"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"multi"
"hset" "users:jcoglan:data:/" "modified" "1402433267046"
"sadd" "users:jcoglan:data:/:children" "books/"
"hset" "users:jcoglan:data:/books/" "modified" "1402433267046"
"sadd" "users:jcoglan:data:/books/:children" "jstr/"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402433267046"
"sadd" "users:jcoglan:data:/books/jstr/:children" "preface.txt"
"hmset" "users:jcoglan:data:/books/jstr/preface.txt" "length" "15" \
        "type" "text/plain" "modified" "1402433267046" "content" "Preface to JSTR"
"exec"
"del" "locks:jcoglan"

All the changes made by the put() method now happen in a single transaction. Redis will not run any of them until it receives EXEC, so none of them can run if the process crashes while sending them. Either they all run, or none does.

This was kind of a silly example, though: technically, we could have left that EXISTS call in the middle of the transaction. None of the writes depended on its result, so we could have kept it just before the final HMSET and received its result once the transaction completed. The EXISTS call turned out to be redundant anyway, but the example demonstrates a key idea: analysing the dependencies between operations, flattening that graph so that things happen as early as possible, disinterleaving the reads and writes, and noticing when the order of commands does and does not matter, is the core strategy for atomising Redis operations. We got lucky with PUT, but DELETE is far trickier.

DELETE looks a little like PUT run in reverse. We delete the target document, and remove it as a child from its parent folder. Then we go back up the tree, removing and updating things. If a folder has no children, we delete it. If it does have children, we set its version to the latest version of any of its remaining children, then move up to the next level. Here’s a first stab at an implementation, where we’ve added another return value to get_current_state(): we need to tell the client whether the deleted document existed and what its version was:

  def delete(username, pathname, version = nil)
    prefix   = "users:#{username}:data:"
    query    = parse_path(pathname)
    filename = query.pop
    parents  = (0...query.size).map { |i| query[0..i].join('') }.reverse

    lock username do
      is_current, exists, modified = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      @redis.del(prefix + pathname)
      @redis.srem(prefix + parents.first + ':children', filename)

      parents.each_with_index do |parent, i|
        children = @redis.smembers(prefix + parent + ':children')
        if children.empty?
          @redis.del(prefix + parent)
          if i + 1 < parents.size
            item = query[query.size - i - 1]
            @redis.srem(prefix + parents[i + 1] + ':children', item)
          end
        else
          mtimes = children.map do |child|
            @redis.hget(prefix + parent + child, 'modified')
          end
          @redis.hset(prefix + parent, 'modified', mtimes.map(&:to_i).max)
        end
      end

      [exists, modified]
    end
  end

private

  def get_current_state(key, version)
    modified = @redis.hget(key, 'modified')
    return [version.nil?, false, nil] if modified.nil?
    return [true, true, modified.to_i] if version.nil?
    return [version.to_i == modified.to_i, true, modified.to_i]
  end

As an example, say we run the following commands to add three documents and then delete them in reverse order:

store.put('jcoglan', '/books/jstr/preface.txt',          'Preface to JSTR',         'text/plain')
store.put('jcoglan', '/books/jstr/chapters/browser.txt', 'Browser Applications',    'text/plain')
store.put('jcoglan', '/books/jstr/chapters/cli.txt',     'Command-line Interfaces', 'text/plain')

store.delete('jcoglan', '/books/jstr/chapters/cli.txt')
store.delete('jcoglan', '/books/jstr/chapters/browser.txt')
store.delete('jcoglan', '/books/jstr/preface.txt')

Here’s the command sequence for the second delete(), which will implicitly delete its parent folder but leave the others intact. We see it deleting the document, then its parent, then updating the versions of the other folders above it, by checking the children of each folder in turn.

"setnx" "locks:jcoglan" "1402435177397"
"hget" "users:jcoglan:data:/books/jstr/chapters/browser.txt" "modified"
"del" "users:jcoglan:data:/books/jstr/chapters/browser.txt"
"srem" "users:jcoglan:data:/books/jstr/chapters/:children" "browser.txt"
"smembers" "users:jcoglan:data:/books/jstr/chapters/:children"
"del" "users:jcoglan:data:/books/jstr/chapters/"
"srem" "users:jcoglan:data:/books/jstr/:children" "chapters/"
"smembers" "users:jcoglan:data:/books/jstr/:children"
"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402435167365"
"smembers" "users:jcoglan:data:/books/:children"
"hget" "users:jcoglan:data:/books/jstr/" "modified"
"hset" "users:jcoglan:data:/books/" "modified" "1402435167365"
"smembers" "users:jcoglan:data:/:children"
"hget" "users:jcoglan:data:/books/" "modified"
"hset" "users:jcoglan:data:/" "modified" "1402435167365"
"del" "locks:jcoglan"

This problem seems far harder to fix: in order to delete each folder or update its version, we need to check its children, and in order to make sure the children are up to date, we need to update the next folder down the chain, all the way back to the target document. This is, of course, how we would implement it on a real filesystem, where one cannot delete non-empty directories.

But there is a way out. Think back to the PUT example: we made the writes atomic by figuring out as many of their parameters as possible in advance, which let us disinterleave the writes from anything they depended on and run them in a single transaction. If you think about it for a minute, you’ll realise we can predict which folders will need to be deleted and updated before we start deleting anything at all, and we’ll use that trick to tease apart the reads and writes.

We can say in advance that, when we delete a document, every folder in the chain above it whose only child is the next link in that chain must be deleted: those folders exist entirely to contain the document we’re about to delete. So we can figure out the deletable folders up front by walking up from the document’s immediate parent until we find a folder with more than one child.

Similarly, we can figure out the versions of all the remaining folders after the deletion by reading all the relevant item versions in advance, that is, getting the versions of all the children of all the parents of the deleted document. Note this doesn’t involve reading the whole tree; only the immediate children of each folder need to be considered. From these lists, we can remove the version for the topmost deletable folder, then back-propagate to calculate the resulting versions of the other folders.

The code for this is significantly more verbose but I’ve tried to comment what’s going on at each point. You should be able to see how the previous approach of deleting an item, then checking how its parents need to be updated, then recursing up the tree, has been replaced with an algorithm that plans all the changes we need to make up front by doing that same process on data that we’ve already gathered into memory ahead of time.

  def delete(username, pathname, version = nil)
    prefix  = "users:#{username}:data:"
    query   = parse_path(pathname)
    parents = (0...(query.size - 1)).map { |i| query[0..i].join('') }.reverse

    # e.g. if query = ['/', 'books/', 'jstr/', 'preface.txt']
    #  then parents = ['/books/jstr/', '/books/', '/']

    lock username do
      is_current, exists, modified = get_current_state(prefix + pathname, version)
      raise VersionConflict unless is_current

      # Find all the immediate child items of the list of parent folders; this
      # returns an Array<Array<String>>.

      children = parents.map do |parent|
        @redis.smembers(prefix + parent + ':children')
      end

      # Determine which folders will be empty by starting with the immediate
      # parent and finding folders with only one child.

      empty, index = [], 0

      while index < parents.size and children[index].size == 1
        empty << parents[index]
        index += 1
      end

      # The remaining folders are the non-empty ones, i.e. those after the index
      # we just counted up to. Their children are the corresponding lists in the
      # children array.

      remaining = parents[index..-1]
      children  = children[index..-1]

      # Construct an Array<Array<Fixnum>> of modified times corresponding to the
      # children array.

      mtimes = children.map.with_index do |child_list, i|
        child_list.map do |child|
          @redis.hget(prefix + remaining[i] + child, 'modified').to_i
        end
      end

      # An index where we'll store the computed versions for remaining folders.

      ancestors = {}

      if remaining.size > 0
        # Remove the uppermost empty folder as a child from its parent, and
        # delete its corresponding version from the mtimes list.

        item = query[query.size - empty.size - 1]
        @redis.srem(prefix + remaining.first + ':children', item)
        mtimes.first.delete_at(children.first.index(item))

        # Back-propagate the version updates to the other remaining parent
        # folders. We calculate the new version from each folder's remaining
        # children, update this version in the mtimes list then repeat at the
        # next level up.

        remaining.each_with_index do |dir, i|
          ancestors[dir] = mtimes[i].max

          if i + 1 < remaining.size
            item = query[query.size - empty.size - 2 - i]
            mtimes[i + 1][children[i + 1].index(item)] = ancestors[dir]
          end
        end

        # Save the updated versions to the database.

        ancestors.each do |dir, mtime|
          @redis.hset(prefix + dir, 'modified', mtime)
        end
      end

      # Delete all the folders we determined were empty, and their sets of
      # children.

      empty.each do |dir|
        @redis.del(prefix + dir)
        @redis.del(prefix + dir + ':children')
      end

      # And finally, delete the document itself.

      @redis.del(prefix + pathname)

      [exists, modified]
    end
  end

This code produces the following command sequence for deleting the second item in our test:

"setnx" "locks:jcoglan" "1402437739275"
"hget" "users:jcoglan:data:/books/jstr/chapters/browser.txt" "modified"
"smembers" "users:jcoglan:data:/books/jstr/chapters/:children"
"smembers" "users:jcoglan:data:/books/jstr/:children"
"smembers" "users:jcoglan:data:/books/:children"
"smembers" "users:jcoglan:data:/:children"
"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"hget" "users:jcoglan:data:/books/jstr/chapters/" "modified"
"hget" "users:jcoglan:data:/books/jstr/" "modified"
"hget" "users:jcoglan:data:/books/" "modified"
"srem" "users:jcoglan:data:/books/jstr/:children" "chapters/"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402437729226"
"hset" "users:jcoglan:data:/books/" "modified" "1402437729226"
"hset" "users:jcoglan:data:/" "modified" "1402437729226"
"del" "users:jcoglan:data:/books/jstr/chapters/"
"del" "users:jcoglan:data:/books/jstr/chapters/:children"
"del" "users:jcoglan:data:/books/jstr/chapters/browser.txt"
"del" "locks:jcoglan"

Before, we had calls to SMEMBERS and HGET interleaved with SREM, HSET and DEL. Now, however, we have completely disinterleaved the reads and writes; all the writes are planned ahead of time and aren’t interleaved with the reads that determine what needs to be changed. This means we can wrap all the writes in a MULTI transaction.
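Here’s a sketch of how the tail of the delete() method changes: the reads and the in-memory planning above this point stay exactly as they were, except that the planning phase stashes the name of the uppermost deleted item in a variable (call it removed_item) instead of SREMing it immediately. Then every write is queued in one transaction:

      @redis.multi do
        if remaining.size > 0
          # removed_item is the uppermost empty folder (or the document
          # itself), recorded during the planning phase above.
          @redis.srem(prefix + remaining.first + ':children', removed_item)

          ancestors.each do |dir, mtime|
            @redis.hset(prefix + dir, 'modified', mtime)
          end
        end

        empty.each do |dir|
          @redis.del(prefix + dir)
          @redis.del(prefix + dir + ':children')
        end

        @redis.del(prefix + pathname)
      end

With that change in place, the commands look like this: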

"setnx" "locks:jcoglan" "1402437849398"
"hget" "users:jcoglan:data:/books/jstr/chapters/browser.txt" "modified"
"smembers" "users:jcoglan:data:/books/jstr/chapters/:children"
"smembers" "users:jcoglan:data:/books/jstr/:children"
"smembers" "users:jcoglan:data:/books/:children"
"smembers" "users:jcoglan:data:/:children"
"hget" "users:jcoglan:data:/books/jstr/preface.txt" "modified"
"hget" "users:jcoglan:data:/books/jstr/chapters/" "modified"
"hget" "users:jcoglan:data:/books/jstr/" "modified"
"hget" "users:jcoglan:data:/books/" "modified"
"multi"
"srem" "users:jcoglan:data:/books/jstr/:children" "chapters/"
"hset" "users:jcoglan:data:/books/jstr/" "modified" "1402437839350"
"hset" "users:jcoglan:data:/books/" "modified" "1402437839350"
"hset" "users:jcoglan:data:/" "modified" "1402437839350"
"del" "users:jcoglan:data:/books/jstr/chapters/"
"del" "users:jcoglan:data:/books/jstr/chapters/:children"
"del" "users:jcoglan:data:/books/jstr/chapters/browser.txt"
"exec"
"del" "locks:jcoglan"

This is now a robust deletion procedure. The SETNX lock around the whole process means that, while we’re making our reads to figure out what to do, the structure won’t change, so we get a consistent view of the data. And putting all the writes together in one transaction means we cannot accidentally leave the database in an invalid state: after every atomic change Redis makes, the data is in a state that’s meaningful to the application, never an intermediate one.

What we’ve done here has a deep similarity with optimising module loading or Make task execution: they all involve flattening a dependency graph. If you look at the first command trace for the delete() method, where reads and writes are interleaved, almost every command depends on the one before it. The HSET that updates a folder’s modified time depends on the HGETs that read its children’s modified times, which in turn depend on the SMEMBERS that finds the children’s names, all of which depend on the previous iteration having left the list of children and their times up to date. The only parallelisable commands in the whole list are the DEL/SREM pairs that delete items; the dependency graph of this trace is thirteen steps deep, including the initial version check.

          HGET current document version
                       |
     DEL document / SREM from parent's children
                       |
            SMEMBERS parent's children
                       |
      DEL folder / SREM from parent's children
                       |
            SMEMBERS parent's children
                       |
             HGET all child versions
                       |
               HSET parent version
                       |
            SMEMBERS parent's children
                       |
             HGET all child versions
                       |
               HSET parent version
                       |
            SMEMBERS parent's children
                       |
             HGET all child versions
                       |
               HSET parent version

By contrast, the final trace we ended up with is very shallow: all the writes are mutually independent, which is why we can put them in a transaction. All the HGETs are mutually independent too; each depends on one of the SMEMBERS calls, but those calls are likewise mutually independent of one another. Including the initial version check, this dependency graph is only four steps deep, compared to the thirteen of the previous trace.

          HGET current document version
                       |
   SMEMBERS the children of all parent folders
                       |
        HGET remaining children's versions
                       |
       DEL / SREM / HSET items as necessary

We’ve reshaped the graph so that more of its items are order-independent. For some problems, this lets us run tasks in parallel. For this task, it means operations can be grouped into transactions and made atomic.

Now you’re probably wondering what all this has to do with promises, or Make, or functional programming. Well, the interesting thing for me is what triggered me to do this work in the first place. I was actually working on the Redis backend for Faye, where each client has a set of channels it’s subscribed to (and each channel has a reverse index of clients subscribed to it, for routing messages), and a list of messages that are queued for delivery to that client. The engine periodically sweeps the database for expired sessions and removes all their subscriptions and message queues.

For a long time after I launched the Redis engine, there were problems with race conditions that would let expired sessions leak memory over time. For example, if we delete a client’s message queue before we delete its subscriptions, then there’s a window of time where a published message can still be routed to that client and added to its queue, after that queue has been deleted.

I found this problem really hard to solve for a long time – the Redis backend launched over 3 years ago – and it surprised me that all of a sudden the solution became obvious. Just like our PUT example above, the session expiry mechanism had a read right in the middle of it: it looked up the client’s subscriptions so it could delete them all, delegating to another function that handled unsubscribing a client from a single channel. Thinking about the problem as a dependency graph made me realise that the read could be pushed to the start of the process, leaving a bunch of mutually independent writes that could all be put in a transaction. Et voilà, atomic client expiry, one less race condition to worry about.
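In outline, the reshaped expiry looks something like this (the key names are hypothetical, for illustration only, and not Faye’s actual schema):

  def destroy_client(client_id)
    # The single read, pushed to the front of the process: which channels
    # is this client subscribed to?
    channels = @redis.smembers("clients:#{client_id}:channels")

    # The remaining writes are mutually independent, so they can all go in
    # one transaction.
    @redis.multi do
      channels.each do |channel|
        @redis.srem("channels:#{channel}", client_id)  # the reverse index
      end
      @redis.del("clients:#{client_id}:channels")
      @redis.del("clients:#{client_id}:messages")      # the message queue
    end
  end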

I had been stuck because I was thinking about the problem from an imperative programming mindset. In imperative programming, or more precisely, in systems with strict rather than lazy evaluation, we are forced to tell the computer to perform actions in a certain order. Sometimes, the order we decide upon is important – the program would break if its statements were reordered – and sometimes it is not – a set of statements can be evaluated in any order and the program would still work. That is, in an imperative program we often see code like this:

f()
g()

But we cannot tell whether it matters that f() is called first. This ambiguity arises because the program does not express the dependency relationships between the entities it manipulates. Those relationships are in the programmer’s head, or on a whiteboard somewhere, and the programmer has done a bunch of computation outside of the program to determine what the program itself should look like. The program expresses the solution the programmer invented, rather than solving the problem itself, and crucial constraints on the design of the solution are omitted.

In a functional language like Haskell, where functions are lazy and have no side effects, the only way to force two expressions to run in a certain order is to make one depend on the other, either by taking its output as input:

g(f())

Or by making one expression conditional on the other:

if f() {
  g()
}

(These examples are generic pseudocode, not executable Haskell.)

In the above construction that simply calls f() and then g(), it’s not clear from the code itself whether f() can be called later or removed from the program entirely. The only reason it could be necessary to place it before g() is that it has some side effect that affects the result of g(). The only way to tell for sure is to run the program, possibly via a test harness, and check what its cumulative side effects are, for example checking the state it leaves a database in.

The latter two constructs express, via the syntax of the program, that g() depends on f() in ways that are far stronger than simply listing them one after the other. They make it clear that the result of the program will be very different if f() is removed; if its output is fed directly into g(), or used to decide whether to call g() at all, then we know it must affect g()’s output in some way. This dependency information is encoded in the program itself, and not enforced implicitly via a test suite or manual QA process.

Stuck in the imperative mindset, I was convinced the solution had to lie in the order I was running these Redis commands in, that if I could just find the right recipe and run commands at just the right times, the problem would be solved. But having spent the last year thinking about things like promises and Make, I suddenly saw the problem instead in terms of dependencies. Just as I have done in the examples above, I thought about the dependencies between values explicitly, then tried to see if I could rearrange that graph to bring all the writes together, and lo and behold the right answer just fell out.

The funny thing is, I’m not sure anyone would call my final code for DELETE a ‘functional’ program; it’s still very much stateful imperative code, full of side effects and mutable data structures. But I would argue that it’s more functional than the original DELETE code. In the original, the order of statements is very important, and there are few places where an explicit dependency is drawn between two operations by having values defined in terms of other values. Frequently, getting a correct value depends on some SREM or HSET side effect having been executed before its calculation, and that dependency is entirely encoded by the statements executing in a certain order rather than by relationships between values.

But in the second example, far more of the statements that are order-dependent are forced to be so by having values defined in terms of other values. The calculations for empty, remaining and mtimes depend explicitly on the value of children, thus forcing children to be calculated before them. Once we get to the write operations, those can be executed in any order and we can tell this because there is no write operation that uses the value of another write operation as its input. We can reorder the statements for the writes and the program will still work. Whereas, if we reorder the statements for the reads, we will get an ‘undefined variable’ error because a value they depend on is not available yet. We get direct feedback from the programming language that an operation makes no sense, rather than being able to freely reorder statements and having to check the database to see what went wrong. If we were writing this in a compiled language, this error could even be flagged by static analysis without running the program at all.

In other words, the program uses functional techniques to indicate and force the correct order of operations, rather than imperative ones, and makes it clear where dependencies exist and where they do not. Instead of using the database as the working space to incrementally figure out which changes to make, we’ve moved all that work into the program’s memory where we can derive values from one another and compute a single set of changes to make at the end.

Once you start thinking in terms of dependency graphs, you start seeing them everywhere. They’re a great tool for figuring out how you can optimise programs, and make them more explicit and resistant to people accidentally introducing sequencing bugs later. They encode the problem constraints directly in the program, and get you a step closer to letting the computer solve your problems for you.