WebSocket extensions as plugins

Last month I announced a bunch of new features in Faye 1.1, including support for the permessage-deflate WebSocket extension. In this article I want to talk about how that support works, and what architectural changes have been introduced to accommodate it.

To begin with, we need to talk a bit about what permessage-deflate is, and how it fits into the WebSocket protocol. Briefly, it’s a protocol extension that compresses messages using the DEFLATE algorithm as they go over the wire, reducing the amount of data you need to transfer between the client and the server. Let’s take a look at how it works on the wire.

Say we’ve set up a WebSocket server that echoes everything it receives from the client, using faye-websocket:

var http      = require('http'),
    WebSocket = require('faye-websocket');

var server = http.createServer();

server.on('upgrade', function(request, socket, body) {
  var ws = new WebSocket(request, socket, body);
  ws.pipe(ws);
});

server.listen(8000);

And, we have a client that connects to the server and sends it one message:

var WebSocket = require('faye-websocket').Client;

var ws = new WebSocket('ws://localhost:8000/chat');

ws.onopen = function() {
  ws.send('yeah yeah yeah');
};

Now, if you’re writing a WebSocket application, you can deploy this code and not concern yourself at all with what’s being sent over the wire. Indeed, in recent browsers, the permessage-deflate extension is implicitly activated for you with no further intervention. But to understand where this article is going, it helps to understand some of the wire details.

When you call new WebSocket('ws://localhost:8000/chat'), the WebSocket opens a TCP socket to port 8000 on localhost, and then sends this data over the socket:

GET /chat HTTP/1.1
Host: localhost:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: iHm5Megd8ejRpeQOGZM0RA==
Sec-WebSocket-Version: 13

This is essentially a special HTTP GET request that tells the server we want to tunnel the websocket protocol over the connection. Sec-WebSocket-Key is a random 16-byte number expressed in base64. (The client may send other HTTP headers as part of this request.)

The server sends back this response, and both peers then leave the TCP connection open.

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: hJdhaqdF54rb/oSa2ZmdSvfZ4/I=

Sec-WebSocket-Accept is a hash of the client’s Sec-WebSocket-Key combined with a GUID, which proves the server understands the WebSocket protocol.
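
As a rough sketch of that calculation: the GUID is the fixed string given in RFC 6455, the hash is SHA-1, and the result is base64-encoded. The function names generateKey and acceptFor are my own for illustration and are not part of any library's API.

```javascript
const crypto = require('crypto');

// Fixed GUID from RFC 6455, appended to the client's key before hashing
const GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Hypothetical helper: how a client might produce Sec-WebSocket-Key
function generateKey() {
  return crypto.randomBytes(16).toString('base64');
}

// Hypothetical helper: how a server derives Sec-WebSocket-Accept from the key
function acceptFor(key) {
  return crypto.createHash('sha1').update(key + GUID).digest('base64');
}

// Sample key/accept pair from the RFC 6455 handshake example
console.log(acceptFor('dGhlIHNhbXBsZSBub25jZQ==')); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same calculation on the key it sent to check that the response really came from a WebSocket-aware server.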

After these handshakes have been exchanged, the client in our example wants to send the message yeah yeah yeah to the server, and does so by sending these bytes:

81 8E 89 92 25 82 F0 F7 44 EA A9 EB 40 E3 E1 B2 5C E7 E8 FA
      \----+----/  y  e  a  h     y  e  a  h     y  e  a  h
           |
          mask

length = (0x8E & 0x7F) = 14

This is a WebSocket ‘frame’. The first two bytes contain header information like the type of the frame, how long it is, and some other flags. The third to sixth bytes are a ‘mask’, a cryptographically secure pseudorandom 32-bit number. The payload after the mask is the UTF-8 encoding of the message text, XORed with the mask bytes. (Masking is done to prevent JavaScript applications from placing crafted byte sequences on the wire, where intermediaries such as proxies might misinterpret them.)

The length is given by the lower-order seven bits of the second byte, which in this case gives 14.

The server echoes this message back, but servers are not required to mask their frames, and so the server’s frame is four bytes shorter than the client’s and contains the literal UTF-8 text of the message. Again, the frame length is given by the lower seven bits of the second byte.

81 0E 79 65 61 68 20 79 65 61 68 20 79 65 61 68
       y  e  a  h     y  e  a  h     y  e  a  h

length = (0x0E & 0x7F) = 14
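
To make the frame layout concrete, here is a minimal sketch of a parser for the two frames shown above. It only handles a single complete frame with a payload under 126 bytes (longer payloads use extended length fields that I have not described), and parseSmallFrame is a name I have made up for this example.

```javascript
// Parses one complete WebSocket frame whose payload is under 126 bytes.
function parseSmallFrame(bytes) {
  const masked = (bytes[1] & 0x80) !== 0;  // top bit of second byte: mask present?
  const length = bytes[1] & 0x7F;          // lower seven bits: payload length
  if (length > 125) throw new Error('extended lengths are not handled in this sketch');

  let offset = 2;
  let mask = null;
  if (masked) {
    mask = bytes.subarray(offset, offset + 4);
    offset += 4;
  }

  // Copy the payload so that unmasking does not modify the input buffer
  const payload = Buffer.from(bytes.subarray(offset, offset + length));
  if (mask) {
    for (let i = 0; i < payload.length; i++) payload[i] ^= mask[i % 4];
  }

  return { opcode: bytes[0] & 0x0F, masked: masked, length: length, payload: payload };
}

const clientFrame = Buffer.from('818E89922582F0F744EAA9EB40E3E1B25CE7E8FA', 'hex');
const serverFrame = Buffer.from('810E7965616820796561682079656168', 'hex');

console.log(parseSmallFrame(clientFrame).payload.toString('utf8')); // yeah yeah yeah
console.log(parseSmallFrame(serverFrame).payload.toString('utf8')); // yeah yeah yeah
```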

Now, suppose we add the permessage-deflate extension to the server and the client. In faye-websocket we do this by passing an extensions option to the WebSocket constructor:

var deflate = require('permessage-deflate');

server.on('upgrade', function(request, socket, body) {
  var ws = new WebSocket(request, socket, body, [], {extensions: [deflate]});
  // ...
});

We do the same on the client side. In the browser, this step is not necessary since the browser activates permessage-deflate by default.

var deflate = require('permessage-deflate');

var ws = new WebSocket('ws://localhost:8000/chat', [], {extensions: [deflate]});
// ...

Now let’s look again at what goes over the wire. The client sends its handshake:

GET /chat HTTP/1.1
Host: localhost:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: Vl75gUXJSfQo8sTwkmt4bA==
Sec-WebSocket-Version: 13
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits

We see a new header Sec-WebSocket-Extensions present. This example advertises that the client wants to use the permessage-deflate extension, and supports the client_max_window_bits parameter in its configuration. (The client and the server may exchange parameters to configure each extension they use, but I won’t go into that for now.)

If the server also supports that extension and wishes to activate it, it includes its own Sec-WebSocket-Extensions header, confirming as much:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: ZUJL5me7TGtKLrGYyLj2QDTKL1k=
Sec-WebSocket-Extensions: permessage-deflate
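
To give a feel for what each side does with this header, here is a deliberately simplified parser. It is a sketch under assumptions: it ignores the quoted-string values that the real grammar permits, and the {name, params} output shape and the parseExtensions name are my own choices for illustration.

```javascript
// Simplified: handles extension names, bare flags and key=value parameters,
// but not quoted-string values.
function parseExtensions(header) {
  if (!header) return [];

  return header.split(',').map(function(offer) {
    const parts  = offer.split(';').map(function(part) { return part.trim(); });
    const params = {};

    parts.slice(1).forEach(function(param) {
      const eq = param.indexOf('=');
      if (eq < 0) {
        params[param] = true;  // a parameter with no value is treated as a flag
      } else {
        const key   = param.slice(0, eq).trim();
        const value = param.slice(eq + 1).trim();
        params[key] = /^\d+$/.test(value) ? parseInt(value, 10) : value;
      }
    });

    return { name: parts[0], params: params };
  });
}

console.log(parseExtensions('permessage-deflate; client_max_window_bits'));
// [ { name: 'permessage-deflate', params: { client_max_window_bits: true } } ]
```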

Now, when the client sends yeah yeah yeah to the server, the frame still has the mask and payload structure as before, but the payload is shorter by four bytes:

C1 8A 4B 1E B8 72 E1 52 F5 BE 1B B6 3C 63 4B 1E
      \----+----/ \-------------+-------------/
           |                    |
          mask                payload

length = (0x8A & 0x7F) = 10

The server’s echo message is similarly shorter:

C1 0A AA 4C 4D CC 50 A8 84 11 00 00

length = (0x0A & 0x7F) = 10

You may also have noticed that the first byte on these frames is C1, where it was 81 before. Here’s what those numbers look like in bits:

0x81.toString(2) = '10000001'
0xC1.toString(2) = '11000001'

Note the extra 1 in the second value. That bit is a flag, called RSV1, that tells you that the current frame contains compressed data. There are two other bit fields, RSV2 and RSV3, that are reserved for use by other extensions yet to be defined.
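
As a small illustration, the flags in that first byte can be pulled apart with bit masks. The decodeFirstByte name and the field names in the returned object are mine, chosen for this sketch.

```javascript
// Splits the first byte of a frame into its flags and opcode
function decodeFirstByte(byte) {
  return {
    fin:    (byte & 0x80) !== 0,  // final fragment of a message
    rsv1:   (byte & 0x40) !== 0,  // set by permessage-deflate on compressed messages
    rsv2:   (byte & 0x20) !== 0,
    rsv3:   (byte & 0x10) !== 0,
    opcode:  byte & 0x0F          // 1 means a text frame
  };
}

console.log(decodeFirstByte(0x81).rsv1); // false
console.log(decodeFirstByte(0xC1).rsv1); // true
```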

When a WebSocket receives a frame with the RSV1 bit set, it unmasks the frame if necessary, then decompresses it, then interprets the result as UTF-8. When it sends a frame, it compresses the UTF-8 sequence, then masks the message if necessary, and it sets the RSV1 bit. Masking is done after compression, since layering random noise over a message tends to make it less compressible.
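
The ordering of those steps can be sketched as follows. This is an illustration under assumptions rather than how any particular library does it: it uses Node's synchronous zlib calls with a sync flush, handles only short single-frame text messages, and the encodeText and decodeText names are made up for the example.

```javascript
const zlib = require('zlib');

// Trailer that a sync flush appends; it is stripped before sending and
// restored before inflating
const TAIL = Buffer.from([0x00, 0x00, 0xFF, 0xFF]);
const SYNC = { finishFlush: zlib.constants.Z_SYNC_FLUSH };

function xor(payload, mask) {
  const out = Buffer.alloc(payload.length);
  for (let i = 0; i < payload.length; i++) out[i] = payload[i] ^ mask[i % 4];
  return out;
}

// Sending: encode as UTF-8, compress, then mask, with RSV1 set in the header
function encodeText(text, mask) {
  const deflated = zlib.deflateRawSync(Buffer.from(text, 'utf8'), SYNC);
  const payload  = deflated.subarray(0, deflated.length - TAIL.length);
  if (payload.length > 125) throw new Error('this sketch only handles short frames');

  // 0xC1 = FIN + RSV1 + text opcode
  const header = Buffer.from([0xC1, (mask ? 0x80 : 0x00) | payload.length]);
  return mask
    ? Buffer.concat([header, mask, xor(payload, mask)])
    : Buffer.concat([header, payload]);
}

// Receiving: unmask if needed, decompress if RSV1 is set, then decode UTF-8
function decodeText(frame) {
  const masked = (frame[1] & 0x80) !== 0;
  const length = frame[1] & 0x7F;
  const start  = masked ? 6 : 2;

  let payload = frame.subarray(start, start + length);
  if (masked) payload = xor(payload, frame.subarray(2, 6));
  if (frame[0] & 0x40) {
    payload = zlib.inflateRawSync(Buffer.concat([payload, TAIL]), SYNC);
  }
  return payload.toString('utf8');
}

// The server's compressed echo frame from the example above
console.log(decodeText(Buffer.from('C10AAA4C4DCC50A884110000', 'hex'))); // yeah yeah yeah
```

Each call here starts from a fresh DEFLATE context, so it does not model the context reuse between messages that a real session may apply.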

The compression in this example might not look like a lot, but in practice it can make a big difference. Because the client and server may reuse their DEFLATE context between messages, the compression improves as more similar messages are sent over the connection. On Faye /meta/connect messages, compression reduces the frame size from 118 bytes to 14 bytes, a saving of 88%.

So we have seen that the use of a WebSocket extension involves various concerns:

  • The client and server must negotiate which extensions to activate, and with which parameters, via the Sec-WebSocket-Extensions header.
  • Messages must be transformed in some way as they are sent and received by each peer, for example compressing on transmission and decompressing on receipt.
  • The RSV bits must be correctly set and interpreted such that the right transformations are applied to each message.

There are also various other considerations I’ve not explored here, for example:

  • The Sec-WebSocket-Extensions header must be parsed and serialized according to an ABNF grammar that refers to grammars defined in HTTP.
  • Either side of the connection must fail the connection on receipt of an invalid extensions header.
  • It is illegal to activate two extensions that both use the same RSV bit.
  • Transformations must be applied to messages in the order given in the server’s Sec-WebSocket-Extensions header.
  • Because extensions might be stateful, for example permessage-deflate retains a DEFLATE context between messages, ordering of messages must be preserved between the wire, the extension pipeline, and the application.
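
To illustrate one of these rules, here is a sketch of a check for conflicting RSV usage. It assumes each extension declares the bits it uses as boolean rsv1, rsv2 and rsv3 fields; that shape, the findRsvConflict name and the second extension are all hypothetical.

```javascript
// Returns a description of the first RSV conflict found, or null if none
function findRsvConflict(extensions) {
  const owners = {};

  for (const ext of extensions) {
    for (const bit of ['rsv1', 'rsv2', 'rsv3']) {
      if (!ext[bit]) continue;
      if (owners[bit]) return { bit: bit, between: [owners[bit], ext.name] };
      owners[bit] = ext.name;
    }
  }
  return null;
}

// Assumed declarations; 'x-hypothetical' is not a real extension
const deflateLike = { name: 'permessage-deflate', rsv1: true, rsv2: false, rsv3: false };
const other       = { name: 'x-hypothetical',     rsv1: true, rsv2: false, rsv3: false };

console.log(findRsvConflict([deflateLike]));        // null
console.log(findRsvConflict([deflateLike, other])); // reports a clash on rsv1
```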

Some of these concerns relate specifically to the permessage-deflate extension in and of itself, and some relate to all extensions, and how they are combined with one another. When implementing this new functionality, I tried to keep such concerns separated. I wanted to keep logic specific to permessage-deflate out of the rest of the protocol implementation, and instead write it as a plugin. Creating an architecture to support this means people don’t need my blessing (or my finite time resources) to add new extensions to websocket-driver: they can write the extension themselves and drop it in. I also want such plugins to be as simple to write as possible; any concerns that relate to all extensions do not belong in individual extension codebases. And finally, I want those plugins to be portable. There are many different WebSocket libraries around for Node and Ruby, and I would like it if users of any of those libraries could adopt any extension that’s written, rather than having to rewrite it for their stack.

Together, these aims also facilitate the design of new extension standards, by allowing people to implement and deploy their ideas without the blessing of their WebSocket library vendor, to gather design feedback through usage.

To support these aims, I have introduced a small new framework into the Faye WebSocket ecosystem: websocket-extensions, for Node and Ruby. Its job is to:

  • Parse and serialize the Sec-WebSocket-Extensions header so that plugins can deal with parameters as data structures rather than strings
  • Enforce the rules governing activation of extensions, for example avoiding extensions with conflicting RSV usage
  • Handle passing messages through the extension pipeline, including preservation of message order between potentially asynchronous transformations
  • Clean up extensions as the connection closes, and notify the driver when it’s safe to send its closing frame
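
The order-preservation job is the least obvious of these, so here is one way it could be sketched. This is not the actual websocket-extensions implementation, just an illustration of the idea: each message reserves a slot on arrival, and results are only emitted once every earlier slot has completed. The orderedEmitter name is made up.

```javascript
// Results are emitted strictly in arrival order, even when a later
// transformation finishes before an earlier one.
function orderedEmitter(emit) {
  const pending = [];

  function flush() {
    while (pending.length > 0 && pending[0].done) {
      emit(pending.shift().result);
    }
  }

  return function reserve() {
    const slot = { done: false, result: undefined };
    pending.push(slot);

    return function complete(result) {
      slot.done   = true;
      slot.result = result;
      flush();
    };
  };
}

const emitted = [];
const reserve = orderedEmitter(function(result) { emitted.push(result); });

const first  = reserve();  // message 1 arrives
const second = reserve();  // message 2 arrives

second('two');             // message 2 finishes processing first...
first('one');              // ...but is held back until message 1 is done

console.log(emitted);      // [ 'one', 'two' ]
```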

It defines an abstract API for plugins to implement, and an abstract data structure to represent messages, so that drivers and extensions can all interoperate. This is analogous to how Rack defines an abstract API for applications and middleware (the call(env) method) and abstract data structures for HTTP requests and responses, to allow app servers and applications to interoperate.

This might make a little more sense if I explain how it works in the Faye stack. If you npm install faye permessage-deflate right now, you will get this dependency tree:

o
├─┬ faye
│ └─┬ faye-websocket
│   └─┬ websocket-driver
│     └── websocket-extensions
└── permessage-deflate

When you create your Faye server, you load permessage-deflate and add it to your server as an extension. Notice how this code does not interact with the deflate object in any way; it just passes it as a value to another component.

// app.js

var faye    = require('faye'),
    deflate = require('permessage-deflate');

var server = new faye.NodeAdapter({mount: '/'});
server.addWebsocketExtension(deflate);

Inside the Faye server, when it accepts a WebSocket connection using faye-websocket, it passes deflate in via the extensions option. Again, we do not interact with deflate at this level.

// faye

var WebSocket = require('faye-websocket');

var options = {extensions: [deflate]},
    ws      = new WebSocket(request, socket, body, [], options);

Now, faye-websocket is responsible for managing the socket’s TCP connection and wrapping it in the standard WebSocket API, but it delegates all handling of the protocol to websocket-driver. When it creates a driver to manage the socket’s data flow, it hands deflate off to driver.addExtension(), without interacting with it.

// faye-websocket

var Driver = require('websocket-driver');

var driver = Driver.http(request);
driver.addExtension(deflate);

websocket-driver doesn’t interact with deflate directly either; it uses websocket-extensions to manage its registered extensions, and it adds deflate to the set of extensions for the current connection.

// websocket-driver

var Extensions = require('websocket-extensions');

var exts = new Extensions();
exts.add(deflate);

Here is where meaningful things start to happen. In order to generate the server’s handshake response, websocket-driver tells websocket-extensions to generate a response Sec-WebSocket-Extensions header, based on the request’s header of the same name.

// websocket-driver

// header = 'permessage-deflate; client_max_window_bits'

var header   = headers['sec-websocket-extensions'],
    response = exts.generateResponse(header);

handshake['Sec-WebSocket-Extensions'] = response;

Internally, websocket-extensions parses the header, determines which extensions should be activated, and initialises each of them with the header parameters. This is where we begin interacting with deflate.

// websocket-extensions

var response = [],
    sessions = [];

var session = deflate.createServerSession([{client_max_window_bits: true}]);
response.push(serialize(deflate.name, session.generateResponse()));
sessions.push(session);
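
The serialize() helper in that snippet is not spelled out. A minimal sketch of what such a function might do, assuming parameters arrive as a plain object in which true means a flag with no value, and again ignoring quoted-string values:

```javascript
// Hypothetical serializer: turns a name and a params object into one
// element of a Sec-WebSocket-Extensions header
function serialize(name, params) {
  const parts = [name];

  Object.keys(params || {}).forEach(function(key) {
    const value = params[key];
    parts.push(value === true ? key : key + '=' + value);
  });

  return parts.join('; ');
}

console.log(serialize('permessage-deflate', {}));
// permessage-deflate

console.log(serialize('permessage-deflate', { server_max_window_bits: 8 }));
// permessage-deflate; server_max_window_bits=8
```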

This process generates a Sec-WebSocket-Extensions header to send back to the client, and creates a ‘session’ from each extension that will handle messages during the current connection.

When websocket-driver receives a message from the wire, it parses the byte sequence as described above and produces this data structure that represents the message. This is the standard abstraction that drivers and extensions use to interoperate. The driver hands this structure off to its Extensions instance to be processed, and receives the result: a message structure with an uncompressed data field.

// websocket-driver

var message = {
  rsv1:   true,
  rsv2:   false,
  rsv3:   false,
  opcode: 1,
  data:   Buffer.from('aa4c4dcc50a884110000', 'hex')
};

exts.processIncomingMessage(message, function(error, result) {
  // result.data = <Buffer 79 65 61 68 20 79 65 61 68 20 79 65 61 68>
  //             =          y  e  a  h     y  e  a  h     y  e  a  h
});

exts.processIncomingMessage() is an API offered by websocket-extensions. It takes a message and passes it through each extension session in turn, returning the final result to the caller. Below is a simplified version of this process using async.reduce(); what actually goes on is more complicated due to things like message order preservation, but this is conceptually what’s being done:

// websocket-extensions

// sessions = [deflate.createServerSession()]

async.reduce(sessions, message, function(msg, session, callback) {
  session.processIncomingMessage(msg, callback);
}, function(error, result) {
  // result.data = <Buffer 79 65 61 68 20 79 65 61 68 20 79 65 61 68>
});
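
For readers who do not use the async library, the same conceptual loop can be written with plain callbacks. This is still a simplification that ignores message ordering, and the two toy sessions below are stand-ins I have invented (they work on strings rather than Buffers to keep the example short).

```javascript
// Passes a message through each session in turn and stops at the first error
function pipeIncoming(sessions, message, callback) {
  function step(index, msg) {
    if (index === sessions.length) return callback(null, msg);

    sessions[index].processIncomingMessage(msg, function(error, result) {
      if (error) return callback(error);
      step(index + 1, result);
    });
  }
  step(0, message);
}

// Toy sessions standing in for real extensions
const upcase = {
  processIncomingMessage: function(msg, callback) {
    callback(null, Object.assign({}, msg, { data: msg.data.toUpperCase() }));
  }
};
const exclaim = {
  processIncomingMessage: function(msg, callback) {
    callback(null, Object.assign({}, msg, { data: msg.data + '!' }));
  }
};

pipeIncoming([upcase, exclaim], { opcode: 1, data: 'yeah' }, function(error, result) {
  console.log(result.data); // YEAH!
});
```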

What the permessage-deflate session in particular does with this message is to decompress it using zlib.createInflateRaw(). The extra 00 00 FF FF chunk marks the end of a DEFLATE block and is omitted from messages on the wire.

// permessage-deflate

var zlib = require('zlib');

var inflate = zlib.createInflateRaw();
inflate.write(message.data); // <Buffer aa 4c 4d cc 50 a8 84 11 00 00>
inflate.write(Buffer.from([0x00, 0x00, 0xFF, 0xFF]));

inflate.on('readable', function() {
  message.data = inflate.read(); // <Buffer 79 65 61 68 20 79 65 61 68 20 79 65 61 68>
});

This chain of events might seem a little convoluted, but notice the following:

  • No information about how permessage-deflate works, either its handshake parameters or its message transformation semantics, is encoded outside of the permessage-deflate module.
  • The permessage-deflate module only deals with (de)compressing messages, not with any other protocol or concurrent processing concerns.
  • Nothing outside of websocket-extensions interacts directly with the permessage-deflate module; it is treated as an opaque value. Not even websocket-extensions knows how permessage-deflate works; it only cares that it implements an abstract interface for negotiating parameters and transforming messages.
  • Although websocket-extensions interacts with permessage-deflate, it does not depend on it specifically. It depends on an abstraction (the plugin API), and the particular extension is injected from outside.

This makes for a highly modular system in which a WebSocket protocol implementation, and anything built on top of it, can make use of an extension plugin without knowing anything about how that extension in particular or the negotiation procedure in general works. It doesn’t even have to interact with the extension API at all; all interaction is mediated by websocket-extensions.

Likewise the extension plugin knows nothing about the driver, about other extensions or about how they combine; it doesn’t even need to know the syntax of the Sec-WebSocket-Extensions header. All it cares about is that it should be given values of the abstract type:

type Message = {
  rsv1   :: Boolean
  rsv2   :: Boolean
  rsv3   :: Boolean
  opcode :: Number
  data   :: Buffer
}
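
To show how little a plugin needs to know, here is a toy extension written against the method names that appear in this article (name, createServerSession, generateResponse, processIncomingMessage). Everything else is an assumption for illustration: ‘x-reverse’ is not a real extension, processOutgoingMessage is my guess at the sending-side counterpart, and a real plugin would need to satisfy the full interface that websocket-extensions documents.

```javascript
// A hypothetical extension that reverses the bytes of each message and
// uses RSV2 to mark frames it has transformed
const reverse = {
  name: 'x-reverse',

  createServerSession: function(offers) {
    return {
      // Accept the offer with no parameters
      generateResponse: function() {
        return {};
      },

      // Assumed sending-side counterpart to processIncomingMessage
      processOutgoingMessage: function(message, callback) {
        message.data = Buffer.from(message.data).reverse();
        message.rsv2 = true;
        callback(null, message);
      },

      processIncomingMessage: function(message, callback) {
        if (message.rsv2) {
          message.data = Buffer.from(message.data).reverse();
          message.rsv2 = false;
        }
        callback(null, message);
      }
    };
  }
};

const session = reverse.createServerSession([{}]);
const message = { rsv1: false, rsv2: false, rsv3: false, opcode: 1, data: Buffer.from('yeah') };

session.processOutgoingMessage(message, function(error, result) {
  console.log(result.data.toString('utf8')); // haey
});
```

Note that the plugin never touches a socket, a header string or another extension; it only receives and returns Message values.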

Today, any system using faye, faye-websocket or websocket-driver can opt into using permessage-deflate and extend this pluggable interface downstream. But I’d also like to see other WebSocket stacks adopt websocket-extensions and write plugins for it, so we can share the workload of implementing extensions.

I’ll close with one final observation. Recall the dependency tree I displayed above:

o
├─┬ faye
│ └─┬ faye-websocket
│   └─┬ websocket-driver
│     └── websocket-extensions
└── permessage-deflate

From the application’s point of view, there are three levels of indirection between the place where it passes the extension off to Faye, and the module that actually interacts with this extension. Isn’t passing an object through this many abstraction boundaries a violation of the Law of Demeter?

Although this thought did initially occur to me, I don’t believe this constitutes a formal violation: nothing above websocket-extensions interacts with deflate in any way and so cannot be said to be coupled either to the deflate object itself, or to its class or type. (I will interpret type as being the abstract extension interface defined by websocket-extensions.) However, many would interpret this structure as reaching across too many abstraction boundaries, and therefore violating the spirit of the law. After some thought I also believe this is false, and here’s why.

Consider what would happen if permessage-deflate were built-in functionality of websocket-driver, as has been done by other WebSocket libraries. It would presumably have some interface for configuring the extension, for example:

var driver = Driver.http(request, {perMessageDeflate: true});

// - or -

var driver = Driver.http(request, {perMessageDeflate: {serverMaxWindowBits: 8}});

To expose this functionality to its callers faye-websocket would similarly need to expose a config API:

var ws = new WebSocket(request, socket, body, [], {perMessageDeflate: true});

And Faye would need to follow suit:

var server = new faye.NodeAdapter({mount: '/', websocket: {perMessageDeflate: true}});

Now we have a problem. Everything that builds on top of websocket-driver has to include code that couples to particular features of it: it has to know that there is such a thing as permessage-deflate and provide an interface for configuring it. If each layer wants to avoid such coupling, it could take the options object from the caller and hand it directly to the layer underneath, but that would mean exposing the config API of websocket-driver all the way up the stack, which is a clear Demeter violation. So most people would opt for the approach of providing an explicit interface and delegating to the layer underneath, so that each layer is only coupled to the interface of its immediate neighbour.

But then what happens when a new extension is invented? Should websocket-driver, faye-websocket, faye, and anything built on top of them have to release a new version to expose it? As of this writing, those modules have 115 immediate dependents between them in npm, accounting for millions of downloads per month. That’s an awful lot of downstream code to update! Whereas, with the websocket-extensions architecture, as long as downstream modules expose an API for adding extensions, then a new extension can be deployed without any intervening modules needing to change. This must surely mean that this architecture reduces coupling, and it completely removes me as a bottleneck for people getting new extensions into their Faye stack.

Encoding things as values rather than as names, as symbols in an API, is what enables this. Since permessage-deflate is a value, it can carry its own configuration rather than other modules having to configure it indirectly:

var WebSocket = require('faye-websocket').Client,
    deflate   = require('permessage-deflate'),
    zlib      = require('zlib');

deflate = deflate.configure({
  maxWindowBits: 8,
  strategy: zlib.Z_HUFFMAN_ONLY
});

var ws = new WebSocket(url, [], {extensions: [deflate]});

The ‘configured’ deflate object still responds to the same API as the original; it just carries different parameters around inside itself.
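
One way to get that behaviour, sketched here with a hypothetical extension rather than the real permessage-deflate source, is for configure() to return a fresh object that exposes the same methods and closes over the merged options:

```javascript
// Hypothetical extension factory: every value it returns has the same
// interface, and carries its own options in a closure
function makeExtension(options) {
  return {
    name: 'x-example',

    configure: function(overrides) {
      return makeExtension(Object.assign({}, options, overrides));
    },

    createServerSession: function(offers) {
      // A real session would use these options to set itself up
      return { options: options };
    }
  };
}

const base  = makeExtension({ maxWindowBits: 15 });
const tuned = base.configure({ maxWindowBits: 8 });

console.log(base.createServerSession([]).options.maxWindowBits);  // 15
console.log(tuned.createServerSession([]).options.maxWindowBits); // 8
```

Because the original object is left untouched, both values can be handed to different servers in the same process without interfering with each other.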

To me, this is a good illustration of the power of well-defined interfaces and their influence on modularity and interoperability. It illustrates why npm’s peerDependencies is a flawed idea: it lets you ensure that a plugin and its host system are compatible if they are siblings, but what if the host framework is further down the tree? I don’t think attempting to solve that problem via versioning is a workable idea, and I would much rather solve it with well-defined stable seams between modules.

In short, a stable platform is an extensible one.