Last month I announced a bunch of new features in Faye 1.1, including support for the permessage-deflate WebSocket extension. In this article I want to talk about how that support works, and what architectural changes have been introduced to accommodate it.
To begin with, we need to talk a bit about what permessage-deflate is, and how it fits into the WebSocket protocol. Briefly, it’s a protocol extension that compresses messages using the DEFLATE algorithm as they go over the wire, reducing the amount of data you need to transfer between the client and the server. Let’s take a look at how it works on the wire.
Say we’ve set up a WebSocket server that echoes everything it receives from the client, using faye-websocket:
var http      = require('http'),
    WebSocket = require('faye-websocket');
var server = http.createServer();
server.on('upgrade', function(request, socket, body) {
  var ws = new WebSocket(request, socket, body);
  ws.pipe(ws);
});
server.listen(8000);
And we have a client that connects to the server and sends it one message:
var WebSocket = require('faye-websocket').Client;
var ws = new WebSocket('ws://localhost:8000/chat');
ws.onopen = function() {
  ws.send('yeah yeah yeah');
};
Now, if you’re writing a WebSocket application, you can deploy this code and not concern yourself at all with what’s being sent over the wire. Indeed, in recent browsers, the permessage-deflate extension is implicitly activated for you with no further intervention. But to understand where this article is going, it helps to understand some of the wire details.
When you call new WebSocket('ws://localhost:8000/chat'), the WebSocket opens a TCP socket to port 8000 on localhost, and then sends this data over the socket:
GET /chat HTTP/1.1
Host: localhost:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: iHm5Megd8ejRpeQOGZM0RA==
Sec-WebSocket-Version: 13
This is essentially a special HTTP GET request that tells the server we want to tunnel the WebSocket protocol over the connection. Sec-WebSocket-Key is a random 16-byte number expressed in base64. (The client may send other HTTP headers as part of this request.)
The server sends back this response, and both peers then leave the TCP connection open.
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: hJdhaqdF54rb/oSa2ZmdSvfZ4/I=
Sec-WebSocket-Accept is a base64-encoded SHA-1 hash of the client’s Sec-WebSocket-Key concatenated with a fixed GUID, which proves the server understands the WebSocket protocol.
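The hashing step can be sketched with Node’s built-in crypto module. The GUID is the fixed value given in RFC 6455, and the sample key and result below are the ones from that RFC rather than from the handshake above; generateKey and computeAccept are illustrative names, not part of faye-websocket.

```javascript
var crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake
var GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Client side: 16 random bytes, expressed in base64
function generateKey() {
  return crypto.randomBytes(16).toString('base64');
}

// Server side: SHA-1 of the key concatenated with the GUID, in base64
function computeAccept(key) {
  return crypto.createHash('sha1').update(key + GUID).digest('base64');
}

console.log(computeAccept('dGhlIHNhbXBsZSBub25jZQ=='));
// -> s3pPLMBiTxaQ9kYGzzhZRbK+xOo=   (the worked example in RFC 6455)
```

A client can run the same computation on the key it sent and compare the result with the header it receives.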
After these handshakes have been exchanged, the client in our example wants to send the message yeah yeah yeah to the server, and does so by sending these bytes:
81 8E 89 92 25 82 F0 F7 44 EA A9 EB 40 E3 E1 B2 5C E7 E8 FA
      \----+----/  y  e  a  h     y  e  a  h     y  e  a  h
           |
          mask

length = (0x8E & 0x7F) = 14
This is a WebSocket ‘frame’. The first two bytes contain header information like the type of the frame, how long it is, and some other flags. The third to sixth bytes are a ‘mask’, a cryptographically secure pseudorandom 32-bit number. The payload after the mask is the UTF-8 encoding of the message text, XORed with the mask bytes. (Masking is done to prevent JavaScript applications inserting crafted byte sequences into server input.)
The length is given by the lower-order seven bits of the second byte, which in this case gives 14. The high bit of that byte is the mask flag, which is why the byte reads 8E rather than 0E.
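To make the layout concrete, here is a minimal sketch that decodes a short frame like the one above. It assumes a payload of at most 125 bytes, so it ignores the extended length fields that longer frames use, and parseShortFrame is a hypothetical helper rather than anything in websocket-driver.

```javascript
// Parses a single short WebSocket frame (payload of at most 125 bytes).
function parseShortFrame(bytes) {
  var masked = (bytes[1] & 0x80) !== 0,  // high bit of byte 2: mask flag
      length = bytes[1] & 0x7F,          // low seven bits: payload length
      offset = 2,
      mask   = null;

  if (length > 125) throw new Error('extended lengths are not handled here');

  if (masked) {
    mask = bytes.slice(offset, offset + 4);
    offset += 4;
  }

  // Copy the payload so that unmasking does not modify the input buffer
  var payload = Buffer.from(bytes.slice(offset, offset + length));

  if (masked) {
    for (var i = 0; i < payload.length; i++) payload[i] ^= mask[i % 4];
  }
  return {masked: masked, length: length, payload: payload};
}

var clientFrame = Buffer.from('818e89922582f0f744eaa9eb40e3e1b25ce7e8fa', 'hex');
console.log(parseShortFrame(clientFrame).payload.toString('utf8'));
// -> yeah yeah yeah
```

Each payload byte is XORed with the mask byte at the same position modulo four, so applying the same loop a second time would restore the masked form.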
The server echoes this message back, but servers are not required to mask their frames, and so the server’s frame is four bytes shorter than the client’s and contains the literal UTF-8 text of the message. Again, the frame length is given by the lower seven bits of the second byte.
81 0E 79 65 61 68 20 79 65 61 68 20 79 65 61 68
       y  e  a  h     y  e  a  h     y  e  a  h

length = (0x0E & 0x7F) = 14
Now, suppose we add the permessage-deflate extension to the server and the client. In faye-websocket we do this by passing an extensions option to the WebSocket constructor:
var deflate = require('permessage-deflate');
server.on('upgrade', function(request, socket, body) {
  var ws = new WebSocket(request, socket, body, [], {extensions: [deflate]});
  // ...
});
We do the same on the client side. In the browser, this step is not necessary since the browser activates permessage-deflate by default.
var deflate = require('permessage-deflate');
var ws = new WebSocket('ws://localhost:8000/chat', [], {extensions: [deflate]});
// ...
Now let’s look again at what goes over the wire. The client sends its handshake:
GET /chat HTTP/1.1
Host: localhost:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: Vl75gUXJSfQo8sTwkmt4bA==
Sec-WebSocket-Version: 13
Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits
We see a new header Sec-WebSocket-Extensions present. This example advertises that the client wants to use the permessage-deflate extension, and supports the client_max_window_bits parameter in its configuration. (The client and the server may exchange parameters to configure each extension they use, but I won’t go into that for now.)
If the server also supports that extension and wishes to activate it, it includes its own Sec-WebSocket-Extensions header, confirming as much:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: ZUJL5me7TGtKLrGYyLj2QDTKL1k=
Sec-WebSocket-Extensions: permessage-deflate
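As an illustration of treating this header as data rather than as a string, here is a deliberately simplified sketch. It splits on commas and semicolons and does not handle quoted-string parameter values, which the real grammar allows; parseExtensions and serializeExtension are hypothetical names, not the parser that ships in any library.

```javascript
// Turns a Sec-WebSocket-Extensions header into a list of {name, params}.
// Simplified: quoted-string values are not supported.
function parseExtensions(header) {
  return header.split(',').map(function(offer) {
    var parts  = offer.split(';').map(function(s) { return s.trim(); }),
        params = {};

    parts.slice(1).forEach(function(param) {
      var pair = param.split('=');
      // A parameter with no value is represented as `true`
      params[pair[0].trim()] = pair.length > 1 ? pair[1].trim() : true;
    });
    return {name: parts[0], params: params};
  });
}

// Turns an extension name and a params object back into header syntax
function serializeExtension(name, params) {
  var parts = [name];
  Object.keys(params).forEach(function(key) {
    parts.push(params[key] === true ? key : key + '=' + params[key]);
  });
  return parts.join('; ');
}

console.log(JSON.stringify(parseExtensions('permessage-deflate; client_max_window_bits')));
// -> [{"name":"permessage-deflate","params":{"client_max_window_bits":true}}]
```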
Now, when the client sends yeah yeah yeah to the server, the frame still has the mask and payload structure as before, but the payload is shorter by four bytes:
C1 8A 4B 1E B8 72 E1 52 F5 BE 1B B6 3C 63 4B 1E
      \----+----/ \-------------+-------------/
           |                    |
          mask               payload

length = (0x8A & 0x7F) = 10
The server’s echo message is similarly shorter:
C1 0A AA 4C 4D CC 50 A8 84 11 00 00
length = (0x0A & 0x7F) = 10
You may also have noticed that the first byte on these frames is C1, where it was 81 before. Here’s what those numbers look like in bits:
0x81.toString(2) = '10000001'
0xC1.toString(2) = '11000001'
Note the extra 1 in the second value. That bit is a flag, called RSV1, that tells you that the current frame contains compressed data. There are two other bit fields, RSV2 and RSV3, that are reserved for use by other extensions yet to be defined.
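The bit positions can be read off with a few masks. This is a small sketch with a hypothetical helper name; the field layout (FIN, RSV1 to RSV3, then a four-bit opcode) is the one defined by the WebSocket protocol.

```javascript
// Splits the first byte of a frame into its flags and opcode
function parseFirstByte(byte) {
  return {
    fin:    (byte & 0x80) !== 0,  // final fragment of a message
    rsv1:   (byte & 0x40) !== 0,  // used by permessage-deflate
    rsv2:   (byte & 0x20) !== 0,
    rsv3:   (byte & 0x10) !== 0,
    opcode: byte & 0x0F           // 1 means a text frame
  };
}

console.log(JSON.stringify(parseFirstByte(0x81)));
// -> {"fin":true,"rsv1":false,"rsv2":false,"rsv3":false,"opcode":1}
console.log(JSON.stringify(parseFirstByte(0xC1)));
// -> {"fin":true,"rsv1":true,"rsv2":false,"rsv3":false,"opcode":1}
```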
When a WebSocket receives a frame with the RSV1 bit set, it unmasks the frame if necessary, then decompresses it, then interprets the result as UTF-8. When it sends a frame, it compresses the UTF-8 sequence, then masks the message if necessary, and it sets the RSV1 bit. Masking is done after compression, since layering random noise over a message tends to make it less compressible.
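The receiving steps can be sketched synchronously using Node’s zlib module, applied to the mask and payload of the client’s compressed frame shown above. One detail the sketch has to assume from the permessage-deflate specification: the sender strips a trailing 00 00 FF FF from each compressed message, so the receiver appends it again before inflating. receiveText is a hypothetical helper, and the finishFlush option lets zlib return output even though the stream has no final block.

```javascript
var zlib = require('zlib');

// Tail that permessage-deflate removes from each message on the wire
var TAIL = Buffer.from([0x00, 0x00, 0xFF, 0xFF]);

function receiveText(mask, maskedPayload) {
  // 1. Unmask: XOR each byte with the mask byte at the same position mod 4
  var payload = Buffer.from(maskedPayload);
  for (var i = 0; i < payload.length; i++) payload[i] ^= mask[i % 4];

  // 2. Decompress: restore the tail, then inflate the raw DEFLATE data
  var inflated = zlib.inflateRawSync(Buffer.concat([payload, TAIL]), {
    finishFlush: zlib.constants.Z_SYNC_FLUSH
  });

  // 3. Interpret the result as UTF-8
  return inflated.toString('utf8');
}

var mask    = Buffer.from('4b1eb872', 'hex'),
    payload = Buffer.from('e152f5be1bb63c634b1e', 'hex');

console.log(receiveText(mask, payload));
// -> yeah yeah yeah
```

This one-shot version creates a fresh inflate context for every call, so it only works for the first message on a connection or when context reuse is disabled.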
The compression in this example might not look like a lot, but in practice it can make a big difference. Because the client and server may reuse their DEFLATE context between messages, the compression improves as more similar messages are sent over the connection. On Faye /meta/connect messages, compression reduces the frame size from 118 bytes to 14 bytes, a saving of 88%.
So we have seen that the use of a WebSocket extension involves various concerns:
- The client and server must negotiate which extensions to activate, and with which parameters, via the Sec-WebSocket-Extensions header.
- Messages must be transformed in some way as they are sent and received by each peer, for example compressing on transmission and decompressing on receipt.
- The RSV bits must be correctly set and interpreted such that the right transformations are applied to each message.
There are also various other considerations I’ve not explored here, for example:
- The Sec-WebSocket-Extensions header must be parsed and serialized according to an ABNF grammar that refers to grammars defined in HTTP.
- Either side of the connection must fail the connection on receipt of an invalid extensions header.
- It is illegal to activate two extensions that both use the same RSV bit.
- Transformations must be applied to messages in the order given in the server’s Sec-WebSocket-Extensions header.
- Because extensions might be stateful, for example permessage-deflate retains a DEFLATE context between messages, ordering of messages must be preserved between the wire, the extension pipeline, and the application.
Some of these concerns relate specifically to the permessage-deflate extension in and of itself, and some relate to all extensions, and how they are combined with one another. When implementing this new functionality, I tried to keep such concerns separated. I wanted to keep logic specific to permessage-deflate out of the rest of the protocol implementation, and instead write it as a plugin. Creating an architecture to support this means people don’t need my blessing (or my finite time resources) to add new extensions to websocket-driver: they can write the extension themselves and drop it in. I also want such plugins to be as simple to write as possible; any concerns that relate to all extensions do not belong in individual extension codebases. And finally, I want those plugins to be portable. There are many different WebSocket libraries around for Node and Ruby, and I would like it if users of any of those libraries could adopt any extension that’s written, rather than having to rewrite it for their stack.
Together, these aims also facilitate the design of new extension standards, by allowing people to implement and deploy their ideas without the blessing of their WebSocket library vendor, to gather design feedback through usage.
To support these aims, I have introduced a small new framework into the Faye WebSocket ecosystem: websocket-extensions, for Node and Ruby. Its job is to:
- Parse and serialize the Sec-WebSocket-Extensions header so that plugins can deal with parameters as data structures rather than strings
- Enforce the rules governing activation of extensions, for example avoiding extensions with conflicting RSV usage
- Handle passing messages through the extension pipeline, including preservation of message order between potentially asynchronous transformations
- Clean up extensions as the connection closes, and notify the driver when it’s safe to send its closing frame
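As one example of the activation rules, a check for conflicting RSV usage might look like the sketch below. It assumes each extension object declares which bits it uses through boolean rsv1, rsv2 and rsv3 properties; that shape, the findRsvConflict helper and the x-other extension are assumptions made for illustration, not a description of the library’s internals.

```javascript
// Returns null if no two extensions claim the same RSV bit, otherwise a
// description of the first conflict found.
// Assumption: extensions expose `name` plus boolean rsv1/rsv2/rsv3 fields.
function findRsvConflict(extensions) {
  var bits   = ['rsv1', 'rsv2', 'rsv3'],
      owners = {};

  for (var i = 0; i < extensions.length; i++) {
    for (var j = 0; j < bits.length; j++) {
      var ext = extensions[i], bit = bits[j];
      if (!ext[bit]) continue;
      if (owners[bit]) return {bit: bit, first: owners[bit], second: ext.name};
      owners[bit] = ext.name;
    }
  }
  return null;
}

var deflateLike = {name: 'permessage-deflate', rsv1: true,  rsv2: false, rsv3: false},
    otherExt    = {name: 'x-other',            rsv1: true,  rsv2: false, rsv3: false};

console.log(JSON.stringify(findRsvConflict([deflateLike, otherExt])));
// -> {"bit":"rsv1","first":"permessage-deflate","second":"x-other"}
```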
It defines an abstract API for plugins to implement, and an abstract data structure to represent messages, so that drivers and extensions can all interoperate. This is analogous to how Rack defines an abstract API for applications and middleware (the call(env) method) and abstract data structures for HTTP requests and responses, to allow app servers and applications to interoperate.
This might make a little more sense if I explain how it works in the Faye stack. If you npm install faye permessage-deflate right now, you will get this dependency tree:
o
├─┬ faye
│ └─┬ faye-websocket
│   └─┬ websocket-driver
│     └── websocket-extensions
└── permessage-deflate
When you create your Faye server, you load permessage-deflate and add it to your server as an extension. Notice how this code does not interact with the deflate object in any way; it just passes it as a value to another component.
// app.js
var faye    = require('faye'),
    deflate = require('permessage-deflate');
var server = new faye.NodeAdapter({mount: '/'});
server.addWebsocketExtension(deflate);
Inside the Faye server, when it accepts a WebSocket connection using faye-websocket, it passes deflate in via the extensions option. Again, we do not interact with deflate at this level.
// faye
var WebSocket = require('faye-websocket');
var options = {extensions: [deflate]},
    ws      = new WebSocket(request, socket, body, [], options);
Now, faye-websocket is responsible for managing the socket’s TCP connection and wrapping it in the standard WebSocket API, but it delegates all handling of the protocol to websocket-driver. When it creates a driver to manage the socket’s data flow, it hands deflate off to driver.addExtension(), without interacting with it.
// faye-websocket
var Driver = require('websocket-driver');
var driver = Driver.http(request);
driver.addExtension(deflate);
websocket-driver doesn’t interact with deflate directly either; it uses websocket-extensions to manage its registered extensions, and it adds deflate to the set of extensions for the current connection.
// websocket-driver
var Extensions = require('websocket-extensions');
var exts = new Extensions();
exts.add(deflate);
Here is where meaningful things start to happen. In order to generate the server’s handshake response, websocket-driver tells websocket-extensions to generate a response Sec-WebSocket-Extensions header, based on the request’s header of the same name.
// websocket-driver
// header = 'permessage-deflate; client_max_window_bits'
var header   = headers['sec-websocket-extensions'],
    response = exts.generateResponse(header);
handshake['Sec-WebSocket-Extensions'] = response;
Internally, websocket-extensions parses the header, determines which extensions should be activated, and initialises each of them with the header parameters. This is where we begin interacting with deflate.
// websocket-extensions
var response = [],
    sessions = [];
var session = deflate.createServerSession([{client_max_window_bits: true}]);
response.push(serialize(deflate.name, session.generateResponse()));
sessions.push(session);
This process generates a Sec-WebSocket-Extensions header to send back to the client, and creates a ‘session’ from each extension that will handle messages during the current connection. When websocket-driver receives a message from the wire, it parses the byte sequence as described above and produces this data structure that represents the message. This is the standard abstraction that drivers and extensions use to interoperate. It hands this structure off to its Extensions instance to be processed, and receives the result - a message structure with an uncompressed data field.
// websocket-driver
var message = {
  rsv1:   true,
  rsv2:   false,
  rsv3:   false,
  opcode: 1,
  // <Buffer aa 4c 4d cc 50 a8 84 11 00 00>
  data:   Buffer.from('aa4c4dcc50a884110000', 'hex')
};
exts.processIncomingMessage(message, function(error, result) {
  // result.data = <Buffer 79 65 61 68 20 79 65 61 68 20 79 65 61 68>
  //             =         y  e  a  h     y  e  a  h     y  e  a  h
});
exts.processIncomingMessage() is an API offered by websocket-extensions. It takes a message and passes it through each extension session in turn, returning the final result to the caller. Below is a simplified version of this process using async.reduce(); what actually goes on is more complicated due to things like message order preservation, but this is conceptually what’s being done:
// websocket-extensions
// sessions = [deflate.createServerSession()]
async.reduce(sessions, message, function(msg, session, callback) {
  session.processIncomingMessage(msg, callback);
}, function(error, result) {
  // result.data = <Buffer 79 65 61 68 20 79 65 61 68 20 79 65 61 68>
});
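To give a feel for the order-preservation problem mentioned above, here is a minimal sketch of one way to emit results in arrival order even when an asynchronous transformation finishes out of order. OrderedQueue is a hypothetical construct written for this illustration; it is not how websocket-extensions is implemented, and it covers only a single stage.

```javascript
// Runs `transform` on each pushed message and calls `emit` with the results
// in the order the messages were pushed, whatever order they complete in.
function OrderedQueue(transform, emit) {
  this._transform = transform;
  this._emit      = emit;
  this._pending   = [];
}

OrderedQueue.prototype.push = function(message) {
  var slot = {done: false, error: null, result: null},
      self = this;

  this._pending.push(slot);
  this._transform(message, function(error, result) {
    slot.done   = true;
    slot.error  = error;
    slot.result = result;
    self._flush();
  });
};

// Emits every completed slot at the head of the queue, stopping at the
// first one that is still being processed
OrderedQueue.prototype._flush = function() {
  while (this._pending.length > 0 && this._pending[0].done) {
    var slot = this._pending.shift();
    this._emit(slot.error, slot.result);
  }
};

// Demo: capture the completion callbacks so they can be fired out of order
var callbacks = [], emitted = [];
var queue = new OrderedQueue(
  function(msg, cb) { callbacks.push(function() { cb(null, msg.toUpperCase()); }); },
  function(error, result) { emitted.push(result); }
);
queue.push('first');
queue.push('second');
callbacks[1]();  // 'second' finishes first: nothing is emitted yet
callbacks[0]();  // 'first' finishes: both are emitted, in order
console.log(emitted);
// -> [ 'FIRST', 'SECOND' ]
```

Note that this only orders the output. A stateful transformation such as DEFLATE with a shared context must also process its inputs in order, which needs additional care.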
What the permessage-deflate session in particular does with this message is to decompress it using zlib.createInflateRaw(). The extra 00 00 FF FF chunk marks the end of a DEFLATE block and is omitted from messages on the wire, so it has to be appended again before the data will inflate.
// permessage-deflate
var zlib = require('zlib');
var inflate = zlib.createInflateRaw();
inflate.write(message.data); // <Buffer aa 4c 4d cc 50 a8 84 11 00 00>
// Restore the block terminator that was stripped before transmission
inflate.write(Buffer.from([0x00, 0x00, 0xFF, 0xFF]));
inflate.on('readable', function() {
  message.data = inflate.read(); // <Buffer 79 65 61 68 20 79 65 61 68 20 79 65 61 68>
});
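The sending direction is the mirror image: compress, then drop the four-byte tail. The following is a rough synchronous sketch using one-shot zlib calls, so unlike a real session it does not keep a DEFLATE context between messages. compressMessage and decompressMessage are hypothetical helpers; the sketch relies on a sync flush ending the compressed output with 00 00 FF FF.

```javascript
var zlib = require('zlib');

var TAIL = Buffer.from([0x00, 0x00, 0xFF, 0xFF]);

// Compress a text message and strip the tail that a sync flush leaves
function compressMessage(text) {
  var deflated = zlib.deflateRawSync(Buffer.from(text, 'utf8'), {
    finishFlush: zlib.constants.Z_SYNC_FLUSH
  });
  return deflated.slice(0, deflated.length - TAIL.length);
}

// Put the tail back and inflate, as the receiving peer would
function decompressMessage(data) {
  return zlib.inflateRawSync(Buffer.concat([data, TAIL]), {
    finishFlush: zlib.constants.Z_SYNC_FLUSH
  }).toString('utf8');
}

var wire = compressMessage('yeah yeah yeah');
console.log(wire.length < 14);          // the payload is shorter than the text
console.log(decompressMessage(wire));   // -> yeah yeah yeah
```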
This chain of events might seem a little convoluted, but notice the following:
- No information about how permessage-deflate works, either its handshake parameters or its message transformation semantics, is encoded outside of the permessage-deflate module.
- The permessage-deflate module only deals with (de)compressing messages, not with any other protocol or concurrent processing concerns.
- Nothing outside of websocket-extensions interacts directly with the permessage-deflate module; it is treated as an opaque value. Not even websocket-extensions knows how permessage-deflate works; it only cares that it implements an abstract interface for negotiating parameters and transforming messages.
- Although websocket-extensions interacts with permessage-deflate, it does not depend on it specifically. It depends on an abstraction (the plugin API), and the particular extension is injected from outside.
This makes for a highly modular system in which a WebSocket protocol implementation, and anything built on top of it, can make use of an extension plugin without knowing anything about how that extension in particular or the negotiation procedure in general works. It doesn’t even have to interact with the extension API at all; all interaction is mediated by websocket-extensions. Likewise the extension plugin knows nothing about the driver, about other extensions or about how they combine; it doesn’t even need to know the syntax of the Sec-WebSocket-Extensions header. All it cares about is that it should be given values of the abstract type:
type Message = {
  rsv1   :: Boolean
  rsv2   :: Boolean
  rsv3   :: Boolean
  opcode :: Number
  data   :: Buffer
}
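To show how little a plugin needs to know, here is a sketch of a do-nothing extension shaped after the calls that appear in this article (name, createServerSession(), generateResponse() and processIncomingMessage()). The extension name x-passthrough, the rsv flags, processOutgoingMessage(), close() and the count() helper are assumptions added for illustration; consult the websocket-extensions documentation for the actual interface.

```javascript
// A hypothetical extension that passes every message through unchanged
var passthrough = {
  name: 'x-passthrough',  // made-up extension name
  rsv1: false,            // assumed flags: this plugin uses no RSV bits
  rsv2: false,
  rsv3: false,

  createServerSession: function(offers) {
    var seen = 0;
    return {
      // No parameters to send back to the client
      generateResponse: function() { return {}; },

      // Receives a Message value and yields a Message value
      processIncomingMessage: function(message, callback) {
        seen += 1;
        callback(null, message);
      },

      // Assumed counterpart for messages being sent
      processOutgoingMessage: function(message, callback) {
        callback(null, message);
      },

      close: function() {},                      // assumed cleanup hook
      count: function() { return seen; }         // illustration only
    };
  }
};

var session = passthrough.createServerSession([{}]),
    message = {rsv1: false, rsv2: false, rsv3: false, opcode: 1,
               data: Buffer.from('yeah yeah yeah', 'utf8')};

session.processIncomingMessage(message, function(error, result) {
  console.log(result.data.toString('utf8'));
  // -> yeah yeah yeah
});
```

Nothing in this object refers to sockets, headers or other extensions; it only consumes and produces values of the Message type.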
Today, any system using faye, faye-websocket or websocket-driver can opt into using permessage-deflate and extend this pluggable interface downstream. But I’d also like to see other WebSocket stacks adopt websocket-extensions and write plugins for it, so we can share the workload of implementing extensions.
I’ll close with one final observation. Recall the dependency tree I displayed above:
o
├─┬ faye
│ └─┬ faye-websocket
│   └─┬ websocket-driver
│     └── websocket-extensions
└── permessage-deflate
From the application’s point of view, there are three levels of indirection between the place where it passes the extension off to Faye, and the module that actually interacts with this extension. Isn’t passing an object through this many abstraction boundaries a violation of the Law of Demeter?
Although this thought did initially occur to me, I don’t believe this constitutes a formal violation: nothing above websocket-extensions interacts with deflate in any way and so cannot be said to be coupled either to the deflate object itself, or to its class or type. (I will interpret type as being the abstract extension interface defined by websocket-extensions.) However, many would interpret this structure as reaching across too many abstraction boundaries, and therefore violating the spirit of the law. After some thought I also believe this is false, and here’s why.
Consider what would happen if permessage-deflate were built-in functionality of websocket-driver, as has been done by other WebSocket libraries. It would presumably have some interface for configuring the extension, for example:
var driver = Driver.http(request, {perMessageDeflate: true});
// - or -
var driver = Driver.http(request, {perMessageDeflate: {serverMaxWindowBits: 8}});
To expose this functionality to its callers, faye-websocket would similarly need to expose a config API:
var ws = new WebSocket(request, socket, body, [], {perMessageDeflate: true});
And Faye would need to follow suit:
var server = new faye.NodeAdapter({mount: '/', websocket: {perMessageDeflate: true}});
Now we have a problem. Everything that builds on top of websocket-driver has to include code that couples to particular features of it: it has to know that there is such a thing as permessage-deflate and provide an interface for configuring it. If each layer wants to avoid such coupling, it could take the options object from the caller and hand it directly to the layer underneath, but that would mean exposing the config API of websocket-driver all the way up the stack, which is a clear Demeter violation. So most people would opt for the approach of providing an explicit interface and delegating to the layer underneath, so that each layer is only coupled to the interface of its immediate neighbour.
But then what happens when a new extension is invented? Should websocket-driver, faye-websocket, faye, and anything built on top of them have to release a new version to expose it? As of this writing, those modules have 115 immediate dependents between them in npm, accounting for millions of downloads per month. That’s an awful lot of downstream code to update! Whereas, with the websocket-extensions architecture, as long as downstream modules expose an API for adding extensions, then a new extension can be deployed without any intervening modules needing to change. This must surely mean that this architecture reduces coupling, and it completely removes me as a bottleneck for people getting new extensions into their Faye stack.
Encoding things as values rather than as names, as symbols in an API, is what enables this. Since permessage-deflate is a value, it can carry its own configuration rather than other modules having to configure it indirectly:
var WebSocket = require('faye-websocket').Client,
    deflate   = require('permessage-deflate'),
    zlib      = require('zlib');
deflate = deflate.configure({
  maxWindowBits: 8,
  strategy: zlib.Z_HUFFMAN_ONLY
});
var ws = new WebSocket(url, [], {extensions: [deflate]});
The ‘configured’ deflate object still responds to the same API as the original; it just carries different parameters around inside of itself.
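One way such a self-configuring value could be built is sketched below. This is a guess at the general pattern, not the source of permessage-deflate: createExtension, the x-example name, the level option and the settings field are all hypothetical.

```javascript
// Builds an extension value that closes over its own options.
// configure() returns a new value with the same interface and merged options.
function createExtension(options) {
  return {
    name: 'x-example',  // made-up extension name

    configure: function(overrides) {
      return createExtension(Object.assign({}, options, overrides));
    },

    createServerSession: function(offers) {
      return {
        generateResponse: function() { return {}; },
        settings: options  // exposed here only so the demo can inspect it
      };
    }
  };
}

var base  = createExtension({level: 6}),
    tuned = base.configure({level: 1});

console.log(tuned.name === base.name);                         // -> true
console.log(base.createServerSession([]).settings.level);      // -> 6
console.log(tuned.createServerSession([]).settings.level);     // -> 1
```

Because configure() returns a fresh value instead of mutating the original, two parts of a program can hold differently tuned copies without affecting each other, and every layer in between still just passes the value along.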
To me, this is a good illustration of the power of well-defined interfaces and their influence on modularity and interoperability. It illustrates why npm’s peerDependencies is a flawed idea: it lets you ensure that a plugin and its host system are compatible if they are siblings, but what if the host framework is further down the tree? I don’t think attempting to solve that problem via versioning is a workable idea, and I would much rather solve it with well-defined stable seams between modules.
In short, a stable platform is an extensible one.