Faye 0.8.4: more efficient socket connections

Update: Version 0.8.5 was released shortly afterward to fix a URL parsing bug in this release.

I just released Faye 0.8.4, a drop-in replacement for previous releases. It includes a number of small fixes: working around iOS’s new POST-caching bug, making sure JSON-P requests don’t exceed URL size limits, checking that EventSource actually works in order to detect broken releases of Opera, and fixing relative URL resolution in Internet Explorer. But the biggest improvement is in how it negotiates which transport to use. TL;DR: it now makes half as many connections to the server to establish a WebSocket connection. Read on for more detail.

One responsibility of the Bayeux protocol on which Faye is based is figuring out which transport type to use between the client and the server. It does this using an upgrading process as follows. To initiate a connection, the client uses a vanilla HTTP request to send a message on the /meta/handshake channel; in the browser this is done using XMLHttpRequest for same-origin requests and JSON-P otherwise. The server’s response to this message includes two important things: a randomly generated client ID, and a list of transport types supported by the server.

$ curl -X POST http://localhost/bayeux \
    -H 'Content-Type: application/json' \
    -d '{"channel": "/meta/handshake", "supportedConnectionTypes": ["long-polling", "websocket"], "version": "1.0"}'

{
  "channel":    "/meta/handshake",
  "successful": true,
  "version":    "1.0",
  "clientId":   "irta1b0nh93z90baok4b1gbcw3p1emmcj50",
  "supportedConnectionTypes":[
    "long-polling",
    "cross-origin-long-polling",
    "callback-polling",
    "websocket",
    "eventsource",
    "in-process"
  ]
}

Once it knows what the server supports, the client can pick a new transport. But, it can’t just take this list at face value: even if the server supports WebSockets and the client has a WebSocket object available, the intervening network, proxies and so on may break the connection. So, the client needs to begin trying connections to find out which transports actually work before upgrading from the vanilla HTTP transport it is using. This testing is asynchronous, so in the meantime the client continues to use the vanilla transport.
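As a rough illustration of that selection rule (this is a sketch, not Faye's actual source), the client can only upgrade to a transport that the server lists, that the client has an API for, and whose trial connection has succeeded. The function name, the preference order and the shape of `probeResults` below are all assumptions made for the example.

```javascript
// Hypothetical preference order, best first; not taken from Faye's source.
const PREFERENCE = ["websocket", "eventsource", "long-polling", "callback-polling"];

// serverTypes:  supportedConnectionTypes from the handshake response.
// clientTypes:  transports the client has an API for (e.g. a WebSocket object).
// probeResults: Map of transport name -> true/false once its trial connection
//               has finished; a missing entry means the trial is still pending.
// current:      the vanilla transport in use while trials are running.
function chooseTransport(serverTypes, clientTypes, probeResults, current) {
  for (const type of PREFERENCE) {
    const advertised = serverTypes.includes(type) && clientTypes.includes(type);
    // Only upgrade to a transport that has been shown to work end to end.
    if (advertised && probeResults.get(type) === true) return type;
  }
  // Nothing better has been proven yet, so keep what we have.
  return current;
}
```

While the WebSocket trial is pending or has failed, this keeps returning the current vanilla transport; it only returns "websocket" once the trial has been recorded as successful.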

In practice this results in the following sequence of events:

  1. Select a vanilla transport, either long-polling (XHR) or callback-polling (JSON-P)
  2. Send the /meta/handshake message to the server
  3. Receive a response, store client ID and list of supported transports
  4. Begin testing WebSocket and EventSource connections in the background
  5. Begin sending publish and subscribe messages using the original transport
  6. Eventually step 4 completes and the transport is upgraded

This typically results in four connections to the server during set-up, in the best case where WebSocket works:

  1. First POST request sending handshake message
  2. Trial WebSocket connection during transport selection
  3. Second POST request to send subscriptions and begin polling
  4. Second WebSocket connection used to actually send messages

Faye 0.8.4 improves this in two ways. First, it begins testing all the transports it has available before it knows what the server supports. This means a WebSocket connection is opened even before the handshake message is sent over HTTP. Second, it caches the connection it uses to test the transport, and reuses it for sending messages. So, in the best case, by the time the server’s handshake response arrives we already have a WebSocket open and can begin using it, which brings set-up down to just two connections: the handshake POST and a single WebSocket.
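Here is a minimal sketch of that idea, not the real Faye client code. The `Negotiator` class and the injected `openSocket` and `postHandshake` functions are hypothetical stand-ins for a WebSocket constructor and an HTTP POST, so the connection count can be seen without a network.

```javascript
// Sketch only: `openSocket` and `postHandshake` are hypothetical injected
// functions standing in for a WebSocket constructor and an HTTP POST.
class Negotiator {
  constructor(openSocket, postHandshake) {
    this.openSocket = openSocket;
    this.postHandshake = postHandshake;
    this.connections = 0;    // how many connections we have opened
    this.socket = null;      // cached trial socket, set once it is known to work
    this.serverTypes = null; // filled in when the handshake response arrives
  }

  connect() {
    // 1. Start the trial socket before the handshake is sent.
    this.connections += 1;
    const trial = this.openSocket();
    trial.onopen = () => { this.socket = trial; }; // cache it for reuse
    trial.onerror = () => { this.socket = null; };

    // 2. Send the handshake over plain HTTP in the meantime.
    this.connections += 1;
    this.postHandshake((response) => {
      this.serverTypes = response.supportedConnectionTypes;
    });
  }

  // Transport to use for the next message.
  transport() {
    const ready = this.socket !== null &&
      this.serverTypes !== null &&
      this.serverTypes.includes("websocket");
    return ready ? "websocket" : "long-polling";
  }
}
```

Because the socket that proved the transport works is the same one kept in `this.socket`, no second WebSocket is opened after the upgrade, and `connections` stays at two.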

You may ask, since we’re proactively testing whether WebSocket works, why can’t we rely on that and send the initial handshake over a socket connection, taking the connection count down to one? Well, in the case where WebSockets don’t work, because either the server or an intervening proxy doesn’t support them, it can take a long time to find out that things are broken. In some cases, you don’t get an error event from the WebSocket client until the TCP connection times out. Sending the handshake over the socket is what the CometD client does, and it renders the client unusable for the first 60 seconds because it tries to send the handshake over an unresponsive socket. So while in the best case you use one fewer connection, if there’s any problem the client degrades horribly. (It also doesn’t bother testing the connection first; it just assumes that the availability of WebSocket means everything’s fine, which further compounds its responsiveness problem.)

Using an upgrade strategy and testing transports in the background means the client always has a transport it can use to send and receive messages, with no interruption in service. The improved responsiveness when there are problems is easily worth that one extra request in the best case.

One final thing to mention about this release is that I’ve finally written up a guide to securing Faye and other socket-based applications, which includes authenticating both publish and subscribe access, and preventing CSRF and XSS attacks. I’ve decided that educating people about this is better than providing canned extensions, since different applications require different things. If you have experience with socket security and want to contribute, just send me a pull request.