Black-box criteria

Tim Bray recently published an article called Type-System Criteria, in which he makes the argument that Java, or statically-typed languages in general, are better suited to mobile development than the dynamically-typed languages that are more prevalent in web development circles. The reason he gives for this boils down to API surface size:

Another observation that I think is partially but not entirely a consequence of API scale is testing difficulty. In my experience it’s pretty easy and straightforward to unit-test Web Apps. There aren’t that many APIs to mock out, and at the end of the day, these things take data in off the wire and emit other data down the wire and are thus tractable to black-box, in whole or in part.

On the other hand, I’ve found that testing mobile apps is a major pain in the ass. I think the big reason is all those APIs. Your average method in a mobile app responds to an event and twiddles APIs in the mobile framework. If you test at all completely you end up with this huge tangle of mocks that pretty soon start getting in the way of seeing what’s actually going on.

The argument goes that as the API surface you need to integrate with becomes larger, static type systems become more attractive. I don’t disagree, in part because I don’t have nearly enough experience with static languages to have an informed opinion on them. But at a gut level I believe this to be true; in fact, I’d be willing to bet that a majority of the bugs I’ve introduced while refactoring software could have been caught by a static type checker (and not even a very sophisticated one, at that).

But the excerpt I quoted above contains a code smell, and it points to another reason why mobile development is difficult. It’s not the size of the APIs that’s the big problem: it’s the nature of the application.

Web application servers are comparatively easy to test because the tests can be written by talking to an encapsulated black box. You throw a request (or several) at a web server, you read what comes back, and check it looks like what you expected. On the other hand, testing web application clients is much more complex: instead of doing simple call/response testing, you have to initiate events within the application’s environment, and then monitor changes to that environment that you expect the events to cause. The core difference here is that client-side programs tend to be what I’m going to refer to as ‘stateful user interfaces’, and mobile (and desktop) software falls into the same category.

What exactly do I mean by ‘stateful user interface’? When you call a web server, you don’t need to hold onto any state on your end: you ask the server a question by sending it a request, and it sends back a fully-formed, self-contained response. When you’ve checked that response, you throw it away and start the next test. In contrast, stateful user interfaces are long-running processes in which incremental changes are made to what the user sees. Instead of getting a fresh new page, just a part of the view is changed, or a sound is emitted, or a notification generated, or a vibration initiated. The programming paradigm in a server environment emphasises call/response, statelessness and immutability; in a client environment you have side effects, state and incremental change. Testing in such environments is hard.
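
To make the contrast concrete, here is roughly what a black-box test of a web app looks like. This is only a sketch: it assumes rack-test, and MyApp and its /greeting route are made up for illustration.

require 'rack/test'

describe "the web application" do
  include Rack::Test::Methods
  
  # rack-test drives whatever Rack application this method returns;
  # MyApp is a stand-in for your real app
  def app
    MyApp
  end
  
  it "responds to a greeting request" do
    get '/greeting', :name => 'world'
    last_response.status.should == 200
    last_response.body.should == 'Hello, world!'
  end
end

Request in, response out: there is no host environment to mock and no state to reset between tests.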

I think this, rather than a large API surface, is the real problem. Large API surfaces are only a problem if your application code talks to them directly, and that kind of direct coupling is much more common in side-effect-heavy applications. Unit tests in these environments tend to be messy for several reasons:

  • Application code responds to events triggered by the host environment
  • Business logic produces its output by modifying the host environment rather than returning values
  • It is hard or impossible to reset the environment to a clean state between tests

The third reason is a particular problem when unit testing client-side JavaScript, and I’ve seen plenty of tests where the state of the page or the implementation of event listeners makes it very difficult to keep each test independent of the others. You also have the problem that anything that triggers a page refresh will make your test runner vanish. (I wrote about this exact problem in Refactoring towards testable JavaScript.)

So if side-effect-heavy programs cause large API surfaces to be a problem, what should we do about it? The answer comes down to something I think of as ‘avoiding framework-isms’. This means that any time you have a framework or host environment in which user input or third-party code drives your application, the sooner you can dispatch to something you control, the better. The classic example of this is the ‘fat model, skinny controller’ mantra popular in the Rails community: rather than dump lots of code in a controller that’s only invoked by the host server and framework, turn the request into calls to models. This way, the bulk of the logic lives in objects whose interfaces you control, and that are easy to create and manipulate, properties that also make them easy to test.
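
Here is a rough sketch of that shape in a Rails app; the Publication model, its publish method and the surrounding routes are hypothetical, chosen just for illustration.

class PublicationsController < ApplicationController
  # The controller only translates the request into a call on an
  # object whose interface we control
  def create
    @publication = Publication.publish(params[:publication])
    redirect_to @publication
  end
end

class Publication < ActiveRecord::Base
  # The actual rules live here, where they can be exercised directly
  # in a unit test without involving the framework
  def self.publish(attributes)
    create(attributes.merge(:published_at => Time.now))
  end
end

The controller remains a thin binding to the framework, and the logic worth testing is reachable with a plain method call on Publication.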

In client-side JavaScript and other stateful user interfaces, this means keeping event listeners small. Ideally an event listener should extract all the necessary data from the event and the current application state, and use this to make a black-box call to a module containing the real business logic. It means making sure orthogonal components of a user interface do not talk to each other directly, but publish data changes via a message bus. And it means writing business logic that returns results rather than causing side effects, with the side effects again dealt with by thin bindings to the host environment.
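
Here is a minimal sketch of that structure. The Cart module and the on and bus helpers are hypothetical, and it’s written in Ruby to match the example below, but the shape is the same for a JavaScript event listener.

module Cart
  # Pure business logic: values in, a value out, no side effects
  def self.total(items)
    items.map { |item| item[:price] * item[:quantity] }.inject(0, :+)
  end
end

# The thin binding: extract the data from the event, make a black-box
# call to the business logic, and publish the result on a message bus
# so other components never need to talk to this one directly
on :item_added do |event|
  total = Cart.total(event[:items])
  bus.publish :cart_total_changed, total
end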

I’ll finish up with a small but illustrative example. Say you’re writing a WebSocket implementation, and the protocol mandates that when you call socket.send('Hello, world!'), the bytes 81 8d ed a3 88 c3 a5 c6 e4 af 82 8f a8 b4 82 d1 e4 a7 cc should be written to the TCP socket. You could write a test for it by mocking out the whole network stack (which I’ve probably glossed over considerably here):

describe WebSocket do
  before do
    @tcp_socket = mock('TCP socket')
    TCP.should_receive(:connect).with('example.com', 80).and_return @tcp_socket
    @web_socket = WebSocket.new('ws://example.com/')
  end
  
  it "writes a message to the socket" do
    @tcp_socket.should_receive(:write).with [0x81, 0x8d, 0xed, 0xa3, 0x88, 0xc3, 0xa5, 0xc6, 0xe4, 0xaf, 0x82, 0x8f, 0xa8, 0xb4, 0x82, 0xd1, 0xe4, 0xa7, 0xcc]
    @web_socket.send("Hello, world!")
  end
  
  # More mock-based protocol tests...
end

Or you could test it by implementing a pure function that turns text into WebSocket frames, leaving the code that actually deals with networking doing only that and nothing else:

describe WebSocket::Parser do
  before do
    @parser = WebSocket::Parser.new
  end
  
  it "turns text into message frames" do
    @parser.frame("Hello, world!").should == [0x81, 0x8d, 0xed, 0xa3, 0x88, 0xc3, 0xa5, 0xc6, 0xe4, 0xaf, 0x82, 0x8f, 0xa8, 0xb4, 0x82, 0xd1, 0xe4, 0xa7, 0xcc]
  end
  
  # More protocol implementation tests...
end

describe WebSocket do
  before do
    @tcp_socket = mock('TCP socket')
    TCP.should_receive(:connect).with('example.com', 80).and_return @tcp_socket
    
    @parser = mock('parser')
    WebSocket::Parser.should_receive(:new).and_return @parser
    
    @web_socket = WebSocket.new('ws://example.com/')
  end
  
  it "converts text to frames and sends them" do
    frame = mock('frame')
    @parser.should_receive(:frame).with("Hello, world!").and_return frame
    @tcp_socket.should_receive(:write).with(frame)
    @web_socket.send("Hello, world!")
  end
  
  # And we're done here
end

This separates the business logic (implementing the WebSocket protocol) from the side effects on the host environment (writing to network connections). The result is code that’s more modular, much easier to test, and less coupled to the API surface of the host environment. If a static type system helps you with that, then have at it, but recognize when the need for one is a symptom of a deeper problem.