Fargo: a Lisp for asynchronous programming

A couple weeks ago, I tooted:

CoffeeScript ought to take this opportunity to add new features that help with async e.g. tail calls, continuations.

Now there’s a problem with this: it’s a core part of CoffeeScript’s usability and resultant popularity that it compiles to readable JavaScript. If you want to introduce new execution semantics, a straightforward one-to-one syntax transformation is not such a viable option.

But I maintain that, given we’re reinventing the language we work with on the web, now’s as good a time as any to reexamine the tools we use to write programs. We’re all annoyed by working with callbacks, and with good reason. Programming with explicit callbacks introduces race conditions and uncertainty; it’s not the syntactic burden of typing function that causes the real pain, it’s the creation of regions of your program whose order of execution is unpredictable.

In the Ruby world, we’ve been using fibers to great effect to hide the asynchronous nature of code. A fiber is just a special kind of single-use function, whose execution can be suspended and resumed by the user. They are perfect for pausing a block of code while some I/O is done, but they do not block the interpreter: when you pause a fiber the interpreter can carry on doing other work until the fiber is resumed. Ilya’s first example is perfect:

def http_get(url)
  f = Fiber.current
  http = EventMachine::HttpRequest.new(url).get
  # resume fiber once http call is done
  http.callback { f.resume(http) }
  http.errback  { f.resume(http) }
  return Fiber.yield

Calling http_get() initiates an async HTTP request, and just before returning it ‘yields’, i.e. it pauses the current fiber. When the request completes, we resume the fiber with the response. A call to http_get() looks like synchronous code, but behind the scenes the I/O is being done in a way that doesn’t block the interpreter.

I thought it would be fun to try this out in JavaScript, so I hacked up a very basic Scheme-like interpreter to support fibers, and called it Fargo. Scheme usually includes a function called call-with-current-continuation or call/cc, of which fibers are a simplified version. Call/cc allows the current state of a computation (the “continuation”) to be saved and resumed any number of times. This means, at least in a naive implementation, that the saved state of the stack must be copied every time you invoke a continuation, otherwise state would leak between uses. Fibers can only be resumed once from their last yield-point, making running and resuming them cheaper.

Fibers as implemented in Ruby also require the user to explicitly initiate a fiber. Supporting call/cc requires constantly keeping track of the current continuation since it may be captured at any time. Because fibers must be explicitly initiated, when not in a fiber we can use a faster stackless execution engine which doesn’t need to track state so much.

Here’s an example in Fargo of getting a value that’s produced by an async API (set-timeout) without using callbacks in the calling code:

; get-message function: captures the current fiber, yields,
; then resumes after a timeout when we have a message to return
(define (get-message)
  (define f (current-fiber))
  (set-timeout (lambda ()
    (f "Hello, world!")
  ) 5000)

; main program fiber
(define main (fiber ()
  ; get the message using a sync-looking API
  (define msg (get-message))
  ; alert the message
  (alert msg)

; run the main fiber

Because it’s tail-recursive, you can use recursion to implement infinite streams:

> (define strea­m (fibe­r ()
    (let loop ((i 0))
      (yiel­d i)
      (loop­ (+ i 1))))­)

> (stream)
; => 0
> (stream)
; => 1
> (stream)
; => 2

So where is Fargo going? Right now, with my current ban on committing myself to any new large side-projects, it remains a quick (12 days old) experiment. Maybe someone will like the idea enough to turn it into a production language but for now I’m happy to let people try it out and let me know what they think. I’ve been making small improvements to it the last few days to get it to run as much of Heist‘s core library as possible and it now runs all the macros and the list and numeric libraries.

I’m going to keep thinking about how we can make async programming easier and I’d love your ideas – the nice thing about working on a Lisp is that changing the interface it provides is really simple.

Adding a dynamic defmacro to Heist

I’ve just picked up the opening chapters of Let Over Lambda, which describes itself as a book on macro programming – particularly Common Lisp macro programming. One of the early macros given in the book is unit-of-time which looks like this:

(defmacro unit-of-time (value unit)
  `(* ,value
      ,(case unit
         ((s) 1)
         ((m) 60)
         ((h) 3600)
         ((d) 86400))))

In Common Lisp (in so far as I’ve been able to gather in one evening), macros are expanded in an imperative fashion using the built-in CL list manipulation functions. That is, if I call

(unit-of-time 5 m)

CL will bind the integer 5 to parameter value, and the symbol m to unit, then evaluate the list expression inside the macro, in this case giving (* 5 60). That is to say, (unit-of-time 5 m) is replaced with (* 5 60), and the latter is then evaluated at runtime to give the final result. Within a defmacro definition you can use the built-in CL functions – in this case, quasiquotation and the case form – to manipulate source code passed into the macro and transform it into something else. You’re directly manipulating source code by using defmacro.

Scheme’s syntax-rules macro system takes a more declarative approach to macros and provides a pattern-matching language for transforming source expressions into lower-level function calls. A Scheme implementation of the above might look like this:

(define-syntax unit-of-time (syntax-rules ()
  ((unit-of-time value unit)
   (* value (case 'unit
              ((s) 1)
              ((m) 60)
              ((h) 3600)
              ((d) 86400))))))

Superficially this looks almost the same, but there’s an important difference lurking beneath the surface. Whereas the CL macro will expand (unit-of-time 5 m) as (* 5 60), the Scheme version will expand it as:

(* 5 (case 'm
       ((s) 1)
       ((m) 60)
       ((h) 3600)
       ((d) 86400)))

That is, Scheme slots the macro parameters into a template, which is used to replace the original expression. You don’t manipulate the source yourself, you use this templating system to declare transformation patterns. Now a smart compiler might spot that that case expression evaluates to a constant and optimise it, but suppose we’re using a toy interpreter that we’re rather fond of but isn’t all that clever. We want to make Scheme inline the conversion factor. We’d need to write the macro like this:

(define-syntax unit-of-time (syntax-rules (s m h d)
  ((unit-of-time value s) value)
  ((unit-of-time value m) (* value 60))
  ((unit-of-time value h) (* value 3600))
  ((unit-of-time value d) (* value 86400))))

This way, Scheme will pick one of the matching patterns and use the corresponding template with the factor hard-coded. The s case even avoids an unnecessary multiplication.

Still it’s clear that CL is better suited to some types of macro writing: it gives you full access to the source code as a Lisp data structure that you can manipulate with Lisp’s list processing library. Scheme only allows certain rigid transformations using its pattern language, making certain things awkward to express. However, Common Lisp’s power is blunted slightly by most Lisps’ distinction between compile time and run time. Your code is first compiled, one stage of which is expansion of all macros, and once this is finished the expanded source is executed. This means that a defmacro can only use functions that are built into the CL environment. User-defined functions are not available until run time and so cannot be used to manipulate source code.

Heist, being the naïve so-and-so that it is, only has run time. Macro calls are expanded as they are encountered at run time, their expansions being inlined into the source tree once processed. Macros, special forms, built-in and user-defined functions all have first-class status, and are implemented as different types of Function that we call as we evaluate code. This means, if you were to add defmacro to Heist, it would be called at run time and therefore have access to any user-defined function.

So let’s go and implement it. The latest Heist update (0.3.2) added the ability to easily load Ruby files from Scheme land, meaning I can write a Ruby file called defmacro.rb and load it in Scheme using (load "defmacro"). In your Ruby files you can use define and syntax to add functions to the Scheme environment.

To begin with, we’re going to need a new class to model CL-style macros. It should inherit from the Heist Function class so it receives all the references it needs from the interpreter; we just need to override the call method to implement defmacro behaviour.

class CommonLispMacro < Heist::Runtime::Function
  def call(context, cells)
    scope = Heist::Runtime::Scope.new(@scope)
    cells.to_a.each_with_index do |cell, i|
      scope[@formals[i]] = cell
    expression = @body.map { |e| Heist.evaluate(e, scope) }.last

A quick walk-through: Heist will call this type of function with context (an object representing the scope the function call is made from) and cells which is a Lisp list of the unevaluated argument expressions to the function. Heist Function objects have a @scope, which is the scope they are defined in, an array of the names of their arguments in @formals and a Lisp list of the expressions they contain in @body. We begin by making a new Scope in which to evaluate the function’s innards. Then we take the unevaluated argument expressions and assign them in the new scope using the variable names from @formals. We evaluate all the expressions in @body and return the last one, wrapped in an Expansion. The @body should return a list structure that could be evaluated as Scheme source code.

What’s this Expansion object? Well, we don’t want the expanded expression to be the macro call’s return value. We want Heist to insert this new expression into the source code, replacing the old expression, and then evaluate it to get the final value. To make this happen, we need to return an object of type Macro::Expansion, and this object should have an expression method. Heist takes care of all the rewriting and further evaluation as long as your function returns one of these. Our Expansion class is just a dumb wrapper around an expression and doesn’t need to do a lot of the expansion work that Heist’s macro system usually does.

class CommonLispMacro
  class Expansion < Heist::Runtime::Macro::Expansion
    attr_reader :expression
    def initialize(expression)
      @expression = expression

With these defined, all that’s left is to provide a front-end for making these functions from Scheme. In Heist, a Function is constructed using the current scope, a list of argument names and the function body.

syntax('defmacro') do |scope, cells|
  scope[cells.car] = CommonLispMacro.new(scope, cells.cdr.car, cells.cdr.cdr)

In the expression (defmacro foo (one two) (* one two)), cells.car refers to foo, cells.cdr.car to (one two) and cells.cdr.cdr to ((* one two)) – remember the body can contain more than one expression. Now you can drop the above Ruby in defmacro.rb, boot up Heist and use CL-style macros:

(load "defmacro")

(defmacro unit-of-time (value unit)
  `(* ,value
      ,(case unit
             ((s) 1)
             ((m) 60)
             ((h) 3600)
             ((d) 86400))))

(unit-of-time 5 s)
; => 5

(unit-of-time 5 m)
; => 300

Heist 0.3: complete set of Scheme data types

I actually tagged the 0.3.0 release of Heist, my Ruby Scheme implementation, about a month back, mostly to get it off my desk for a while. I’ve made a few minor tweaks and released 0.3.1 over the weekend, so now’s as good a time as any to go over what’s new.

The major milestone for this release was to complete the set of R5RS data types: Heist now supports vectors (including macro and quasiquotation support), characters and strings, including full libraries for each type. These work largely as advertised by the spec, although vector handling is slightly idiosyncratic.

Some Schemes require vectors to be quoted, as in '#(1 2 3). This makes the vector a constant (i.e. it is immutable), and every evaluation of it returns the same object. That is (in mzscheme):

> (define (f) '#(1 2 3))
> (define a (f))
> (define b (f))
> (eq? a b)

> (define (f) (vector 1 2 3))
> (define a (f))
> (define b (f))
> (eq? a b)

So we see that vector allocates a new object every time, but the quoted literal does not. This is still true in Heist, but Heist also allows non-quoted vectors. These return a new mutable object when evaluated.

(define (f) '#(1 2 3))
(define (g) #(1 2 3))

(define a (f)) (define b (f))
; => #(1 2 3)
(eq? a b)
; => #t
(vector-set! a 0 9)
; [error] Cannot modify vector constant

(define c (g)) (define d (g))
; => #(1 2 3)
(eq? c d)
; => #f
(vector-set! c 0 9)
; => #(9 2 3)
; => #(1 2 3)

The reason for this is that unquoted vectors must be allowed — they are part of the syntax-rules spec. But, making them immutable makes performing macro transformations very hard; particularly inlining the transformation into the AST is impossible since it involves setting an attribute or changing a cell on my now-frozen Heist::Runtime::Vector instance. So they need to be mutable (left unfrozen, in Ruby terms). If an unquoted literal evaluated to the same object every time, then the vector-set! procedure would be able to modify the parse tree, which is clearly undesirable. So, unquoted vector literals evaluate to a mutable copy of themselves.

With the Scheme type system completed, where do we go from here? The only pieces of R5RS I’ve not implemented so far are the file I/O procedures, and some of the more exotic continuation stuff like call-with-values and dynamic-wind (I’d love to figure these out but so far they’ve eluded me). My main reason for developing this was to help me with SICP, so I plan to keep going with that and fill in any blanks the book requires along the way.

I may provide faster Ruby implementations of the built-in procedures currently written in Scheme, but that’s not a priority. Paul Graham will tell you that Lisp “came about … as the byproduct of an attempt to axiomatize computation”, and Scheme is no exception. If you write a Scheme yourself, you soon start to find that there do not need to be very many functions built into the core language: a lot of the R5RS library can be written in Scheme itself. Most of the syntactic functions can be written as macros, and many of the data types just need a constructor, accessors and mutators and the user can build the rest on top. Witness the list functions: the core provides cons, car, cdr, set-car!, set-cdr! and pair?, and all the other ‘built-in’ list functions are written in Scheme. SICP chapter 2 reinforces this pattern of data abstraction and is well worth reading as a lesson in robust program design.

So, for the time being at least I’m happy letting Heist be a neat little example of this axiomatization idea at work. I’d check out the source now if I were you, before I get bored and decide the performance needs kicking up a notch.

Talk: Writing a language in 15 minutes

I gave a talk at London Ruby User Group yesterday, based on the work I’ve been doing on Heist, my Scheme interpreter project. I wrote the core of a basic Scheme interpreter in about 15 minutes as a live-coded demo (well, kind of – the coding was pre-recorded so I could focus on talking), which seemed to go down pretty well. If you missed it (or if you were there and want to watch it again in slow motion), here’s the slides and the video (just code, no narrative (sorry)). (Side note: I think Lisp may be affecting my writing style.)

The slides first: lrug-scheme-15.zip. They are S5-format HTML, introducing the Scheme language features I implement during the talk. The video shown below is available at higher resolution from Vimeo.

Scheme interpreter in 15 minutes from James Coglan on Vimeo.

Video is also available from Skills Matter if you want the narrative. The code’s not really visible in this version so combine the audio from this with the above video and you should just about piece things together.

Some relevant links:

  • Heist is my main Scheme interpreter project. It has macros, tail recursion, continuations, and a reasonable chunk of the R5RS spec and its REPL auto-indents your code. It’s about 1000 lines of well-commented Ruby with a few hundred lines of Scheme, including macros for most of the syntax.
  • Stickup is a tiny interpreter for a small subset of Scheme, about 150 lines long. Closer to what I present in the talk, and easier to get your teeth into.
  • Treetop is what I use to generate parsers, it’s super-simple to use and lets you write a parser in no time at all.

Thanks to everyone who came along and had nice things to say about the talks, especially to whoever was telling me about about the trie data structure; Heist’s tab-completion code is now much prettier.

April fool: area man releases world’s slowest Scheme interpreter

With apologies to the ever-entertaining Onion, I’m announcing the 0.2.0 release of Heist, henceforth to be known as “the one with the lists”.

To recap, Heist is a Scheme interpreter I’m writing in Ruby in order to teach myself a few things about how languages work while I read Structure and Interpretation of Computer Programs (which I’m lagging behind with, having been focusing on implementing Scheme rather than learning it). It’s a reasonably advanced toy, supporting tail call optimisation, macros and first-class continuations.

This release totally revises the runtime to properly implement lists (Lisp lists are linked lists made of chained pair objects, rather than arrays), and adds all the R5RS list and pair functions, making Heist considerably more powerful for manipulating data structures. It also adds dotted pair notation, improper lists, rational and complex numbers and associated functions, as well as ‘rest arguments’ for functions using the dot notation in the argument list. A bunch of macro bugs have been squashed relating to keyword handling and nested repetitions, and we now support the R6RS ellipsis escaping ((... ...)) feature to better provide for macro-writing macros.

Finally, this release allows interpreters embedded in Ruby programs to run Scheme code expressed as Ruby data, opening up all sorts of possibilities for manipulating code using Ruby. For example:

scheme = Heist::Runtime.new

scheme.exec [:define, [:square, :x],
              [:*, :x, :x]]

scheme.exec [:square, 9]
#=> 81

At some point I’d like to better expose the Scheme macro system to Ruby as this could make rewriting Ruby code easier, potentially complementing tools like Reg Braithwaite‘s rewrite gem. The internals are still in flux at the moment so this idea will have to wait a bit.

All the new features are a simple gem install heist away. If you want to know how it works I’ve added a lot of comments to the code throughout the runtime, so if you find anything in there that makes no sense let me know and I’ll try to do something about it. Happy Scheming!

Announcing Heist, a new Scheme implementation written in Ruby

There seems to be a tradition of some two years’ standing that dictates I must release some piece of open source software on or around my birthday. A couple of years ago it was Flagger, a Rails plugin for doing useful things with boolean ‘flag’ columns in your database, and last year it was (with the support of theOTHERmedia), the Ojay JavaScript library. The former has faded into obscurity, while the latter continues to be used day-in day-out by myself and various people I work with.

So, at the risk of solidifying this tradition with a third iteration that may very well prove to be hard to top next year, I’m announcing my latest bit of spare-time hackery: Heist. Heist is an interpreter for the Scheme programming language, written in Ruby, that can run Scheme programs in standalone fashion and can be embedded in, and extended using, Ruby applications. Scheme is a dialect of Lisp that was invented in the 1970s and is used in the book Structure and Interpretation of Computer Programs, which I’m currently working my way through. Scheme is also very influential in the design of JavaScript (which is basically Scheme semantics with Java syntax), and Ruby, so it’s also an accessible avenue for me to explore functional programming.

Installation is a snap, assuming the gem has been mirrored to the gem repos by the time you read this:

sudo gem install hoe
sudo gem install heist

Heist runs on Ruby 1.8 and seems fine on 1.9, and partly works on Rubinius.

You may be asking yourself why I did this, and this a perfectly good question. Essentially I just wanted to see if I could, and writing Consent last month got me far along enough with writing an interpreter for a simple language that I thought a real programming language shouldn’t be too hard. Scheme is a perfect language for a beginner as it’s small and has almost no syntax (making it very easy to parse), and it’s perfect for me because its semantics are very close to languages I already know. I managed to get a first pass that could define functions, perform arithmetic and supported closures written on my lunch hour one day last month.

Six weeks later and I’ve got a reasonably ambitious Scheme on my hands. The first goal was to have something capable of tackling chapter 1 of SICP, but as I learned more about the language I set my sights on getting as solid and complete a runtime written as I could. So, while Heist’s function library is fairly small right now, it has runtime support for tail recursion, macros (with optional hygiene), first-class continuations and lazy evaluation. The only other Ruby Schemes I’ve found so far support none of these in any current release.

The astonishing thing is how small this is. The runtime (that’s the engine that executes code after it’s been parsed and on top of which the built-in functions are implemented) is, according to Rails’ stat counter, 672 lines of Ruby, and you can throw most of that away if you don’t want the advanced features. Macros are 214 lines, continuation support about 160, and if you throw out tail calls you’re down to about 260 lines. The parser is a dozen or so Treetop rules, and the built-in functions are a few hundred lines of Ruby and Scheme macros. All very small and relatively digestible for someone wanting to get into language implementations.

I won’t go on much further here since there is much more information on GitHub on Heist’s features and how to use it. You may notice that there are currently no list manipulation functions; this is because I did not do my research and implemented lists using arrays when really you’re supposed to use binary trees. I’m working on fixing this, and list support is a high priority for the next release. However, I’ve found that you can get along without them in some places using macros. For example, here’s an implementation of (case) that does not use any list functions:

(define-syntax case (syntax-rules (else)
  [(case key) #f]
  [(case key (else expr1 expr2 ...))
    (begin expr1 expr2 ...)]
  [(case key (() expr ...) clause ...)
    (case key clause ...)]
  [(case key
         ((datum1 datum2 ...) expr1 expr2 ...)
         clause ...)
    (let ([temp key])
      (if (eqv? temp 'datum1)
          (begin expr1 expr2 ...)
          (case temp
                ((datum2 ...) expr1 expr2 ...)
                clause ...)))]))

Yes this is probably excessively recursive and inefficient, but the point is it’s possible. I’ll mention one final curiosity I’ve discovered while writing this: in lazy evaluation mode, the following is a perfectly fine definition for the Y combinator:

(define (Y f)
  (f (Y f)))

It’s an odd feeling having written something only to discover that you don’t really understand what it is you’ve produced.