Writing your own expression language in Ruby

The last few days, I’ve been writing Consent, a tool for writing declarative firewalls for Rails apps. I thought it would be interesting to dig into its implementation now that the code’s settled down, as it’s one of the more complicated DSLs I’ve written, and certainly the first one that makes decent use of Ruby’s features. Turns out method_missing and operator overloading can get you a ton of expressive power for writing your own mini-languages.

The grammar for Consent’s expression language (not that it’s strictly specified, but this will do) looks something like this:

list          ::=   expression [+ expression]*
expression    ::=   controller[action]?[params]?[format]?
controller    ::=   name[/name]*
action        ::=   .name
params        ::=   (:name => value[, :name => value]*)
format        ::=   [.|*]name
name          ::=   [a-z][a-z_]*
value         ::=   integer|string|regexp|range

Consent expressions are capable of matching controller and action names, parameters, response formats and HTTP verbs (though I’ve left the latter out of the grammar for now), and if the above looks a little abstract there are examples on GitHub. Just to get us started, here’s a typical expression that uses most of the syntax available in the language (I’m leaving rules blocks out for this post):

  ajax/maps + tags.create + users(:id => /admin/)*json

This matches requests for:

  • Any action in Ajax::MapsController
  • TagsController#create
  • Any UsersController action where params[:id] matches /admin/ and the response format is JSON

When we talk about doing DSLs in Ruby, we typically mean that we’re going to provide an API for writing Ruby code that’s idiomatic for the problem we’re trying to solve. It does not mean writing a parser and interpreting trees, though we will in effect be writing an interpreter of sorts. The problem Consent solves is that of declaring access control rules for actions in a Rails app, and its expression language was dictated by that and by the constraints of what constitutes valid Ruby code.

So, we’ve established that expressions need to be valid Ruby. Ruby will parse our DSL’s code but it’s up to us to provide all the methods to output useful data structures from the DSL. We want Consent expressions to generate objects storing request data: controller and action names, etc. Now we run into our first problem: the environment the above expression runs in will need to provide the method names ajax, maps, tags, etc. These names could be pretty much anything, so it’s clear we’re going to need method_missing. But we don’t want to go defining method_missing up at the top level as that could well wreak havoc; we want to be able to inject our expression language into carefully chosen classes so it only works in certain places. In Consent this is handled something like this:

module Consent
  
  class Expression
    module Generator
      # method_missing etc
    end
    # Expression methods - we'll get to these later
  end
  
  class Description
    include Expression::Generator
    attr_reader :rules
  end
  
  def self.rules(&block)
    desc = Description.new
    desc.instance_eval(&block)
    @rules = desc.rules
  end
  
end

When you call Consent.rules, the block is instance_evaled inside a Description object. Description includes the Expression::Generator module, which contains the method_missing that we need to get our DSL started.
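If the scoping trick feels a bit abstract, here's a minimal sketch of it in isolation. The names NameCatcher, Sandbox and capture are made up for the illustration and aren't part of Consent; the only point is that unknown names get caught inside the instance_evaled block and nowhere else.

```ruby
# Hypothetical names; this illustrates the scoping trick only.
module NameCatcher
  # Unknown bare names called on an including object end up here.
  def method_missing(name, *args)
    seen << name
    name
  end
end

class Sandbox
  include NameCatcher

  def seen
    @seen ||= []
  end
end

# Evaluate the block with a fresh Sandbox as self and report what it called.
def capture(&block)
  sandbox = Sandbox.new
  sandbox.instance_eval(&block)
  sandbox.seen
end

p capture { ajax; maps; tags }   # prints [:ajax, :maps, :tags]

begin
  ajax   # outside the block nothing catches the name
rescue NameError => e
  puts "top level still raises #{e.class}"
end
```

Because method_missing lives on Sandbox rather than Object, the rest of the program keeps Ruby's normal "undefined method" behaviour.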

(Bear in mind, this code will not end up looking exactly like Consent. It’s supposed to be a demonstration of how to build up a DSL from scratch, which reminds me: always write your tests first when designing a DSL. It makes it so much easier to figure out what you want and how you might go about building it.)

What does this method_missing need to handle? Well, the top-level method calls in the above expression are (broadly speaking) controller names, and they can take an optional parameter list. So we might guess that method_missing should generate a new Expression using the controller name and params.

module Consent::Expression::Generator
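  def method_missing(name, params = {})
    # Fully qualified: a bare Expression does not resolve from inside
    # `module Consent::Expression::Generator`
    Consent::Expression.new(name, params)
  end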
end

class Consent::Expression
  def initialize(controller, params = {})
    @controller, @params = controller.to_s, params.dup
  end
end

All the top-level method calls in the language will return an Expression with at least a @controller value. We can knock the first Expression method on the head quickly by looking at the expression tags.create, which we want to match the action TagsController#create. So, we’re going to need another method_missing in Expression itself to handle action names. This should just set the action name and return the Expression so it can be further combined with others. Action calls can also take param lists, so we factor that in:

class Consent::Expression
  def method_missing(name, params = {})
    @action = name.to_s
    @params.update(params)
    self
  end
end
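To check that the two method_missing hooks cooperate, here's a small self-contained sketch. It assumes the class layout shown so far; the attr_reader line is an addition for this sketch so the result can be inspected, and Generator names Consent::Expression in full so the constant resolves wherever the module is opened.

```ruby
module Consent
  class Expression
    # Readers added for this sketch only, so results can be inspected.
    attr_reader :controller, :action, :params

    def initialize(controller, params = {})
      @controller, @params = controller.to_s, params.dup
    end

    # An unknown call on an expression names the action.
    def method_missing(name, params = {})
      @action = name.to_s
      @params.update(params)
      self
    end

    module Generator
      # Unknown bare names become controller expressions.
      def method_missing(name, params = {})
        Consent::Expression.new(name, params)
      end
    end
  end

  class Description
    include Expression::Generator
  end
end

expr = Consent::Description.new.instance_eval do
  tags(:user => "James").create(:public => true)
end

p expr.controller   # "tags"
p expr.action       # "create"
p expr.params       # both parameter lists, merged into one hash
```

Note that params from the controller call and the action call end up in the same hash, since the action call just updates what the controller call stored.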

Next, notice the expression ajax/maps, which we want to match Ajax::MapsController. Ruby will interpret the expression by calling ajax and maps, then combining the two resulting Expressions using the division operator. We need to define this operator in Expression, but what should it do? We know the first operand (the receiver of the / call) will simply have a @controller value, whereas the second operand may be more complex, e.g. the expression ajax/maps.find(:user => "James"). Let’s implement it by having the first operand modify the second’s @controller value, then returning the second operand:

class Consent::Expression
  def /(expression)
    expression.nesting = @controller
    expression
  end
  
  def nesting=(name)
    @controller = "#{ name }/#{ @controller }"
  end
end

We modify the nesting of the second expression and return it. To do this, we implement a nesting= method that adds a namespace onto the front of the expression’s @controller.
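Here's a sketch of the division operator in action, again with attr_reader added purely for inspection. It leans on two facts about Ruby's parsing: method calls bind tighter than /, and / is left-associative, so deeper namespaces simply chain.

```ruby
module Consent
  class Expression
    attr_reader :controller, :action   # added for inspection only

    def initialize(controller, params = {})
      @controller, @params = controller.to_s, params.dup
    end

    def method_missing(name, params = {})
      @action = name.to_s
      @params.update(params)
      self
    end

    # The left operand pushes its controller onto the right operand.
    def /(expression)
      expression.nesting = @controller
      expression
    end

    def nesting=(name)
      @controller = "#{name}/#{@controller}"
    end

    module Generator
      def method_missing(name, params = {})
        Consent::Expression.new(name, params)
      end
    end
  end

  class Description
    include Expression::Generator
  end
end

desc = Consent::Description.new
deep = desc.instance_eval { admin/ajax/maps }
call = desc.instance_eval { ajax/maps.find(:user => "James") }

p deep.controller                  # "admin/ajax/maps"
p [call.controller, call.action]   # ["ajax/maps", "find"]
```

In the second expression, maps.find(...) is evaluated first, so the action is already in place by the time ajax prefixes the controller name.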

The last thing to handle before we think about combining expressions is response formats. There are two ways to specify one: profiles.list.json matches a JSON request for ProfilesController#list, while profiles*json matches a JSON request for any action in ProfilesController. To deal with the first case, we need to change Expression#method_missing so that if the Expression already has an @action, we set the format instead:

class Consent::Expression
  def method_missing(name, params = {})
    @format = name.to_s if @action
    @action ||= name.to_s
    @params.update(params)
    self
  end
end

Notice the ||= assignment so we only set @action if it’s not already set. Also, note that this allows format calls to take param lists, but I’m not particularly bothered about that. The implementation will really allow a fair number of slightly nonsensical things, but right now I’m just concerned with getting the language to work.

The other format method is the multiplication operator. Again, Ruby will interpret profiles*json by creating a ProfilesController expression and a JsonController expression before combining them. This time, it’s the first operand that contains information we want to keep; we just want to modify it using the second’s @controller value. The second operand is effectively thrown away. Also, the * is supposed to signify “any action” so let’s nullify the @action.

class Consent::Expression

  def *(expression)
    @format = expression.instance_eval { @controller }
    @action = nil
    self
  end
end
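Both format spellings can be exercised side by side. This is a sketch built from the snippets so far, with attr_reader added purely for inspection; it also shows that * throws away an action that was already set.

```ruby
module Consent
  class Expression
    attr_reader :controller, :action, :format   # added for inspection only

    def initialize(controller, params = {})
      @controller, @params = controller.to_s, params.dup
    end

    # First unknown call sets the action, any later one sets the format.
    def method_missing(name, params = {})
      @format = name.to_s if @action
      @action ||= name.to_s
      @params.update(params)
      self
    end

    # The right operand only donates its name as the format.
    def *(expression)
      @format = expression.instance_eval { @controller }
      @action = nil
      self
    end

    module Generator
      def method_missing(name, params = {})
        Consent::Expression.new(name, params)
      end
    end
  end

  class Description
    include Expression::Generator
  end
end

desc    = Consent::Description.new
dotted  = desc.instance_eval { profiles.list.json }
starred = desc.instance_eval { profiles*json }
both    = desc.instance_eval { profiles.list*json }

p [dotted.action, dotted.format]     # ["list", "json"]
p [starred.action, starred.format]   # [nil, "json"]
p [both.action, both.format]         # [nil, "json"]
```

The last case is one of those slightly nonsensical inputs the implementation tolerates: the list action is silently dropped in favour of "any action".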

Finally, we need to combine expressions into lists, which we do using the addition operator. To do this, we’re going to need a new class to represent groups of expressions. This will have a + method that simply takes an expression or a group and adds it to a list:

class Consent::Expression::Group
  include Enumerable
  
  def initialize
    @exprs = []
  end
  
  def each(&block)
    @exprs.each(&block)
  end
  
  def +(expression)
    expression.is_a?(Enumerable) ?
        expression.each { |exp| @exprs << exp } :
        @exprs << expression
    self
  end
end

This allows a Group to be combined with other Groups by adding the Expressions from one to another. Now we can implement Expression#+ by simply creating a new group and adding the two operand expressions to it:

class Consent::Expression
  def +(expression)
    Group.new + self + expression
  end
end
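Assembled into one file, the pieces above might look like the following sketch, which runs the example expression from the start of the post. The attr_reader line and the fully qualified Consent::Expression in Generator are additions for the sketch; everything else follows the snippets as written.

```ruby
module Consent
  class Expression
    attr_reader :controller, :action, :params, :format   # for inspection only

    def initialize(controller, params = {})
      @controller, @params = controller.to_s, params.dup
    end

    def method_missing(name, params = {})
      @format = name.to_s if @action
      @action ||= name.to_s
      @params.update(params)
      self
    end

    def /(expression)
      expression.nesting = @controller
      expression
    end

    def nesting=(name)
      @controller = "#{name}/#{@controller}"
    end

    def *(expression)
      @format = expression.instance_eval { @controller }
      @action = nil
      self
    end

    def +(expression)
      Group.new + self + expression
    end

    class Group
      include Enumerable

      def initialize
        @exprs = []
      end

      def each(&block)
        @exprs.each(&block)
      end

      # Accepts a single expression or another group.
      def +(expression)
        if expression.is_a?(Enumerable)
          expression.each { |exp| @exprs << exp }
        else
          @exprs << expression
        end
        self
      end
    end

    module Generator
      def method_missing(name, params = {})
        Consent::Expression.new(name, params)
      end
    end
  end

  class Description
    include Expression::Generator
  end
end

group = Consent::Description.new.instance_eval do
  ajax/maps + tags.create + users(:id => /admin/)*json
end

group.each do |e|
  puts "#{e.controller} action=#{e.action.inspect} format=#{e.format.inspect}"
end
```

The group holds three expressions, for the ajax/maps, tags and users controllers, in the order they were written. Ruby's usual operator precedence does the grouping for us: / and * bind tighter than +.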

And that’s everything you need for a basic implementation of the grammar I showed you up top, in barely 60 lines of Ruby. I’ve left HTTP verbs out of the mix but hopefully you can see how you might implement them (hint: you need some extra methods in Generator that take Expressions and Groups). I’ve also neglected rule blocks for the time being as they introduce a lot of complication and I wanted to focus on the expression interpreter in this post. Maybe another time, eh?

Oh, and as an exercise for the reader: see if you can write an inspect method for Expression and Group that returns the original source code that generated the expression. It makes for really helpful log messages.