If nothing else, this week has taught me a few things about bug fixing. As I’ve written about before, my IncludeByDefault plugin (or, more accurately, the project I’m using it for) exposed a bug or two in Rails. Revision 17 is the result of a very messy process trying to chase a bug up and down the bowels of ActiveRecord. To make sure I never do this again, I’ll let you know how I arrived at such a tangled messy situation.
IncludeByDefault works its mind-numbingly simple magic by intercepting
ActiveRecord::Base.find_every
using alias_method_chain
. My feeling is that,
as far as possible, your methods should not care about the innards of the
methods they’re replacing, that way the original method’s author can alter it
and you still get that change in your code. That’s why I like to intercept
rather than overload.
So then I’m coding away with my new plugin, when suddenly I decide I want to do this:
Country.find(id).news_stories.find(:all, :include => :countries)
which is what we call a many-to-many scoped find with cascaded eager loading.
Rails (at time of writing) will choke on this and raise
ActiveRecord::StatementInvalid
, because the SQL it generates contains
duplicate table aliases. So, I figured that, while I’m intercepting
find_every
, I’ll catch that exception and try to do something about it. This
is the crux of why solving the bug was so tricky: I started solving it in
completely the wrong place.
find_every
is quite a bit higher up the call stack than the method that first
throws the exception
(ActiveRecord::Associations::ClassMethods.select_limited_ids_list
). Even
still, your options for dealing it with are fairly similar whichever one you
intercept – the arguments you have at your disposal are basically the
:options
hash from find
. You could try and do something with the
JoinDependency
class at this stage as that’s where the errant JOIN statements
come from, but that led me round in circles.
I decided that what I’d do was convert any troublesome :include
options to
:joins
and write out tons of SQL by hand to rename the tables used. That’s
basically fine, except for two things: for one, Rails won’t let you use :joins
in this context, so I had to hack that in by completely overwriting
ActiveRecord::Base.add_joins!
. Second, removing anything from the :include
array means that JoinDependency
won’t write out column aliases for the tables
you want to include, so now you need an extra method to write out all your
column aliases. If you don’t do this, the data simply won’t be pulled from the
database. Cue a wrapper for column_aliases
and a new instance variable to pass
around information about your new :joins
.
The next problem is that the method using_limitable_reflections?
will forget
that you were not using limitable reflections, because you’ve removed all HABTM
associations from the :include
array. Cue another wrapper for that to get it
to remember what you were doing before you generated loads of bad SQL. If you
don’t do this and you have duplicate many-to-many links in the DB, you’ll get
back result sets that are too small because uniqueness is enforced in Ruby
rather than SQL. (I’ve posted this on Rails Trac.)
So you’ve written out your JOIN fragments and all your column aliases and
managed to get the uniqueness condition to load properly, so let’s send your SQL
off to the database. One small problem: JoinDependency
has no idea about your
JOIN statements and won’t load the data from them into objects, thus completely
negating the point of eager loading anything. Several times during this fixing
process, it dropped the associations on the floor entirely so I coudn’t even
retrieve them, let along eager load anything.
The solution to this is, unfortunately, another method overwrite, this time of
ActiveRecord::Associations::ClassMethods.find_with_associations
. Thing is, by
this stage, you’ve chased the bug round so many different methods by
intercepting that you’re convinced there’s no other path to fixing it, so
another overwrite will have to do. This one passes information about your join
tables to JoinDependency#instantiate
so it can do something useful with them
and load them back in as if they’d come from an :include
, thus completing the
eager loading process.
So I got it all working, and thought: why don’t I try and fix the framework
itself rather than trying to intercept all over the place? And that’s exactly
what I did. Once you’ve decided you’re going to modify a system rather than
intercept its workings, you don’t have any issues with overwriting anything. If
you make a change and the unit tests pass, all is good. The irony is that fixing
this in the framework itself only required me to modify one existing method:
add_joins!
, which my so-called workaround did anyway. The fix works by
extracting the table names from every join you try to add, and re-aliasing them
if it spots any duplicates. You can find the patch in this patch, and in
an up-to-date copy of IncludeByDefault.
Let this be a lesson to you: if you need to fix a bug, make sure you start from the most sensible place: the code that directly introduces the bug into your system. Trying to work around it will only give you headaches. This of course assumes you can modify your system – if Rails weren’t open source, this would literally have been impossible to fix.