Including in circles

Not long had I been using my new plugin when I discovered it made this happen when trying to eager-load on a many-to-many association:

SELECT DISTINCT news_stories.id
FROM news_stories
LEFT OUTER JOIN countries_news_stories
    ON countries_news_stories.news_id = news_stories.id
LEFT OUTER JOIN countries
    ON countries.id = countries_news_stories.country_id
INNER JOIN countries_news_stories
    ON news_stories.id = countries_news_stories.news_id
WHERE (countries_news_stories.country_id = 30)

which is what happens if you ask for

Country.find(30).news_stories.find(:all, :include => :countries)

The SELECT DISTINCT and INNER JOIN ... WHERE parts are what fetch the news stories for country #30. The LEFT OUTER JOIN parts are the eager loading of all the countries for each news story. Isn’t SQL fun.

This seems like a perfectly reasonable thing to want to do: I know each news story is only linked to a handful of countries at most, so eager loading them isn’t going to grind the database to a halt. The problem is that Rails constructs an ambiguous statement – I’m joining the countries_news_stories table, with the same alias, twice. MySQL doesn’t know what to do with that, and nor should it. That statement will create duplicate column names in the result set, which is why we’re not allowed to run it without aliasing the table. That is, giving it a different name each time we use it by saying JOIN countries_news_stories AS table_1 or some such. This may have something to do with this Rails bug, though I can’t be sure until I’ve written some tests.

Assuming I’m not going to see a copy of Rails with the bugs fixed for some time, and I want to develop against a stable version, what can I do? As far as I can tell, my options are as follows.

A. Leave my plugin as it is, and if people want to do silly things like eager load anything but a belongs_to association, then it’s up to them to deal with it. Generally, eager loading has_(and_belongs_to_)many associations is risky because you don’t know how big a list of records you’ll be trying to fetch and it could bog down the database. I don’t really like this option, as I’d quite like to eager-load in this particular instance.

B. Before any find operation hits the database, rewrite any HABTM :includes as :joins statements with numeric table aliases. This has the disadvantage that, because you’re screwing around with table names, the result set columns won’t get mapped to the methods on the returned objects and you’ll end up re-fetching the data from the database anyway. That’s what the Rails docs say, but I’ve found that doing this does cut down database time in my case. I have a page of news articles for a particular country, each of which has to display a list of all the countries it’s linked to (see the query up top). Eager loading the countries for each story cuts down on database queries but doesn’t get rid of all of them. The other downside is that it’s a terrible hack involving hand-writing a mass of SQL and inserting table aliases as required.

C. Let finds hit the database without doing anything, then catch the StatementInvalid exception thrown if an invalid statement is generated and try to rewrite the query using plan B. I assmue it’s impossible to test for invalid statements before they hit the database. The whole point is that the statement in question is ambiguous and cannot be parsed properly by the database. I don’t imagine that trying to parse it in Ruby is going to be any more successful. This option would have the upside that if no exceptions are thrown, the result set is cached properly and your method calls won’t generate extra database hits. The downside is that handling ambiguous statements now involves two trips to the database. I have no idea whether the pros outweight the cons with this technique.

I’m going to write some unit tests for Rails to see if this is a bug in the framework rather than a peculiarity of my application, then I’ll try out plan C on my project tomorrow. I’ll let you know how it went after I’ve made the necessary changes to IncludeByDefault.