Not long had I been using my new plugin when I discovered it made this happen when trying to eager-load on a many-to-many association:
SELECT DISTINCT news_stories.id
FROM news_stories
LEFT OUTER JOIN countries_news_stories
ON countries_news_stories.news_id = news_stories.id
LEFT OUTER JOIN countries
ON countries.id = countries_news_stories.country_id
INNER JOIN countries_news_stories
ON news_stories.id = countries_news_stories.news_id
WHERE (countries_news_stories.country_id = 30)
which is what happens if you ask for
Country.find(30).news_stories.find(:all, :include => :countries)
The SELECT DISTINCT
and INNER JOIN ... WHERE
parts are what fetch the news
stories for country #30. The LEFT OUTER JOIN
parts are the eager loading of
all the countries for each news story. Isn’t SQL fun.
This seems like a perfectly reasonable thing to want to do: I know each news
story is only linked to a handful of countries at most, so eager loading them
isn’t going to grind the database to a halt. The problem is that Rails
constructs an ambiguous statement – I’m joining the countries_news_stories
table, with the same alias, twice. MySQL doesn’t know what to do with that,
and nor should it. That statement will create duplicate column names in the
result set, which is why we’re not allowed to run it without aliasing the table.
That is, giving it a different name each time we use it by saying JOIN
countries_news_stories AS table_1
or some such. This may have something to do
with this Rails bug, though I can’t be sure until I’ve written some tests.
Assuming I’m not going to see a copy of Rails with the bugs fixed for some time, and I want to develop against a stable version, what can I do? As far as I can tell, my options are as follows.
A. Leave my plugin as it is, and if people want to do silly things like eager
load anything but a belongs_to
association, then it’s up to them to deal with
it. Generally, eager loading has_(and_belongs_to_)many
associations is risky
because you don’t know how big a list of records you’ll be trying to fetch and
it could bog down the database. I don’t really like this option, as I’d quite
like to eager-load in this particular instance.
B. Before any find
operation hits the database, rewrite any HABTM :includes
as :joins
statements with numeric table aliases. This has the disadvantage
that, because you’re screwing around with table names, the result set columns
won’t get mapped to the methods on the returned objects and you’ll end up
re-fetching the data from the database anyway. That’s what the Rails docs say,
but I’ve found that doing this does cut down database time in my case. I have a
page of news articles for a particular country, each of which has to display a
list of all the countries it’s linked to (see the query up top). Eager loading
the countries for each story cuts down on database queries but doesn’t get rid
of all of them. The other downside is that it’s a terrible hack involving
hand-writing a mass of SQL and inserting table aliases as required.
C. Let find
s hit the database without doing anything, then catch the
StatementInvalid
exception thrown if an invalid statement is generated and try
to rewrite the query using plan B. I assmue it’s impossible to test for invalid
statements before they hit the database. The whole point is that the statement
in question is ambiguous and cannot be parsed properly by the database. I don’t
imagine that trying to parse it in Ruby is going to be any more successful. This
option would have the upside that if no exceptions are thrown, the result set is
cached properly and your method calls won’t generate extra database hits. The
downside is that handling ambiguous statements now involves two trips to the
database. I have no idea whether the pros outweight the cons with this
technique.
I’m going to write some unit tests for Rails to see if this is a bug in the framework rather than a peculiarity of my application, then I’ll try out plan C on my project tomorrow. I’ll let you know how it went after I’ve made the necessary changes to IncludeByDefault.