for vs. do {} while

While developing Sylvester, I’ve read up a little on how to make JavaScript perform as well as it can. One of the recommendations given in this Andy King article and elsewhere is that you should loop with do {} while instead of for. I’m not singling out King for criticism, but I want to clarify some of what he says.

He does point out that the use of this construct:

do {
  // Code goes here
} while (--i);

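depends on the starting value of i being greater than zero. He doesn’t point out that by putting the while first, you can accommodate zero starting values and still get the same number of iterations (inside the loop body, i is one lower than before: 2, 1, 0 rather than 3, 2, 1 for a starting value of 3).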

while (i--) {
  // Code goes here
}

Now the loop will not execute if i starts at zero and you won’t get an infinite loop. Note the switch from pre-decrement to post-decrement: it keeps the number of iterations the same now that the test comes first. King does not clarify this and suggests that the following will do the same thing:

do {
  // Code goes here
} while (--i);    // pre-decrement

do {
  // Code goes here
} while (i--);    // post-decrement

Say i begins at 3 in both cases. The first will iterate the loop with the values 3, 2 and 1: pre-decrementing 1 evaluates to 0 and the loop stops. But post-decrementing 1 evaluates to 1, so the second loop runs one more time, with i equal to 0.
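To make the difference concrete, here is a small sketch that records the value of i seen inside the loop body for each form. The helper names (preDecrement, postDecrement, whileFirst) are mine and purely illustrative; they are not from King's article.

```javascript
// Each helper returns the values of i seen inside the loop body.
function preDecrement(i) {
  var seen = [];
  do {
    seen.push(i);
  } while (--i);      // decrement first, then test the new value
  return seen;
}

function postDecrement(i) {
  var seen = [];
  do {
    seen.push(i);
  } while (i--);      // test the old value, then decrement
  return seen;
}

function whileFirst(i) {
  var seen = [];
  while (i--) {       // test and decrement before the body ever runs
    seen.push(i);
  }
  return seen;
}

console.log(preDecrement(3));   // [3, 2, 1]
console.log(postDecrement(3));  // [3, 2, 1, 0]
console.log(whileFirst(3));     // [2, 1, 0]
console.log(whileFirst(0));     // []
```

The while-first form runs the same number of times as the pre-decrement do {} while, but the body sees each value one lower, and it copes with a starting value of zero.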

Finally, onto the issue of speed. (These findings are based on testing in Firefox 2.0 using Firebug.) Yes, while is faster than for if you can use the decrement operator as the loop condition. The efficiency comes from using a single expression both to change the iteration value and to check whether to loop again, so it’s only useful for situations where you can count down to zero rather than needing to count upwards. Also, if you need to use the iteration counter in the loop and you need to count forwards, you need to do something like this:

var n = 10, k = n, i;
do {
  i = k - n;
} while (--n);

which, it turns out, is slower than using a for loop to achieve the same thing:

for (var i = 0; i < n; i++) { ... }

So to sum up: if you can afford to count backwards to zero, use a while loop as this will involve the fewest commands to actually make the loop run. If you need to count forwards or end on anything other than zero, use a for loop and cache as many things as you can. For example, write

var n = someArray.length;
for (var i = 0; i < n; i++) { ... }

instead of putting the array length lookup directly inside the for statement. Retrieving a property from an object takes longer than reading a local variable, so caching the length like this is the more efficient way to do things.
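As a rough sketch of the two recommendations side by side, here are both loop styles applied to the same job. The function names and the sample array are made up for illustration, and any speed difference will depend on the engine you test in.

```javascript
var someArray = [5, 10, 15, 20];

// Counting down to zero: one expression both tests and decrements.
function sumBackwards(arr) {
  var i = arr.length, total = 0;
  while (i--) {
    total += arr[i];
  }
  return total;
}

// Counting up: read the length once, rather than on every pass.
function sumForwards(arr) {
  var total = 0;
  for (var i = 0, n = arr.length; i < n; i++) {
    total += arr[i];
  }
  return total;
}

console.log(sumBackwards(someArray)); // 50
console.log(sumForwards(someArray));  // 50
```

The backwards version only makes sense when the order of traversal doesn't matter, as with a sum.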

Enhancing your linked list

Having given you the basics of linked lists in my last couple of posts, I’ll now give you some useful extensions to the class that you can use to make it even friendlier.

What would be really useful is a little function that lets you pretend the list is an array and grab a node by its index:

LinkedList.prototype.at = function(i) {
  if (!(i >= 0 && i < this.length)) { return null; }
  var node = this.first;
  while (i--) { node = node.next; }
  return node;
};

Note this is part of the generic list class, which the circular class inherits from. This method will get less and less efficient the longer your list is – it is only possible to access linked list nodes sequentially and this makes access slower than for arrays.

Second, we might like a way of finding a node containing some piece of data, if we’re using LinkedList.Node:

LinkedList.prototype.withData = function(data) {
  var node = this.first, n = this.length;
  while (n--) {
    if (node.data == data) { return node; }
    node = node.next;
  }
  return null;
};

An iterator might be nice as well – this takes advantage of the fact that JavaScript lets you pass functions as arguments to other functions:

LinkedList.prototype.each = function(fn) {
  var node = this.first, n = this.length;
  for (var i = 0; i < n; i++) {
    fn(node, i);
    node = node.next;
  }
};

This lets you iterate over the nodes and gives you the index of each one, so you can write:

myList.each(function(node, i) {
  // Do something with the node
});

Note that you cannot remove nodes while inside an each loop as this will break the link needed to get to the next node. Even doing something like nextNode = node.next before the call to fn(node, i) doesn’t help.

And finally, it helps to have a couple of methods to convert to and from arrays:

LinkedList.prototype.toArray = function() {
  var arr = [], node = this.first, n = this.length;
  while (n--) {
    arr.push(node.data || node);
    node = node.next;
  }
  return arr;
};

Note that this is part of the generic class as its form will not differ for different types of linked list. Also, note that it will give you back the node data if you’re using LinkedList.Node. The members of the array you get back will be the same objects as are contained in the list, not copies of them. To convert an array to a circular list:

LinkedList.Circular.fromArray = function(list, useNodes) {
  var linked = new LinkedList.Circular();
  var n = list.length;
  while (n--) {
    linked.prepend(useNodes ? new LinkedList.Node(list[n]) : list[n]);
  }
  return linked;
};

And that just about wraps it up. You can download the complete source for the linked list class I’m using (with various modifications) in some of my projects. It’s MIT-licensed, so do what you like with it.

Down with templating languages

There have been a couple of posts I’ve read in the last few days discussing the merits of allowing varying amounts of code into the HTML templates in web applications. One point that some people seem to be missing with regard to code in your templates is this: I don’t care what it is, I care about what it does.

What I mean by this is that it should not be against the rules to include code in your templates providing it’s doing the right job, given its context. To take a Rails example:

<% @posts.each do |post| %>
  <div class="post">
    <h3><%= link_to(post.title, post_url(post)) %></h3>
    <%= textilize(post.body_text) %>
  </div>
<% end %>

Yes, there’s Ruby in there. There are data objects and method calls and loops. If we were feeling particularly naughty we might throw some conditional logic in there too. The point is, the above is not evil because it’s presentation logic. If I had a line above this saying <% @posts = Post.find(:all, :conditions => ... , :order => ... , :limit => ... ) %> then that would be bad, in the same way that using link_to in a controller or a model doesn’t make much sense.

You need a programming language of some sort in your templates, otherwise how are you going to display the data your application has cooked up for you? (I’ve tried using ITX with PHP sites and that is not the way to do things. Making your controller explicitly do variable substitution and looping on your template is plain messy and repetitive, and couples the view and controller together too tightly with loads of variable names.) And, given that writing a templating language on top of the base language is only going to slow things down, you might as well let the base language of the app be your template language.

Anyone who can find me a serious, professional web designer who would be scared off by the above example or break it horribly can have a shiny gold medal. The only programming code a template should contain is variable substitution, formatting, looping and maybe some conditional statements, and I’d be happy letting any web designer worth paying write such code. People need to let go of the idea that you can’t embed a full-on scripting language in templates in case people abuse it: it’s insulting to template authors, and if you’re the sort of person that could abuse it, you ought to know better anyway.

A little more on linked list nodes

It occurred to me, after writing about linked lists yesterday, that you might want to do this:

var myList = new LinkedList.Circular();
myList.append(foo);
myList.append(bar);
myList.first = baz;

Doing that would break the list: baz does not (we assume) have correct prev and next pointers set. And so we have another use for the LinkedList.Node: you can change its data property and leave its pointers intact:

var myList = new LinkedList.Circular();
myList.append(new LinkedList.Node(foo));
myList.append(new LinkedList.Node(bar));
myList.first.data = baz;

You’ve just changed the first item in the list, but its placeholder in the list remains unchanged.

I also mentioned that you might want to use nodes if you need to include items in multiple lists: several nodes can have their data property point to the same object. But, you can also, should you wish, include the same item multiple times in the same list:

var myList = new LinkedList.Circular();
myList.append(new LinkedList.Node(foo));
myList.append(new LinkedList.Node(bar));
myList.append(new LinkedList.Node(foo));

That list has three distinct nodes, but the first and last nodes both point to foo. So you could use this, for example, to make a queue of operations where some actions need to be repeated at different points in the sequence.

Writing a linked list in JavaScript

Before I get into the details of this, what exactly is a linked list? Well, it’s a type of data storage structure that has some similarities to the array data type. The key difference is that, rather than storing values using numeric indexes, it stores them by creating links between successive elements so that they form a chain. You can only traverse such a list by following the links between elements, rather than by specifying an index. The advantage of linked lists is that they make removing and re-ordering elements really easy and efficient.

For example, for Sylvester’s next release, I’m implementing an algorithm that splits a polygon into a collection of its constituent triangles. It does this by recursively removing vertices from the polygon, chopping off one triangle each time. Managing the list of vertices using an array in this instance would quickly cause headaches, as you have to shuffle each element down a notch after removing an element from the list using a bunch of slice() and push() operations. As well as being inelegant, this is also really inefficient. Linked lists allow you to cleanly remove items from them, and all this operation has to do is change the links on the elements either side of the removed one.

JavaScript does not have a built-in linked list data type, so we need to implement our own. I’m going to show how to write a doubly-circularly-linked list, which sounds like the most complicated but is also the most versatile. It contains the functionality of linear and singly linked lists, and adds more features on top. If you want a more restricted type of list, this code should give you a good starting point.

I’ll start by specifying a base class that holds properties (and methods, later) shared between all types of linked list.

function LinkedList() {}
LinkedList.prototype = {
  length: 0,
  first: null,
  last: null
};

I’ve given it a length property that will work the same way as that for arrays. The properties first and last will be pointers to the first and last nodes in the list – we need somewhere to start traversing the list from! Throughout this, it’s important to remember that objects in JavaScript are handled by reference: setting myList.first = someObject means that myList.first is a reference to the object represented by someObject, not a copy of its value. In fact, the two variables are both references to the same data in memory – change the object through one and the change shows up through the other. This reference behaviour is what makes linked lists possible.
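A tiny illustration of that reference behaviour, using a plain object as a stand-in for the list (the property values here are made up):

```javascript
var someObject = { name: 'vertex' };
var myList = { first: null };   // stand-in for a real list object

// Both names now refer to the same object in memory.
myList.first = someObject;
myList.first.name = 'renamed';

console.log(someObject.name);             // 'renamed'
console.log(myList.first === someObject); // true

// Pointing myList.first at a different object leaves someObject alone.
myList.first = { name: 'another' };
console.log(someObject.name);             // 'renamed'
```

Mutating the shared object is visible through every reference to it; reassigning a reference only changes where that one name points.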

Next, I’m going to subclass this to write the specifics of my doubly circular list.

LinkedList.Circular = function() {};
LinkedList.Circular.prototype = new LinkedList();

We need a way of adding nodes to the list. I should point out that a linked list is only good for storing lists of objects, not primitive values like strings or numbers. We need to set properties on the list nodes themselves, and we can only do that with objects.

Our first method, append, takes an object and sticks it on the end of the list. If the list is empty, the node’s prev and next pointers will point to itself – the list is circular. Otherwise, we just link it up to the previous node and the first node in the list by setting properties:

LinkedList.Circular.prototype.append = function(node) {
  if (this.first === null) {
    node.prev = node;
    node.next = node;
    this.first = node;
    this.last = node;
  } else {
    node.prev = this.last;
    node.next = this.first;
    this.first.prev = node;
    this.last.next = node;
    this.last = node;
  }
  this.length++;
};

Note how we set the last property last of all – if we did it sooner then it would point to the wrong node for the links we want to create.
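To check that the ring really closes up, here is a self-contained sketch that repeats the class and append from above and then follows the links. The variable names ring, a, b and c are just for illustration.

```javascript
function LinkedList() {}
LinkedList.prototype = { length: 0, first: null, last: null };

LinkedList.Circular = function() {};
LinkedList.Circular.prototype = new LinkedList();

// Same append logic as shown above.
LinkedList.Circular.prototype.append = function(node) {
  if (this.first === null) {
    node.prev = node;
    node.next = node;
    this.first = node;
    this.last = node;
  } else {
    node.prev = this.last;
    node.next = this.first;
    this.first.prev = node;
    this.last.next = node;
    this.last = node;
  }
  this.length++;
};

var ring = new LinkedList.Circular();
var a = { id: 'a' }, b = { id: 'b' }, c = { id: 'c' };
ring.append(a);
ring.append(b);
ring.append(c);

console.log(ring.length);                   // 3
console.log(ring.last.next === ring.first); // true
console.log(ring.first.prev === ring.last); // true
console.log(a.next.next.next === a);        // true: three steps round the ring
```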

The second useful method we need is insertAfter, which allows us to insert a node anywhere in the list. The logic for this follows the same pattern – everything is done by just setting some properties, rather than shuffling array elements around. This highlights another important distinction between arrays and linked lists – a linked list does not ‘contain’ its nodes in the same way that arrays contain their values. Linked lists emerge out of links between objects, and a LinkedList object is just a plain object with three properties – first, last and length.

For insertAfter, we don’t need to deal with the special case of the list being empty, as we need some pre-existing node to insert the new one after. We do have to deal with the case where the new node becomes the last node in the list:

LinkedList.Circular.prototype.insertAfter = function(node, newNode) {
  newNode.prev = node;
  newNode.next = node.next;
  node.next.prev = newNode;
  node.next = newNode;
  if (newNode.prev == this.last) { this.last = newNode; }
  this.length++;
};

Pay attention to the order in which properties are set – anything that needs to be retrieved in order to set a link cannot be modified until nothing else depends on it. We set the links on the incoming node first by retrieving properties from the list, then skip forward to link the node in front, then finally change the node we’re inserting after, after we’ve got all the information we need from it.

The remove method is a little more involved, as we need to handle what happens when there’s only one node remaining. Again, pay attention to the order of operations and how it avoids broken links.

LinkedList.Circular.prototype.remove = function(node) {
  if (this.length > 1) {
    node.prev.next = node.next;
    node.next.prev = node.prev;
    if (node == this.first) { this.first = node.next; }
    if (node == this.last) { this.last = node.prev; }
  } else {
    this.first = null;
    this.last = null;
  }
  node.prev = null;
  node.next = null;
  this.length--;
};

That’s all you need to get a basic linked list going. I add some methods for iteration, searching etc. but the core is what’s written above. I’ll leave prepend and insertBefore as exercises for the reader: they’re really not tricky.
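For what it’s worth, here is one way prepend and insertBefore might come out. Treat it as a sketch that mirrors append and insertAfter under the same assumptions, not necessarily what ends up in the downloadable source.

```javascript
function LinkedList() {}
LinkedList.prototype = { length: 0, first: null, last: null };
LinkedList.Circular = function() {};
LinkedList.Circular.prototype = new LinkedList();

// Mirror image of append: the new node becomes this.first.
LinkedList.Circular.prototype.prepend = function(node) {
  if (this.first === null) {
    node.prev = node;
    node.next = node;
    this.first = node;
    this.last = node;
  } else {
    node.prev = this.last;
    node.next = this.first;
    this.first.prev = node;
    this.last.next = node;
    this.first = node;
  }
  this.length++;
};

// Mirror image of insertAfter: link newNode in just before node.
LinkedList.Circular.prototype.insertBefore = function(node, newNode) {
  newNode.prev = node.prev;
  newNode.next = node;
  node.prev.next = newNode;
  node.prev = newNode;
  if (newNode.next == this.first) { this.first = newNode; }
  this.length++;
};

var list = new LinkedList.Circular();
var a = { id: 'a' }, b = { id: 'b' }, c = { id: 'c' };
list.prepend(c);          // c
list.prepend(a);          // a, c
list.insertBefore(c, b);  // a, b, c

console.log(list.first.id, list.first.next.id, list.last.id); // a b c
```

As with insertAfter, the node being inserted before is the last thing to be modified, once nothing else needs to read its old prev pointer.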

One other useful class you might need is an explicit node class. Sometimes, you’ll need to allow objects to belong to several linked lists, but each object can only have one prev and one next pointer. The cleanest way to solve this is to create a class specifically for use as a linked list node, that can point to objects you want to list.

LinkedList.Node = function(data) {
  this.prev = null; this.next = null;
  this.data = data;
};

So then you’d write

myList.append(new LinkedList.Node(someObject));

which appends a new object with a pointer to the object you actually want to list as its data property.
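Here is a short self-contained sketch of one object living in two lists via nodes. It repeats the class, append and LinkedList.Node from above; listA, listB, shared and other are names I’ve made up for the example.

```javascript
function LinkedList() {}
LinkedList.prototype = { length: 0, first: null, last: null };
LinkedList.Circular = function() {};
LinkedList.Circular.prototype = new LinkedList();

LinkedList.Circular.prototype.append = function(node) {
  if (this.first === null) {
    node.prev = node;
    node.next = node;
    this.first = node;
    this.last = node;
  } else {
    node.prev = this.last;
    node.next = this.first;
    this.first.prev = node;
    this.last.next = node;
    this.last = node;
  }
  this.length++;
};

LinkedList.Node = function(data) {
  this.prev = null; this.next = null;
  this.data = data;
};

var shared = { name: 'shared' }, other = { name: 'other' };
var listA = new LinkedList.Circular();
var listB = new LinkedList.Circular();

listA.append(new LinkedList.Node(shared));
listA.append(new LinkedList.Node(other));
listB.append(new LinkedList.Node(shared));

console.log(listA.first.data === listB.first.data); // true: same object
console.log(listA.first === listB.first);           // false: separate nodes
console.log(listA.first.next.data.name);            // 'other'
console.log(listB.first.next === listB.first);      // true: a ring of one
console.log(shared.prev);                           // undefined: object untouched
```

Each node carries its own prev and next, so the listed object never has link properties set on it.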

It’s a little tricky to see how this is so different from an array at first, but once you start using it where it’s appropriate you’ll find it so much more useful. Later this week I’ll post some additional useful methods and a source file for you to download.

How to fix bugs in software the hard way, or, why open source software is so damn helpful

If nothing else, this week has taught me a few things about bug fixing. As I’ve written about before, my IncludeByDefault plugin (or, more accurately, the project I’m using it for) exposed a bug or two in Rails. Revision 17 is the result of a very messy process trying to chase a bug up and down the bowels of ActiveRecord. To make sure I never do this again, I’ll let you know how I arrived at such a tangled, messy situation.

IncludeByDefault works its mind-numbingly simple magic by intercepting ActiveRecord::Base.find_every using alias_method_chain. My feeling is that, as far as possible, your methods should not care about the innards of the methods they’re replacing; that way the original method’s author can alter it and you still get that change in your code. That’s why I like to intercept rather than overwrite.
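The same intercept-and-delegate idea can be sketched in JavaScript, to stay with the language used elsewhere on this blog. To be clear, this is only an analogue: Finder, findEvery and the default include value are hypothetical names, not Rails or ActiveRecord APIs, and it is not how the plugin itself is written.

```javascript
// Hypothetical stand-in for a library object whose method we want to wrap.
var Finder = {
  findEvery: function(options) {
    return 'find with ' + (options.include || []).join(',');
  }
};

// Intercept: keep a reference to the original, then delegate to it.
// The wrapper never looks inside the original's implementation.
var findEveryWithoutDefaults = Finder.findEvery;
Finder.findEvery = function(options) {
  options = options || {};
  if (!options.include) { options.include = ['author']; } // made-up default
  return findEveryWithoutDefaults.call(this, options);
};

console.log(Finder.findEvery({}));                        // 'find with author'
console.log(Finder.findEvery({ include: ['comments'] })); // 'find with comments'
```

Because the wrapper only adjusts the arguments and hands off to the saved original, any later change to the original method still takes effect.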

So then I’m coding away with my new plugin, when suddenly I decide I want to do this:

Country.find(id).news_stories.find(:all, :include => :countries)

which is what we call a many-to-many scoped find with cascaded eager loading. Rails (at time of writing) will choke on this and raise ActiveRecord::StatementInvalid, because the SQL it generates contains duplicate table aliases. So, I figured that, while I’m intercepting find_every, I’ll catch that exception and try to do something about it. This is the crux of why solving the bug was so tricky: I started solving it in completely the wrong place.

find_every is quite a bit higher up the call stack than the method that first throws the exception (ActiveRecord::Associations::ClassMethods.select_limited_ids_list). Even so, your options for dealing with it are fairly similar whichever one you intercept – the arguments you have at your disposal are basically the :options hash from find. You could try and do something with the JoinDependency class at this stage as that’s where the errant JOIN statements come from, but that led me round in circles.

I decided that what I’d do was convert any troublesome :include options to :joins and write out tons of SQL by hand to rename the tables used. That’s basically fine, except for two things: for one, Rails won’t let you use :joins in this context, so I had to hack that in by completely overwriting ActiveRecord::Base.add_joins!. Second, removing anything from the :include array means that JoinDependency won’t write out column aliases for the tables you want to include, so now you need an extra method to write out all your column aliases. If you don’t do this, the data simply won’t be pulled from the database. Cue a wrapper for column_aliases and a new instance variable to pass around information about your new :joins.

The next problem is that the method using_limitable_reflections? will forget that you were not using limitable reflections, because you’ve removed all HABTM associations from the :include array. Cue another wrapper for that to get it to remember what you were doing before you generated loads of bad SQL. If you don’t do this and you have duplicate many-to-many links in the DB, you’ll get back result sets that are too small because uniqueness is enforced in Ruby rather than SQL. (I’ve posted this on Rails Trac.)

So you’ve written out your JOIN fragments and all your column aliases and managed to get the uniqueness condition to load properly, so let’s send your SQL off to the database. One small problem: JoinDependency has no idea about your JOIN statements and won’t load the data from them into objects, thus completely negating the point of eager loading anything. Several times during this fixing process, it dropped the associations on the floor entirely so I couldn’t even retrieve them, let alone eager load anything.

The solution to this is, unfortunately, another method overwrite, this time of ActiveRecord::Associations::ClassMethods.find_with_associations. Thing is, by this stage, you’ve chased the bug round so many different methods by intercepting that you’re convinced there’s no other path to fixing it, so another overwrite will have to do. This one passes information about your join tables to JoinDependency#instantiate so it can do something useful with them and load them back in as if they’d come from an :include, thus completing the eager loading process.

So I got it all working, and thought: why don’t I try and fix the framework itself rather than trying to intercept all over the place? And that’s exactly what I did. Once you’ve decided you’re going to modify a system rather than intercept its workings, you don’t have any issues with overwriting anything. If you make a change and the unit tests pass, all is good. The irony is that fixing this in the framework itself only required me to modify one existing method: add_joins!, which my so-called workaround did anyway. The fix works by extracting the table names from every join you try to add, and re-aliasing them if it spots any duplicates. You can find the fix in this patch, and in an up-to-date copy of IncludeByDefault.

Let this be a lesson to you: if you need to fix a bug, make sure you start from the most sensible place: the code that directly introduces the bug into your system. Trying to work around it will only give you headaches. This of course assumes you can modify your system – if Rails weren’t open source, this would literally have been impossible to fix.

In response to Tim Bryce’s Theory P

Now I know better than to rise to deliberately inflammatory comments, but Tim Bryce seems to be deadly serious when he espouses his ‘Theory P’. I’m going to try to respond to this article in as level-headed a fashion as I can. Let’s go.

(Yes I know it’s from 2005, but I only just read it and I assume Bryce’s attitude won’t have changed much.)

The concept of Theory P does not attempt to introduce any new theories of management. Instead, it identifies those elements from Theories “X”, “Y” and “Z” pertaining directly to the management of Programmers, hence the Theory “P” designation. Theory P, therefore, represents a style of management for a particular job segment.

The fact that he’s touting this as a ‘theory’, rather than just some opinions of his, makes it hard to take seriously. He’s already being condescending by dressing up his well-intentioned management advice as something loftier than it is.

In many cases, management is faced with a paradox: how to manage the programming department without irritating the programmers and cause them to abandon the company, leaving corporate systems prone to malfunction and in need of maintenance. Programmers are hip to this and often use this as leverage for job security.

This is not a paradox, it’s a dilemma. And in any case, why does he view managing people (yes, programmers are people, too) as inherently confrontational? Chances are, if your priority as a manager is avoiding antagonising your workforce, you’re in the wrong job. Bryce seems downright scared of us.

The more effectively we manage the people who program the computer, the better we can utilize the systems to support the information needs of the business.

Who talks like this? If he’s willing to concede that programmers are human, then I’ll do likewise. Does he talk like this to his kids? He later accuses programmers of using protracted language to show off. I’ll say no more.

First, I deliberately avoided the term “Software Engineer” because this would imply the use of a scientific method to programming. Regardless of how one feels about the profession, this is hardly the case. Basically, the programmer’s task is to convert human understandable specifications into machine understandable instructions. From this perspective, a programmer can best be characterized as a translator. Unfortunately, such a delineation chafes people in this profession.

I agree to some extent that programming is not scientific, although I feel he doesn’t want to call us Engineers because it makes us sound important. He prefers Programmers, to rhyme with Worker Bees. The reality is that programming requires the ability to understand at least basic maths and have a highly logical mind. You don’t need to be a genius, but these are traits associated with science. The fact that something is not straightforward does not make it unscientific. Programming works by you coming up with an idea for a solution to a problem, testing it to see if it works, then trying something else if it doesn’t. Sounds an awful lot like the scientific method, even if the details are a tad haphazard.

He is right to note that “such a delineation chafes people in this profession”. I should say it chafes translators, too. Try telling an accomplished literary translator that they are ‘just’ a translator and see how it goes down. His suggestion is that programming is just typing successive instructions into a machine. It might have been like that when he started out, but not any more.

Programmers tend to perceive themselves as free-spirited intellectuals who possess the magic of technology. … To outsiders, programmers are viewed as a sort of inner-circle of magicians who speak a rather cryptic language aimed at impressing others, as well as themselves. Such verbosity may actually mask some serious character flaws in their personality.

Every field of human activity has some sort of jargon associated with it. You could argue that language is itself a form of jargon designed for describing concepts that most people are familiar with. I can’t help how programmers are viewed by “outsiders” (he often asserts that we deliberately shun non-techies) but I can tell you that often, programmers need to discuss concepts that have no real-world analogues. You wouldn’t ask a particle physicist to refer to the W+ boson in terms of something ‘everyday’. We have strange names for things because they don’t exist anywhere except inside computers and we need a common language with which to discuss them. And I don’t feel I need to bother answering his suggestion that we tend to have “serious character flaws”.

It is not unusual for programmers to have problems socializing with others outside of their profession. Their language and technical interests tend to make them somewhat cliquish

Again, this could apply to any realm of human interest. People with common interests flock together because it makes them feel all warm and fuzzy. Some techies aren’t the most socially adept people, but I feel he’s appealing to the cartoon stereotype of spotty nerds hacking away in their bedrooms a bit too much.

There are few, if any, true programming geniuses in the average corporate shop.

That’s why we call them average. I don’t find it surprising that the leaders of the field aren’t working in the same office as me. I do find it surprising that Tim Bryce thinks that not being a genius makes one worthy of contempt.

Regardless of the image they wish to project, the average programmer does not have a higher IQ than any other worker with a college degree. In fact, they may even be lower. Most exhibit little imagination and require considerable instruction and coaching in performing their job. When they have mastered a particular programming task, the source code becomes a part of their portfolio which they carry from one job to the next. So much so, that copying or stealing source code is actually the predominant mode of development in most companies. Consequently, there is little original source code being produced in today’s software.

Hold up, what?

… copying or stealing source code is actually the predominant mode of development in most companies.

Sorry, thought I read that wrong. For someone who claims to have 30 years of industry experience not to understand the concept of library code, frameworks etc is unbelievable. Also, if no original source code is being written, what am I spending day after day doing? You think Rails magically understands the requirements of my company’s new online venture and writes it all for me? However low-level your source code is, there is still some sort of framework between you and the machine. I don’t see anything wrong with using ActiveRecord to handle my database for me, in the same way that I don’t think I’m cheating by not pushing individual electrons around or writing my whole web app using assembly language. Pushing electrons around is a solved problem, which my motherboard handles very nicely, thank you.

It is now too convenient for an employee to walk away with source code a company paid dearly for.

As far as I’m aware, I’m not paid by the number of original lines of code I produce, I’m paid to deliver working software. My future employers will no doubt be pleased if they present me with a problem that I already coded a solution to on a previous job. I doubt my boss would think it a good use of company time for me to implement my own ORM from scratch when a perfectly serviceable and extensible one already exists. Bryce wants to make a good business case for managing programmers, so how come he wants us to spend weeks and months reinventing the wheel on each new project?

Edit: I didn’t know this at the time, but code you write while at work is actually the property of your employer. I maintain that, despite this, there is a dividing line to be drawn somewhere over what for want of a better term I’m going to refer to as ‘fair use’. I think anything one might reasonably refer to as a ‘snippet’ of code for some very generic task is fair game for throwing in your pastebin. We programmers learn partly by sharing such snippets on mailing lists. Obviously I would never share anything substantive or business-specific with the public unless my employer were open-sourcing the product, and I think that managers who would assume otherwise of their staff are doing themselves no favours.

If programmers do not demonstrate personal initiative to learn new subjects, the company should not waste time and money trying to teach them

I actually agree with him here: unwillingness to learn is the sole reason why the web is full of bloated <table>-based pages and reams of inline, everything-is-global JavaScript. He states before this that programmers “usually possess a curiosity about technological developments”, but then contradicts himself by saying we need poking now and again to force us to learn new things. In my mind, if you’re in this industry and haven’t learned anything new in the last, say, month, then there’s something amiss. I don’t care how tiny it is. You learned about a method in your chosen framework that you didn’t know about before. If you can’t do that you shouldn’t be programming.

It is well known that programmers generally abhor organization and discipline. Their desks are often littered with stacks of paper and other debris. … In fact, many programmers deliberately appear disorganized to make it difficult to judge how they are progressing on their work effort and reveal inadequacies in workmanship.

We are, to a man, disorganised lazy lying scum. We spend all day on Facebook or MySpace and never do any work. And you were concerned about making us leave your company. You don’t need us, man, we’ll just mess up the place and hog network bandwidth by downloading music from BitTorrent. That’s right, we’re thieves too. That’s why we steal so much of your source code. Seriously though, if you want to know how my work’s progressing, ask me. Chances are, I’ll tell you, because I don’t like the idea that you think I’m not doing any work, and why would I withhold information from you about the software you’re paying me to write? Bryce’s philosophy seems to be based on the idea that everyone will lie to him all the time, which in all likelihood is a self-fulfilling prophecy.

Mental laziness can also be found in planning and documenting software. Instead of carefully thinking through the logic of a program using graphics and text, most programmers prefer to dive into source code without much thinking

To be sure, some planning is good. It helps to have some grasp of what the finished product should do before you begin work. But, writing software is what Horst Rittel and Melvin Webber would call a wicked problem: it can be “clearly defined only by solving it, or by solving part of it”. You will probably rewrite your first attempt at coding something, but that’s okay. You won’t fully understand what you’re writing until after you’ve begun writing it.

Improve communications within the programming staff by developing a standard glossary of terms. This will also be useful to outsiders who have to interface with programmers.

A standard glossary of terms? If you need standard terms for development issues that aren’t already in everyone’s vocabulary, chances are there’s some computer science jargon (heaven forfend) for whatever it is you’re trying to discuss. How’s about we stick to that, and then we only have one group of people who need to learn some new words. I know this sounds lazy, but if you ask developers to learn some arbitrary set of new words just to talk to you I guarantee you won’t get a straight answer out of them. Also, I know I’m only young but never in my life have I thought of myself as “interfacing” with another human being.

Develop security measures to safeguard the company’s intellectual property.

Again, DO NOT stop me reusing library code. It keeps me productive and therefore keeps you happy. Of course I’m not going to hand over specific implementation code for your e-commerce credit card handling system. Give me a bit of respect. Also, I know you paid for it, but code I write is my intellectual property – I created it, the copyright is mine. If it were yours, well then you wouldn’t need me to write it for you. I’ve since discovered this to be factually incorrect (see ‘Edit’ above).

He rounds up with a series of vague well-meaning statements that could easily apply to anyone in any industry. Most people think their jobs are hard and underpaid. Programmers are no more whiny than the rest of society. He really strikes gold in the comments though:

In my +30 years of experience in this field, you are the first to accuse me of inproprieties and there appears to be no reasoning with you over this. Your arguments give proof that:

1. I have forgotten more about development than you are ever going to know.

2. My thesis regarding Theory P is correct.

All the Best,
Tim Bryce

There probably aren’t many really good ways of asserting one’s authority on a subject, but he’s scraping the barrel with this. How’s about, “With respect, I’ve been in this industry for 30 years and this has been my general experience. If you don’t fit this mold then that’s great, I’m glad to hear you’re not one of the people I’m talking about.” Also, for the record: disagreement does not prove your ‘thesis’.

I tried. I mean, I really tried to be level headed about this. My concern is that if managers generally have this attitude, not just to programmers, but to staff in general, they shouldn’t be surprised that they need to be defensive all the time. We, the programmers, do not hate you. We like getting paid to do stuff we’d happily occupy our free time doing, as long as you’ve an interesting project for us to work on. Stop running your business like all your staff are out to rob you, and please please please: do something with your website. If the developers that cooked that thing up are representative of the programmers he’s met over the years, I’m not surprised he doesn’t trust us.

IncludeByDefault progress

IncludeByDefault, as mentioned in my last post, hit some snags with ActiveRecord generating duplicate table aliases when doing cascaded includes, e.g.

Tag.find(8).posts.find(:all, :include => :tags)

So, I set out to work around it, only to run into further problems. I went with option C: let find operations get all the way to the database, and then catch StatementInvalid if it is thrown and convert problematic :includes to :joins fragments by hand.

But, turns out Rails won’t let you use the :joins option on finds scoped on HABTM associations (like the one above), meaning I had to hack support for that in order to get my workaround for the duplicate naming bug to work. Even then, eager loading refuses to work properly – asking for the tags of any of the returned posts will go and fetch them from the database, even though the find operation included JOIN fragments to load the tags.

The other complication is that, if you have duplicate many-to-many links in your database, Rails returns incorrect result sets if you use :limit and :include, as long as the :include contains no has_many or has_and_belongs_to_many associations. Which means that converting troublesome :includes (those that raise StatementInvalid in the example up top, and are to-many associations) to :joins triggers this bug. I had to hack in a way for ActiveRecord to remember to select unique records properly, but this still only works as long as you :include suitable associations to begin with.
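To make the duplicate-links problem concrete, here is a minimal sketch in plain Ruby. It does not reproduce what Rails does internally; it only shows the general failure, where a duplicated row in the join table leaks into the result and a limit then cuts off a record you wanted. The Post struct, the POSTS and LINKS data and both method names are made up for illustration.

```ruby
# Made-up data: post 1 is linked to tag 8 twice in the join table.
Post = Struct.new(:id, :title)
POSTS = [Post.new(1, "First"), Post.new(2, "Second")]
LINKS = [[1, 8], [1, 8], [2, 8]] # [post_id, tag_id] rows

# Naive join: one result per link row, so the duplicate link leaks through.
def posts_for_tag(tag_id)
  LINKS.select { |_, t| t == tag_id }
       .map { |post_id, _| POSTS.find { |post| post.id == post_id } }
end

# Deduplicate by primary key first, then apply the limit.
def unique_posts_for_tag(tag_id, limit)
  posts_for_tag(tag_id).uniq { |post| post.id }.first(limit)
end

p posts_for_tag(8).first(2).map(&:id)    # limit applied to duplicated rows
p unique_posts_for_tag(8, 2).map(&:id)   # limit applied to unique records
```

The first line prints the same post twice; the second prints two distinct posts, which is what "select unique records properly" has to guarantee.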

I have been round and round in circles intercepting various bits of the ActiveRecord call stack trying to get rid of this, but it’s whack-a-mole: fixing the eager loading thing sets off the unique records problem, or makes the call stack retry the database calls too many times, or drops the association on the floor so you can’t even read it never mind eager load it. I’ve got the plugin to a state where:

  • If your :include doesn’t cause any SQL problems, it runs just fine.
  • If it raises StatementInvalid, we’ll rewrite it. The association causing the problem probably won’t get eager loaded, but your query will at least run.
  • You will avoid the duplicate links problem as long as your :include includes a to-many association.

All these apply whether the :include is added using include_by_default or explicitly in your find call – the plugin will catch the exceptions either way. If anyone knows a way to get all this to work without actually including large parts of rewritten Rails (so far there’s one actual method overwrite – the rest are wrappers that catch exceptions) then I’d love to hear about it.

Including in circles

Not long had I been using my new plugin when I discovered it made this happen when trying to eager-load on a many-to-many association:

SELECT DISTINCT news_stories.id
FROM news_stories
LEFT OUTER JOIN countries_news_stories
    ON countries_news_stories.news_id = news_stories.id
LEFT OUTER JOIN countries
    ON countries.id = countries_news_stories.country_id
INNER JOIN countries_news_stories
    ON news_stories.id = countries_news_stories.news_id
WHERE (countries_news_stories.country_id = 30)

which is what happens if you ask for

Country.find(30).news_stories.find(:all, :include => :countries)

The SELECT DISTINCT and INNER JOIN ... WHERE parts are what fetch the news stories for country #30. The LEFT OUTER JOIN parts are the eager loading of all the countries for each news story. Isn’t SQL fun.

This seems like a perfectly reasonable thing to want to do: I know each news story is only linked to a handful of countries at most, so eager loading them isn’t going to grind the database to a halt. The problem is that Rails constructs an ambiguous statement – I’m joining the countries_news_stories table twice under the same name. MySQL doesn’t know what to do with that, and nor should it. That statement will create duplicate column names in the result set, which is why we’re not allowed to run it without aliasing the table, that is, giving it a different name each time we use it by saying JOIN countries_news_stories AS table_1 or some such. This may have something to do with this Rails bug, though I can’t be sure until I’ve written some tests.
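As a rough illustration of the aliasing idea, the following plain-Ruby sketch hands out a fresh alias whenever a table is joined a second time. AliasTracker and join_clause are hypothetical helpers invented for this example; they are not part of Rails or of the plugin, and the alias naming scheme is an assumption.

```ruby
# Hypothetical helper: remembers how often each table has been joined.
class AliasTracker
  def initialize
    @counts = Hash.new(0)
  end

  # Bare table name on first use, then table_2, table_3 and so on.
  def alias_for(table)
    @counts[table] += 1
    @counts[table] == 1 ? table : "#{table}_#{@counts[table]}"
  end
end

# Builds one JOIN fragment; the block receives the name to use in the ON clause.
def join_clause(kind, table, aliases)
  name = aliases.alias_for(table)
  source = name == table ? table : "#{table} AS #{name}"
  "#{kind} JOIN #{source} ON #{yield(name)}"
end

aliases = AliasTracker.new
puts join_clause("LEFT OUTER", "countries_news_stories", aliases) { |t|
  "#{t}.news_id = news_stories.id"
}
puts join_clause("INNER", "countries_news_stories", aliases) { |t|
  "news_stories.id = #{t}.news_id"
}
```

The second fragment comes out as a join on countries_news_stories AS countries_news_stories_2, so the two uses of the table no longer collide.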

Assuming I’m not going to see a copy of Rails with the bugs fixed for some time, and I want to develop against a stable version, what can I do? As far as I can tell, my options are as follows.

A. Leave my plugin as it is, and if people want to do silly things like eager load anything but a belongs_to association, then it’s up to them to deal with it. Generally, eager loading has_(and_belongs_to_)many associations is risky because you don’t know how big a list of records you’ll be trying to fetch and it could bog down the database. I don’t really like this option, as I’d quite like to eager-load in this particular instance.

B. Before any find operation hits the database, rewrite any HABTM :includes as :joins statements with numeric table aliases. This has the disadvantage that, because you’re screwing around with table names, the result set columns won’t get mapped to the methods on the returned objects and you’ll end up re-fetching the data from the database anyway. That’s what the Rails docs say, but I’ve found that doing this does cut down database time in my case. I have a page of news articles for a particular country, each of which has to display a list of all the countries it’s linked to (see the query up top). Eager loading the countries for each story cuts down on database queries but doesn’t get rid of all of them. The other downside is that it’s a terrible hack involving hand-writing a mass of SQL and inserting table aliases as required.

C. Let finds hit the database without doing anything, then catch the StatementInvalid exception thrown if an invalid statement is generated and try to rewrite the query using plan B. I assume it’s impossible to test for invalid statements before they hit the database. The whole point is that the statement in question is ambiguous and cannot be parsed properly by the database. I don’t imagine that trying to parse it in Ruby is going to be any more successful. This option would have the upside that if no exceptions are thrown, the result set is cached properly and your method calls won’t generate extra database hits. The downside is that handling ambiguous statements now involves two trips to the database. I have no idea whether the pros outweigh the cons with this technique.
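The control flow of plan C might look something like the sketch below. It runs without Rails: StatementInvalid here is a local stand-in for ActiveRecord::StatementInvalid, and find_with_fallback, the fake database lambda and the rewrite lambda are all hypothetical names made up to show the try, rescue, rewrite, retry shape.

```ruby
# Stand-in for ActiveRecord::StatementInvalid so the sketch runs on its own.
class StatementInvalid < StandardError; end

# Runs the query as given; only if the database rejects it, retries once
# with rewritten options. A failure on the retry is allowed to propagate.
def find_with_fallback(options, rewrite)
  yield(options)
rescue StatementInvalid
  yield(rewrite.call(options))
end

# Fake "database": rejects any query that still eager-loads :countries.
attempts = []
fake_db = lambda do |opts|
  attempts << opts
  raise StatementInvalid, "Not unique table/alias" if opts[:include] == :countries
  [:story_1, :story_2]
end

# Hypothetical rewrite: swap the troublesome :include for a hand-written join.
rewrite = lambda do |opts|
  opts.reject { |key, _| key == :include }
      .merge(:joins => "LEFT OUTER JOIN countries_news_stories AS t1 " \
                       "ON t1.news_id = news_stories.id")
end

stories = find_with_fallback({ :include => :countries }, rewrite) { |opts| fake_db.call(opts) }
puts stories.length
puts attempts.length # two trips to the database for the ambiguous query
```

A query that succeeds first time never touches the rewrite, which is the upside described above; the ambiguous one costs a second round trip.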

I’m going to write some unit tests for Rails to see if this is a bug in the framework rather than a peculiarity of my application, then I’ll try out plan C on my project tomorrow. I’ll let you know how it went after I’ve made the necessary changes to IncludeByDefault.

New Rails plugin: IncludeByDefault

It’s true, I’m a plugin writing machine. Seriously though, this one’s tiny. It took all of five minutes to write. What it does is, it lets you specify a default value for the :include option on ActiveRecord::Base.find, so you can automate eager loading of associations. I’ll use an example I’m comfortable with:

  class BlogEntry < ActiveRecord::Base
    has_many :photos
    has_many :comments
    include_by_default :photos, :comments
  end

With that in place, the photos and comments will automatically be loaded in the same database query as the entry you ask for. You can still use the :include option with find – it will override the defaults you specify. Install as usual:
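For the curious, the override rule can be sketched in plain Ruby without ActiveRecord. This is not the plugin’s actual source: the DefaultIncludes module and the with_default_include method are hypothetical, and the BlogEntry class here is a bare stand-in rather than a real model. It only shows a class-level default that an explicit :include replaces.

```ruby
# Hypothetical stand-in for the plugin's behaviour, runnable without Rails.
module DefaultIncludes
  # Class macro: remember which associations to eager load by default.
  def include_by_default(*associations)
    @default_includes = associations
  end

  def default_includes
    @default_includes || []
  end

  # Options a finder would end up using: an explicit :include wins,
  # otherwise the class-level defaults are filled in.
  def with_default_include(options = {})
    return options if options.key?(:include) || default_includes.empty?
    options.merge(:include => default_includes)
  end
end

class BlogEntry
  extend DefaultIncludes
  include_by_default :photos, :comments
end

puts BlogEntry.with_default_include(:limit => 5)[:include].inspect
puts BlogEntry.with_default_include(:include => :author)[:include].inspect
```

The first call picks up the photos and comments defaults; the second keeps the explicit :author and ignores them.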

script/plugin install git://github.com/jcoglan/include_by_default