Every so often, you need a collection of items where you know the items in it are unique. JavaScript having no standard library, there’s a pretty strong temptation to hack something using an array, since that’s the closest-looking data structure available to you. A set’s just a list with no duplicates, right?
Well, kinda. In practise it won’t work out that well, since the time it takes to determine whether an item is already in the set scales with the size of the collection, and this becomes very expensive for large amounts of data. You could keep the array sorted, which will speed up searches quite a bit: you get O(log n) insertion rather than O(n). But we can do better.
Many set implementations (see Ruby, Java) use a hashtable as the storage for a
set. This is because hashtables are optimised for finding keys as quickly as
possible (ideally they should maintain constant search performance as the
collection grows), and they make sure no key appears twice. In JavaScript we
don’t have proper hashtables, but we do have Object
, and if our data consists
of strings or numbers, this is good enough.
In a set implementation, we just use the fact that an object does not allow duplicate keys, and put some dummy values in for each key. We don’t care what the values are, we only care about the keys.
Set = function() {
this._table = {};
};
Set.prototype.add = function(value) {
if (this.contains(value)) return false;
this._table[value] = true;
return true;
};
Set.prototype.remove = function(value) {
delete this._table[value];
};
Set.prototype.contains = function(value) {
return this._table.hasOwnProperty(value);
};
Set.prototype.forEach = function(block, context) {
for (var value in this._table)
block.call(context, value);
};
Set.prototype.toArray = function() {
var array = [];
this.forEach(array.push, array);
return array;
};
This is plenty to let us add, remove and iterate over values in the set. Note
how the add()
method returns true
or false
to tell us if the item was
added or not.
> load('set.js')
> s = new Set()
> s.add('foo')
true
> s.add('foo')
false
> s.add('bar')
true
> s.toArray().join(', ')
foo, bar
> s.remove('foo')
> s.toArray().join(', ')
bar
Of course, if I need a collection of objects rather than just strings, I’ll reach for one of the JS.Set classes, but for a lot of what you’d typically do with JavaScript the above will get you a long way. And if iteration performance is problematic, there are ways to optimise.