Emulating JavaScript’s String#replace in Ruby

Update: it turns out this is a lot easier than the method presented below; see Christoffer Sawicki’s implementation. Consider my method an example of what happens when you try to port a feature from one language to another and end up looking in all the wrong places for a solution.

I know what you’re thinking: we already have String#gsub, so why do we need something else? Well, while writing PackR, I found out that JavaScript’s String#replace, when given a function as its second argument, is helpful in ways that Ruby can only dream of:

>>> var logger = function() { console.log(arguments); };
>>> 'something'.replace(/(..)(m?)/g, logger)

// Outputs...
["som", "so", "m", 0, "something"]
["et", "et", "", 3, "something"]
["hi", "hi", "", 5, "something"]
["ng", "ng", "", 7, "something"]

How generous is that? You get the matched portion of the string, each sub-pattern (capture group) match within that portion, the starting index of the match, and the full string: everything you could possibly wish to know in order to process matches. All this information is available in Ruby too; it’s just a bit of a pig to get it into the same shape as above.
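For comparison, this is roughly what plain gsub gives you. Its block receives only the matched text; the capture groups and the match offset have to be fished out of $~, the MatchData for the current match. A small sketch using the same pattern as above:

```ruby
# String#gsub yields only the matched text to its block; the captures and
# the start offset are available through $~ (the current MatchData).
seen = []
'something'.gsub(/(..)(m?)/) do |match|
  seen << [match, $~.captures, $~.begin(0)]
  match # return the match unchanged so the string is not altered
end
p seen
# prints [["som", ["so", "m"], 0], ["et", ["et", ""], 3], ["hi", ["hi", ""], 5], ["ng", ["ng", ""], 7]]
```

The data is all there; what’s missing is receiving it as plain block arguments in JavaScript’s order.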

The first thing we need to do is extend the Regexp class so that it can count the number of capturing sub-patterns (the bits enclosed in parentheses) in a regular expression. Parts of this come straight from Packer:

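class Regexp
  # Backslash escapes such as \( or \) never open a group.
  ESCAPE_CHARS = /\\./
  # Non-capturing groups and lookaheads ((?:, (?=, (?!) plus whole
  # character classes such as [(], whose brackets are literal.
  ESCAPE_BRACKETS = /\(\?[:=!]|\[[^\]]+\]/
  # Any "(" left after stripping the above is counted as a group.
  BRACKETS = /\(/

  # Returns the number of capturing groups in this regexp.
  def count
    expression = source.
        gsub(ESCAPE_CHARS, "").
        gsub(ESCAPE_BRACKETS, "")
    expression.scan(BRACKETS).length
  end
end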
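One way to sanity-check Regexp#count is to compare it against a different trick (my own substitution, not something taken from Packer): combine the pattern with an empty regexp via Regexp.union, so the result is guaranteed to match the empty string, and count the captures of that match. The helper name capture_count is hypothetical.

```ruby
# Hypothetical helper: let Ruby's regexp engine do the parsing.
# Regexp.union(regexp, //) always matches "" through the empty
# alternative, and the MatchData still has one (nil) slot per
# capturing group in regexp.
def capture_count(regexp)
  Regexp.union(regexp, //).match('').captures.length
end

p capture_count(/(..)(m?)/)   # prints 2
p capture_count(/\((a)\)/)    # prints 1 (escaped brackets don't count)
p capture_count(/(?:a)(b)/)   # prints 1 (non-capturing group)
p capture_count(/[(](c)/)     # prints 1 (bracket inside a character class)
p capture_count(/(?<=a)(b)/)  # prints 1 (lookbehind)
```

The last case is where the two approaches disagree: Regexp#count only strips (?:, (?= and (?!, so it would treat the ( of a lookbehind such as (?<=a) as a group and report 2. That only matters on Ruby versions whose regexp engine supports lookbehind (1.9 and later).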

The second extension we need to make is to the String class. We need to be able to work out the index of every pattern match within a string, and we can do this using StringScanner (part of Ruby’s standard library, hence the require):

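require 'strscan'
class String
  # Returns the position just past the end of each match of regexp.
  def indexes(regexp)
    scanner, ary = StringScanner.new(self), []
    # scan_until moves the pointer past each match; record where it lands.
    # Beware: a regexp that can match the empty string never advances
    # the pointer, so this loop would not terminate.
    ary << scanner.pointer while scanner.scan_until(regexp)
    ary
  end
end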

It’s important to remember that this returns the index of the character immediately after each match; for example, 'something'.indexes /som|i/ returns [3, 7]. To get the starting index of a match, all we need to do is subtract the length of the matched string from the corresponding end index.
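The subtraction step can be sketched like this, reusing String#indexes exactly as defined above. As a cross-check, String#scan in block form also sets $~, so MatchData#begin(0) gives the start offsets directly; the two should agree for patterns that always consume at least one character.

```ruby
require 'strscan'

# Same String#indexes as above: end offset of each match.
class String
  def indexes(regexp)
    scanner, ary = StringScanner.new(self), []
    ary << scanner.pointer while scanner.scan_until(regexp)
    ary
  end
end

str = 'something'
ends = str.indexes(/som|i/)
p ends # prints [3, 7]

# Start offset = end offset minus the length of the matched text.
starts = str.scan(/som|i/).zip(ends).map { |m, e| e - m.length }
p starts # prints [0, 6]

# Cross-check: read the start offsets straight from $~ while scanning.
direct = []
str.scan(/som|i/) { direct << $~.begin(0) }
p direct # prints [0, 6]
```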

With these in place, we can implement our String#replace-alike. Ruby already has a String#replace method, so for want of something meaningful I’m going to call this new method js_replace.

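class String
  def js_replace(regexp)
    # Keep a copy of the original string for the block, and precompute
    # where each match ends.
    string, ends = dup, indexes(regexp)
    n = regexp.count
    gsub(regexp) do |match|
      # Whole match, each capture group ($~ is the current MatchData),
      # then the start offset and the full string, in JavaScript's order.
      args = [match] + (1..n).map { |i| $~[i] } +
          [ends.shift - match.length, string]
      yield(*args)
    end
  end
end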

You can now pretend you’re writing JavaScript:

>> 'something'.js_replace(/(..)(m?)/) { |*args| puts args.inspect }
["som", "so", "m", 0, "something"]
["et", "et", "", 3, "something"]
["hi", "hi", "", 5, "something"]
["ng", "ng", "", 7, "something"]

>> 'something'.js_replace(/(..)(m?)/) { |a,b,c,d| d }
#=> "0357"
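For comparison, here is a single-pass sketch that skips both helper methods, since inside the gsub block $~ already holds the captures and the match offset. This is my own reconstruction, not Christoffer Sawicki’s implementation, and the name js_replace2 is hypothetical.

```ruby
class String
  # Hypothetical single-pass variant: $~ (the current MatchData) already
  # knows the captures and the start offset, so there is no group
  # counting and no separate StringScanner pass.
  def js_replace2(regexp)
    original = dup
    gsub(regexp) do |match|
      yield(match, *$~.captures, $~.begin(0), original)
    end
  end
end

calls = []
result = 'something'.js_replace2(/(..)(m?)/) { |*args| calls << args; args[3] }
p calls.first # prints ["som", "so", "m", 0, "something"]
p result      # prints "0357"
```

Because the offset comes from the same match gsub is processing, this version also avoids the endless loop that String#indexes hits on patterns that can match the empty string.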