Update: turns out this is a lot easier than the method presented below: see Christoffer Sawicki’s implementation. Consider my method an example of what happens when you’re trying to port features from one language to another and end up looking in all the wrong places for a solution.
I know what you’re thinking: we already have String#gsub, why do we need something else? Well, while writing PackR, I found out that JavaScript’s String#replace is helpful in ways that Ruby can only dream of:
>>> var logger = function() { console.log(arguments); };
>>> 'something'.replace(/(..)(m?)/g, logger)
// Outputs...
["som", "so", "m", 0, "something"]
["et", "et", "", 3, "something"]
["hi", "hi", "", 5, "something"]
["ng", "ng", "", 7, "something"]
How generous is that? You get the matched string portion, each sub-pattern match within the current portion, the starting index of the match, and the full string. Everything you could possibly wish to know in order to process matches. All this information is available in Ruby; it’s just a bit of a pig to get it to work like it does up above.
The first thing we need to do is extend the Regexp class so that it can count the number of sub-patterns (bits enclosed by ()) in a regular expression (bits of this come straight from Packer):
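class Regexp
  ESCAPE_CHARS    = /\\./                   # backslash-escaped characters, e.g. \(
  ESCAPE_BRACKETS = /\(\?[:=!]|\[[^\]]+\]/  # (?: (?= (?! openers, plus character classes
  BRACKETS        = /\(/                    # any bracket left over opens a sub-pattern

  # Counts the capturing sub-patterns in this regular expression.
  def count
    expression = source.
      gsub(ESCAPE_CHARS, "").
      gsub(ESCAPE_BRACKETS, "")
    expression.scan(BRACKETS).length
  end
end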
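To see what the counting rules above do in practice, here is a small sketch that repeats the Regexp extension and runs it against a few patterns. The sample patterns are my own, chosen for illustration. The last one shows a limitation worth knowing about: lookbehind openers such as (?<= are not stripped, so they get counted as if they were sub-patterns.

```ruby
class Regexp
  ESCAPE_CHARS    = /\\./
  ESCAPE_BRACKETS = /\(\?[:=!]|\[[^\]]+\]/
  BRACKETS        = /\(/

  def count
    expression = source.gsub(ESCAPE_CHARS, "").gsub(ESCAPE_BRACKETS, "")
    expression.scan(BRACKETS).length
  end
end

# Illustrative patterns only; none of these come from PackR.
samples = [
  /(..)(m?)/,     # two plain sub-patterns             => 2
  /(?:ab)+(c)/,   # non-capturing group is ignored     => 1
  /\(literal\)/,  # escaped brackets are ignored       => 0
  /[(](x)/,       # bracket inside a character class   => 1
  /(?<=a)(b)/     # lookbehind is over-counted         => 2 (really 1)
]

samples.each { |regexp| puts "#{regexp.inspect} => #{regexp.count}" }
```

If your patterns use lookbehinds or other (?...) constructs beyond (?:, (?= and (?!, the count will be too high and the extra arguments passed along later will simply be nil.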
The second extension we need to make is to the String class. We need to be able to work out the index of every pattern match within a string, and we can do this using StringScanner (part of Ruby’s standard library, loaded with require 'strscan'):
require 'strscan'

class String
  def indexes(regexp)
    scanner, ary = StringScanner.new(self), []
    ary << scanner.pointer while scanner.scan_until(regexp)
    ary
  end
end
It’s important to remember that this will return the indexes of characters immediately after each match; for example, 'something'.indexes /som|i/ returns [3, 7]. All we need to do to get the starting index for a match is to subtract the length of the matched string from the corresponding end index.
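The subtraction is easy to check by hand. The sketch below repeats String#indexes and pairs each end index with the matched text from String#scan; the variable names are mine, purely for illustration.

```ruby
require 'strscan'

class String
  def indexes(regexp)
    scanner, ary = StringScanner.new(self), []
    ary << scanner.pointer while scanner.scan_until(regexp)
    ary
  end
end

text    = 'something'
pattern = /som|i/

ends    = text.indexes(pattern)  # positions just after each match
matches = text.scan(pattern)     # the matched strings, in the same order

# start index = end index - length of the matched string
starts = ends.zip(matches).map { |end_index, match| end_index - match.length }

p ends    # [3, 7]
p matches # ["som", "i"]
p starts  # [0, 6]
```

One caveat: StringScanner#pointer is a byte offset, while String#length counts characters, so this arithmetic assumes single-byte characters. A pattern that can match the empty string would also keep the loop in String#indexes from advancing.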
With these in place, we can implement our String#replace-alike. Ruby already has a String#replace method, so for want of something meaningful I’m going to call this new method js_replace.
class String
  def js_replace(regexp)
    string, indexes = dup, indexes(regexp)
    n = regexp.count
    gsub(regexp) do |match|
      # $~ is the MatchData for the current match; $~[i] is the same value as $i
      args = [match] + (1..n).map { |i| $~[i] } +
        [indexes.shift - match.length, string]
      yield(*args)
    end
  end
end
You can now pretend you’re writing JavaScript:
>> 'something'.js_replace(/(..)(m?)/) { |*args| puts args.inspect }
["som", "so", "m", 0, "something"]
["et", "et", "", 3, "something"]
["hi", "hi", "", 5, "something"]
["ng", "ng", "", 7, "something"]
>> 'something'.js_replace(/(..)(m?)/) { |a,b,c,d| d }
#=> "0357"
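For comparison, here is a sketch of a shorter route that leans on MatchData instead of counting brackets and scanning for indexes. I am not claiming this is the implementation mentioned in the update at the top; it is just one way to get the same arguments, and js_replace_simple is a name I made up so it does not clash with js_replace.

```ruby
class String
  # Hypothetical method name, for illustration only.
  def js_replace_simple(regexp)
    original = dup
    gsub(regexp) do
      match = Regexp.last_match  # MatchData for the match gsub just found
      # whole match, each sub-pattern, start index, full string
      yield(match[0], *match.captures, match.begin(0), original)
    end
  end
end

calls = []
'something'.js_replace_simple(/(..)(m?)/) { |*args| calls << args; '' }
calls.each { |args| puts args.inspect }

result = 'something'.js_replace_simple(/(..)(m?)/) { |a, b, c, d| d }
p result # "0357"
```

MatchData#captures already knows how many sub-patterns there are, and MatchData#begin(0) gives the starting index as a character offset, so neither Regexp#count nor String#indexes is needed here.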