I've been playing with Go a lot lately, and after years of using mostly dynamic languages, I've had to make some adjustments. Not all of them have to do with static typing, though. One big difference is regular expression support. The flavor of regular expressions in Go's regexp package is nearly identical to the one accepted by the RE2 library. This is a non-backtracking engine, so it boasts predictable run times and stack sizes regardless of input. The downside is that there are no backreferences or generalized zero-width assertions.
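To make the trade-off concrete, here is a minimal sketch of how Go's standard regexp package reacts to the features mentioned above. The helper name `compiles` and the lookahead pattern are my own illustrative choices, not anything from a library; the only real API used is `regexp.Compile`, which returns an error for syntax the engine doesn't support.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, named for this example only.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`a?a?aa`,     // plain syntax
		`(a|b)\1`,    // backreference
		`foo(?=bar)`, // lookahead, a zero-width assertion
	}
	for _, p := range patterns {
		fmt.Printf("%-12s compiles: %v\n", p, compiles(p))
	}
}
```

The first pattern should compile, while the backreference and the lookahead should both be rejected at compile time rather than at match time.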
If you find yourself needing to write one of the more pathological regexes that give backtracking engines fits, it's usually pretty easy to use RE2 from your favorite scripting language too. Perl's re::engine::RE2 is a drop-in replacement for Perl's built-in regex engine. It even falls back to the built-in one if you write a regex that RE2 can't handle.
    $ cpanm re::engine::RE2
    ...
    $ reply
    0> use re::engine::RE2
    1> my $foo = qr{a?a?aa}
    $res[0] = bless( qr/(?-ims:a?a?aa)/, 're::engine::RE2' )
    2> my $bar = qr{(a|b)\1}
    $res[1] = qr/(?^u:(a|b)\1)/
Note that $foo is an re::engine::RE2 object, but $bar is a built-in Regexp because of the backreference.
Since Python's usual regexes live in a module rather than being built into the language, it ought to be even more natural to substitute RE2: just import re2 instead of re! Unfortunately, although the install reports success, the import itself fails in Python 3.
    $ pip install re2
    ...
    Successfully installed re2
    $ python
    Python 3.3.1 (default, Apr 7 2013, 08:51:17)
    [GCC 4.7.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import re2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "re2.pyx", line 1, in init re2 (src/re2.cpp:13551)
    NameError: basestring
    >>>
It appears to work as advertised in Python 2, but I haven't used Python 2 since matplotlib was ported to Python 3.
The module for Ruby seems to work fine. The only drawback here is that it doesn't feel much like the usual Ruby regexes.
    $ gem install re2
    ...
    $ irb --simple-prompt
    >> require 're2'
    => true
    >> RE2::Regexp.new("a?a?aa").match("aaaa")
    => true
If you're used to all the fancy features we get with the usual regexes in Perl, Python, and Ruby, then you might not like RE2 (or Go). But if your regex is slow on some inputs, then RE2 is easy enough to try!
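For a feel of what "slow on some inputs" means, the `a?a?aa` pattern used in the transcripts above is the n = 2 member of a family (n copies of `a?` followed by n copies of `a`, matched against n letters) that is often used to show exponential blow-up in backtracking engines. The sketch below times that family with Go's regexp package. The helper names `pathological` and `matchesRun` are hypothetical, and the timings will depend on your machine, so treat it as something to experiment with rather than a benchmark.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// pathological writes out n copies of "a?" followed by n copies of "a".
// For n = 2 this is the a?a?aa pattern. (Hypothetical helper.)
func pathological(n int) string {
	return strings.Repeat("a?", n) + strings.Repeat("a", n)
}

// matchesRun checks the anchored pattern against a string of n letters.
// (Hypothetical helper.)
func matchesRun(n int) bool {
	re := regexp.MustCompile("^" + pathological(n) + "$")
	return re.MatchString(strings.Repeat("a", n))
}

func main() {
	for _, n := range []int{10, 20, 30} {
		start := time.Now()
		ok := matchesRun(n)
		fmt.Printf("n=%d matched=%v in %v\n", n, ok, time.Since(start))
	}
}
```

If you port the same loop to a backtracking engine, keep n small at first, since the run time there can grow very quickly.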