In modern Perl, we have named captures, so if we have something like this
    my $string = shift // 'foo bar baz';
    my $pat = qr/(?:(?<word>\w+)\W*)/;
    say $+{word} if $string =~ /$pat/;

then we get

    foo

But what if we want to match more than one word? Don't you think this ought to work?

    $string =~ /$pat+/;
    say for @{$-{word}};

If our pattern matched one or more times, then shouldn't all the matches be in our array of matches? Well, they're not. We just get the last one

    baz

If we know we want to match three times, then this works

    $string =~ /$pat$pat$pat/;
    say for @{$-{word}};

so why not one or more?

A couple of years ago, I read in Jan Goyvaerts and Steven Levithan's Regular Expressions Cookbook that this is possible in .Net regular expressions. I don't normally use .Net, so I took their word for it. Well, today I got around to trying it for myself.
A while back, I bought a netbook that came with Windows 7 Starter Edition, which includes PowerShell 2.0 (I understand PowerShell gives us full access to .Net). I repartitioned the hard drive and installed Ubuntu, which is what I normally use. But Windows is still on there as well, so today I booted into it and fired up the PowerShell. I did not have an easy time (despite having purchased Bruce Payette's book), but eventually I figured it out. PowerShell has all kinds of neat shortcuts, but you can't use those for this. You have to build a .Net System.Text.RegularExpressions.Regex object directly. Actually, PowerShell allows us to just call it regex, so I guess we still get a shortcut of sorts. If we write this
    $string = 'foo bar baz'
    $pat = [regex] "(?:(?<word>\w+)\W*)+"
    $m = $pat.match($string)
    $m.groups["word"].captures | %{$_.value}

then lo and behold, we get all three of our matches

    PS> .\multicapture.ps1
    foo
    bar
    baz
Shortly after reading about this in Goyvaerts and Levithan, I had the good fortune of taking a class with Damian Conway at YAPC::NA. He mentioned that we would be able to do this in Perl 6. I piped up that we can do it now in .Net and further suggested that perhaps we could use that information to shame the Perl 5 developers into enabling it in Perl 5. But it looks like maybe we're going to have to wait for Perl 6. In the meantime, we can iterate through the matches
    say $+{word} while $string =~ /$pat/g;

This gives the result I want

    $ ./multicapture.pl
    foo
    bar
    baz

and is arguably easier to read and write than the PowerShell version.
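If we want the words in an array rather than printed one at a time, the same iteration can collect them. This is only a sketch under the post's own `$string` and `$pat`; the array names `@words` and `@listed` are made up for illustration. It also shows that a list-context match with /g returns every capture in one go, since the pattern has exactly one capturing group.

```perl
use strict;
use warnings;
use feature 'say';

my $string = 'foo bar baz';
my $pat    = qr/(?:(?<word>\w+)\W*)/;

# Scalar-context /g resumes where the previous match ended, so each pass
# through the loop sees the next word in %+.
my @words;
while ( $string =~ /$pat/g ) {
    push @words, $+{word};
}

# List-context /g returns the contents of the capture group for every match.
my @listed = $string =~ /$pat/g;

say for @words;
say "list context: @listed";
```

Either way we end up with one entry per word, which is what `@{$-{word}}` would have held if the quantified group kept every iteration.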
(2 Dec 2011) ETA: Note that this doesn't apply only to named captures (also pointed out by @Stevenharyanto in the comments below). Here are some Ruby folks talking about the same issue with numbered captures.
Traditionally, captures (named or not) in Perl never get converted to an array automatically. Perhaps we can do it ourselves with something like:
"a b c d e" =~ /((?\w+)\W+(?{push @word, $+{w}}))+/
@word will contain ("a", "b", "c", "d").
being done internally by the RE engine.
Posted by: Stevenharyanto | 10/27/2011 at 08:31 PM
@Stevenharyanto
I think you meant this: "a b c d e" =~ /(?:(?\w+)\W+(?{push @word, $+{w}}))+/
Nice!
Posted by: Kevincolyer.wordpress.com | 10/28/2011 at 06:16 AM
I think the blog is filtering out greater than and less than tags - my example now does not work!
Once again: "a b c d e" =~ /(?:(?<w>\w+)\W+(?{push @word, $+{w}}))+/
Posted by: Kevincolyer.wordpress.com | 10/28/2011 at 06:21 AM
But wait...when we say
we get an array of matches. But if we say
we do not. What is the difference?
Posted by: oylenshpeegul | 10/30/2011 at 06:36 PM
@oylenshpeegul:
It's the difference between
/(.)(.)(.)/
and
/(.){3}/
The former creates three captures, whilst the latter just one, which must match three times and captures the last thing it matched.
You could get the /(.)(.)(.)/ behavior through other means though, à la
$pat x= 3;
/$pat/;
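To make the difference concrete, here is a small sketch that runs all three forms against the same input. The string 'abc' and the variable names are assumptions chosen for illustration, and `$pat` here is a plain string rather than a qr// object.

```perl
use strict;
use warnings;
use feature 'say';

my $text = 'abc';

# Three groups in the pattern: each one has its own capture buffer.
my @three = $text =~ /(.)(.)(.)/;

# One group repeated three times: a single buffer, overwritten on each
# iteration, so only the last character remains.
my @one = $text =~ /(.){3}/;

# Repeating the pattern text with x= also produces three separate groups.
my $pat = '(.)';
$pat x= 3;
my @built = $text =~ /$pat/;

say "three groups: @three";
say "one group:    @one";
say "built by x=:  @built";
```

The number of capture buffers is fixed by how many groups appear in the pattern text, not by how many times a group ends up matching.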
@Stevenharyanto & @Kevincolyer:
Beware that (?{ ... }) and lexicals are still very buggy (but will probably be fixed by the next release of Perl, huzzah!), which means that your examples won't quite work under strict unless you go out of your way to make them work:
This is because (?{ ... }) acts as a closure of sorts.
With some luck on Perl 5.16 this won't be a worry, but until then you have a few options:
state:
Closure:
local and our:
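A minimal sketch of the package-variable approach, assuming the pattern from the earlier comments; the sub name `words_followed_by_space` is hypothetical and this is one possible shape, not necessarily the snippet originally posted here. Because `@word` is declared with `our`, the code block inside the regex refers to the package variable rather than trying to capture a lexical, and `local` gives each call a fresh copy.

```perl
use strict;
use warnings;

our @word;    # package variable, so the (?{ ... }) block sees the real array

sub words_followed_by_space {
    my ($text) = @_;
    local @word;    # fresh, dynamically scoped array for this call
    $text =~ /(?:(?<w>\w+)\W+(?{ push @word, $+{w} }))+/;
    return @word;   # elements are copied out before local's scope ends
}

my @found = words_followed_by_space('a b c d e');
print "@found\n";
```

As in the original example, the final "e" is not collected, because the pattern requires at least one non-word character after each word.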
Posted by: Account Deleted | 11/02/2011 at 08:25 AM