OneAndOneIs2

Tue, Feb 26, 2013

Named regex captures

• Post categories: Omni, Programming, Helpful

An accusation often levelled at Perl is that the code is indistinguishable from line noise. Possibly, modules that allow you to write "Hello, world" as:

''=~('('.'?'.'{'.('`'|'%').('['^'-').('`'|'!').('`'|',').'"'.('['^'+').('['^')').('`'|')').('`'|'.').('['^'/').('{'^'[').'\\'.'"'.('`'^'(').('`'|'%').('`'|',').('`'|',').('`'|'/').','.('{'^'[').('['^',').('`'|'/').('['^')').('`'|',').('`'|'$').'\\'.'\\'.('`'|'.').'\\'.'"'.';'.('!'^'+').'"'.'}'.')')

don't help :)

Regular expressions might have something to do with this. Regular expressions are awesome and part of what makes Perl so good at text processing. You can even use them to solve sudoku puzzles!

But they can be a little bit hard to decipher when you want a quick at-a-glance answer to "WTF is this code doing??"

For example, you might be looking at something that parses a log file and see this:

$line =~ m#(\d{4}-\d{2}-\d{2}).*(\d\d:\d\d:\d\d)#;

Now, that's actually a fairly trivial regex, but it will still cause you to stop and take a few moments to work out what it does.

The quick & obvious solution here is to chuck in a comment explaining that this regex gets the date and time from each line and puts them into the variables $1 and $2 respectively. But this isn't flexible: if you later modify the regex because the format has changed, or because you need another field, the capture numbers may shift and the comment has to be updated to match.
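To make that fragility concrete, here's a minimal sketch. The sample line and the extra host field are made up for illustration; they aren't from any real log format.

```perl
use strict;
use warnings;

# Hypothetical sample line; the layout is an assumption for this example.
my $line = '2013-02-26 host42 16:02:11 GET /index.html';

# The original pattern: the date lands in $1 and the time in $2.
my ($date_a, $time_a);
if ($line =~ m#(\d{4}-\d{2}-\d{2}).*(\d\d:\d\d:\d\d)#) {
    ($date_a, $time_a) = ($1, $2);
}

# Capture the host as well, and the time quietly moves from $2 to $3.
# Any code (or comment) that still says "time is in $2" is now wrong.
my ($date_b, $host_b, $time_b);
if ($line =~ m#(\d{4}-\d{2}-\d{2})\s+(\S+).*(\d\d:\d\d:\d\d)#) {
    ($date_b, $host_b, $time_b) = ($1, $2, $3);
}

print "two captures:   $date_a $time_a\n";
print "three captures: $date_b $host_b $time_b\n";
```

Nothing warns you when the numbers shift, which is what makes positional captures easy to get wrong during later edits.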

But then I was watching this presentation and it highlighted a much better way of doing it. It's self-documenting, and much more readable. You can name the segments of the regex you want to capture. Not only that, but you can store regexes containing named captures in variables, and then use those variables in another regex. Thusly:

my $date = qr/(?<date>
                \d{4}   # Year
                -   
                \d{2}   # Month
                -   
                \d{2}   # Day 
                )/x;

my $time = qr/(?<time>
                \d\d    # Hour
                :   
                \d\d    # Minute
                :   
                \d\d    # Second
                )/x;

$line =~ m# $date .* $time #x;

And now you can access the date and time fields as $+{date} and $+{time}. If in the future we need to add a third field, or the order of the fields changes, it's trivially easy to update by just moving the variables around. It's self-documenting, lends itself well to commenting, and is really easy to read.
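Here's a small end-to-end sketch of the same idea, assuming Perl 5.10 or later (named captures and %+ arrived in 5.10). The log lines and the parse_line helper are hypothetical, invented just for this example.

```perl
use strict;
use warnings;
use 5.010;    # named captures and the %+ hash need Perl 5.10+

my $date = qr/(?<date> \d{4} - \d{2} - \d{2} )/x;
my $time = qr/(?<time> \d\d : \d\d : \d\d )/x;

# Hypothetical helper: accepts the two fields in either order.
# Only the variables are moved around; the lookups by name stay the same.
sub parse_line {
    my ($line) = @_;
    if (   $line =~ m# $date .* $time #x
        or $line =~ m# $time .* $date #x )
    {
        return ($+{date}, $+{time});
    }
    return;    # no match
}

# Made-up sample lines, one per field order.
my $date_first = '2013-02-26 server01 16:02:11 connection accepted';
my $time_first = '16:02:11 server01 2013-02-26 connection accepted';

for my $line ($date_first, $time_first) {
    my ($d, $t) = parse_line($line);
    print "date=$d time=$t\n";
}
```

Both sample lines give back the same date and time, because the code asks for the captures by name and never cares which one came first.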

And I'd never even heard of it before. Maybe I need to reread the Camel book's regex section...


2 comments

Comment from: Ahmad M Zawawi [Visitor] · http://ahmadzawawi.blogspot.com
True, I also watched that video a while back. And when that slide showed up, I was like why did not anyone tell us about that cool 5.10 feature. $1 sucks compared to the self-documenting $+{date}. Thanks for sharing that information.
26/02/13 @ 16:02
Comment from: Rajesh Kumar [Visitor]
I also saw the video and was amazed by the named capture. It is really a cool feature.
04/03/13 @ 15:27
 
