perl expression matching
Fresh-Faced Recruit
Join Date: Aug 2003
How do I formulate a Perl command that will take in a string, search for a pattern, and return the contents of a captured group?
For instance:
STRING: "assortedtagsandtext<tag1>some text<tag2>thetextiwant<tag2>moreassortedtagsandtext"
PATTERN: .*<tag1>.*?<tag2>(.*?)<tag2>.*
(I think that's right, though I haven't written patterns in a while.)
How do I write a Perl command that takes in STRING and returns the captured group, which should be "thetextiwant"? (Ideally, this would be a single command that I could type in a terminal...)
(Last edited by ewagner; Aug 16, 2003 at 05:38 PM.)
Dedicated MacNNer
Join Date: Jul 2001
Location: NC
Your regular expression works, although the question marks aren't needed for this particular string, since it contains only two <tag2> tags. I would use a character class to keep each part of the match from running across a tag. "Greedy" matching can get you in trouble from time to time if you don't get in the habit of controlling it. Here's a script that will print the text:
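#!/usr/bin/perl
use strict;
use warnings;
my $str = "assortedtagsandtext<tag1>some text<tag2>thetextiwant<tag2>moreassortedtagsandtext";
# Replace the whole string with the captured group; [^<>]* stops each piece at the next tag
$str =~ s/[^<>]*<tag1>[^<>]*<tag2>([^<>]*)<tag2>[^<>]*/$1/;
print "$str\n";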
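The difference between the question marks and the character class only shows up when there are more tags than the pattern expects. Here is a small sketch with a made-up string (hypothetical, not from the posts in this thread) containing three <tag2> tags, comparing the greedy pattern, the non-greedy version, and the character-class version:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: three <tag2> tags instead of two
my $str = "<tag1>a<tag2>first<tag2>b<tag2>c";

# Greedy .* takes as much as it can, so the capture ends up between the LAST two <tag2> tags
my ($greedy) = $str =~ /<tag1>.*<tag2>(.*)<tag2>/;

# Non-greedy .*? takes as little as it can, so the capture is the content of the FIRST pair
my ($lazy) = $str =~ /<tag1>.*?<tag2>(.*?)<tag2>/;

# A negated character class cannot cross a tag at all, so it also finds the first pair
my ($class) = $str =~ /<tag1>[^<>]*<tag2>([^<>]*)<tag2>/;

print "greedy: $greedy\n";    # greedy: b
print "lazy:   $lazy\n";      # lazy:   first
print "class:  $class\n";     # class:  first
```

With only two <tag2> tags, as in the original string, all three patterns capture the same text.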
Gary
A computer scientist is someone who, when told to "Go to Hell", sees the "go to", rather than the destination, as harmful.
Mac Enthusiast
Join Date: Nov 2001
Location: Adelaide, South Australia
So if you want to use this as a one-liner (supposing you've got a file "filename" full of lines that might match), then something like
Code:
perl -ne 'print "$1\n" if /<tag1>[^<>]*<tag2>([^<>]*)<tag2>/' filename
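should do the trick. The -n switch wraps the code in a loop that reads "filename" one line at a time into $_, which is what the bare /.../ matches against. There's no need for a substitution that I can see, and the leading and trailing matches are pretty much redundant (i.e., they'll always match). Note that we're assuming here that there'll be no "<" or ">" characters, escaped or otherwise, amongst the text.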
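In a script rather than a one-liner, the same "no substitution" idea can be written as a match in list context, which returns the captured groups directly. A minimal sketch, reusing the string from the original question; `$want` is just an illustrative name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "assortedtagsandtext<tag1>some text<tag2>thetextiwant<tag2>moreassortedtagsandtext";

# In list context the match returns its captures, so $want gets the first group
# (or stays undef if the pattern does not match)
my ($want) = $str =~ /<tag1>[^<>]*<tag2>([^<>]*)<tag2>/;

print defined $want ? "$want\n" : "no match\n";    # prints "thetextiwant"
```

If the pattern fails, the list is empty and $want is left undefined, so the defined test doubles as a did-it-match check.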
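If you want to spice this up a bit, you might think of allowing the match to work across multiple lines (as long as the newline occurs outside of a tag), in which case it'd look more like
Code:
perl -0ne 'print "$1\n" while /<tag1>[^<>]*<tag2>([^<>]*)<tag2>/g' filename
This reads the whole file in at once (-0 with no digits sets the input record separator to the null character, which ordinary text files don't contain; -0777 slurps unconditionally), then searches for all matches within that megastring, with while and /g stepping through them one at a time.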
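The same multi-match loop works inside a script. This sketch uses a hypothetical string standing in for a slurped file (in a script, `local $/;` before reading from a filehandle gives the whole-file read that -0 gives the one-liner), with newlines in the text between the tags:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file contents; note the newlines inside the text between tags
my $text = "junk<tag1>x\ny<tag2>first\nmatch<tag2>junk\n"
         . "<tag1>z<tag2>second<tag2>end\n";

# With /g in scalar context each match resumes where the previous one ended,
# so the loop visits every match; [^<>]* happily matches newlines too
my @found;
while ($text =~ /<tag1>[^<>]*<tag2>([^<>]*)<tag2>/g) {
    push @found, $1;
}
print "$_\n" for @found;
```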
Cheers,
Paul (avoiding work again)
Dedicated MacNNer
Join Date: Jul 2001
Location: NC
Originally posted by Paul McCann:
perl -ne 'print "$1\n" if /<tag1>[^<>]*<tag2>([^<>]*)<tag2>/' filename
Paul (avoiding work again)
Hi Paul,
Me too. That's a very cool command! I don't think I'll ever get used to the fact that Perl evaluates the "if" clause before the statement it applies to, even when it's written after it. So how long does $1 hold the capture? Until the next statement is executed, or until the next match?
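Perl calls that trailing `if` a statement modifier: the condition is evaluated first, and the statement runs only if it is true, exactly as in the block form. A minimal sketch with a hypothetical `$line` showing that both forms do the same thing:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "<tag1>t<tag2>wanted<tag2>";    # hypothetical input
my @out;

# Statement-modifier form: the match runs first, and push only happens if it succeeds
push @out, $1 if $line =~ /<tag2>([^<>]*)<tag2>/;

# Equivalent block form
if ($line =~ /<tag2>([^<>]*)<tag2>/) {
    push @out, $1;
}

print "$_\n" for @out;
```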
Gary
Mac Enthusiast
Join Date: Nov 2001
Location: Adelaide, South Australia
Hi Gary,
I have a sneaking suspicion that this is one of those things that might have changed over time: at least, I remember there being *talk* of the default behaviour being changed.
My understanding is that the variables $1, $2, etc. are reset only by another successful match. The "change" being mooted at some stage was for them to reset on *any* call to m// or s/// (etc.), but hazy recollections suggest that people cried "foul" because it would break some funky code.
Yep, quick example shows what happens...
Code:
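#!/usr/bin/perl -w
use strict;
my $string="hello";
# Successful match: sets $1 to "hell" and $2 to "o"
print "$1\n" if $string=~/(hell)(o)/;
print "\$1 currently reads $1\n\$2 currently reads $2\n";
# Failed match: $1 and $2 keep their previous values
$string=~/goodbye/;
print "\$1 currently reads $1\n\$2 currently reads $2\n";
# Successful match with one group: $1 becomes "hello" and $2 becomes undef
$string=~/(h.*)/;
print "\$1 currently reads $1\n";
print "\$2 currently reads $2\n" if defined($2);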
Output is
hell
$1 currently reads hell
$2 currently reads o
$1 currently reads hell
$2 currently reads o
$1 currently reads hello
So a successful match with a *single* capture resets all the capture variables: $1 takes the new capture and $2 becomes undefined. Makes good sense.
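One practical consequence: because a failed match leaves $1 untouched, using $1 without checking whether the match succeeded can quietly pick up an old value. A sketch with hypothetical strings contrasting an unguarded $1 with a list-context capture, which comes back undefined when the match fails:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical inputs: the first matches, the second does not
my $first  = "<tag2>one<tag2>";
my $second = "no tags here";

$first  =~ /<tag2>([^<>]*)<tag2>/;    # succeeds: $1 is "one"
$second =~ /<tag2>([^<>]*)<tag2>/;    # fails: $1 is NOT reset
my $stale = $1;                        # still "one", left over from $first

# Safer: list assignment gives an empty list on failure, so $fresh is undef
my ($fresh) = $second =~ /<tag2>([^<>]*)<tag2>/;

print "stale: $stale\n";
print "fresh: ", (defined $fresh ? $fresh : "undef"), "\n";
```

perlre also documents that capture variables are dynamically scoped: they last until the end of the enclosing block or the next successful match, whichever comes first.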
Cheers,
Paul