Welcome to the MacNN Forums.

Returning only what matched a regular expression
WJMoore
Grizzled Veteran
Join Date: Jan 2002
Location: Melbourne, Australia
Apr 20, 2003, 09:10 AM
 
I have a file (an HTML file, actually) with a number of links strewn through it. I would like to use a tool to extract just the file names referenced by the links. I have looked at grep, awk and sed, but they don't seem to do what I want. grep is fine, but it returns the whole line where the match was found; I only want the text matched by the regular expression. Does anyone know how to do this? I would like to do it on the command line instead of writing a script first (which is how I did it this time, using Perl).

Wesley
     
cwasko
Senior User
Join Date: Jul 2000
Apr 20, 2003, 10:43 AM
 
To the best of my knowledge, you'd have to do it in Perl.
     
Paul McCann
Mac Enthusiast
Join Date: Nov 2001
Location: Adelaide, South Australia
Apr 21, 2003, 12:17 AM
 
This isn't perfect (matching HTML with regular expressions is always a little dodgy!), but it might give you what you're after, or at least point you in the right direction:

perl -e 'undef $/;print join"\n",<>=~/HREF=\"(.*?)\"/sg' filename.html

Note that I'm guessing you want the target of the link. If not, you'll have to modify the regex accordingly, but from what you've written that's probably not the hard part.

[[Very quick explanation: links might cross lines, so the "undef $/" means that the whole file is sucked in as one blob. That blob is matched against the regex in list context, thus returning all the captured matches as a list. This list is printed out after being joined with newlines. Note that the regex is case-sensitive, so it only catches HREF= in capitals.]]
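
If the one-liner above is too terse, here is a rough sketch of the same idea wrapped in a shell function. The name extract_hrefs is made up for illustration, -0777 is just the command-line way of saying "undef $/", and I've added /i so that lowercase href= is caught as well, which the original doesn't do.

```shell
# Hypothetical helper: print the target of every HREF="..." attribute in a file,
# one per line. The file name is passed as the first argument.
extract_hrefs() {
    # -0777  slurp the whole file into $_ (same effect as "undef $/")
    # -n     wrap the code in a read loop without printing automatically
    # /s     let . match newlines, /g return every match, /i ignore case
    perl -0777 -ne 'print "$_\n" for /href="(.*?)"/sgi' "$1"
}
```

Something like "extract_hrefs filename.html" should then list the link targets, subject to the usual caveats about matching HTML with regexes.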

Best of luck,
Paul
     
WJMoore  (op)
Grizzled Veteran
Join Date: Jan 2002
Location: Melbourne, Australia
Apr 21, 2003, 08:08 AM
 
OK cool, thanks for that. At least I'm reassured that I wasn't missing anything and that Perl is the way to go. My Perl-based solution was a script containing this:

# Read the HTML on standard input, one line at a time.
while (my $line = <STDIN>) {
    # Capture the image file name used as the link text.
    if ($line =~ /">(Cimg[0-9]+\.jpg)<\/A/) {
        print "http://www.host.com/2003-04-18/$1\n";
    }
}

Obviously I changed the URL. Before you get any ideas, they were pictures my friend took when we went out the other night.
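
For the record, the same loop seems to fit on the command line, which is what I was after in the first place. A minimal sketch, assuming the page is saved to a file: the function names are made up, the host is the same placeholder as above, and the grep variant assumes a grep that supports -o (GNU grep and recent BSD greps do, older ones may not).

```shell
# Hypothetical wrapper: the script above as a one-liner.
# -n loops over the input lines and -e supplies the code.
# Inside the single quotes, $1 is Perl's capture group;
# the "$1" outside them is the shell's first argument (the file name).
img_urls() {
    perl -ne 'print "http://www.host.com/2003-04-18/$1\n" if /">(Cimg[0-9]+\.jpg)<\/A/' "$1"
}

# Alternative without Perl: grep -o prints only the matched text,
# then sed strips the delimiters that had to be part of the match.
img_names() {
    grep -o '">Cimg[0-9][0-9]*\.jpg</A' "$1" | sed -e 's/^">//' -e 's/<\/A$//'
}
```

The grep version only returns the bare file names, so the URL prefix would still have to be added separately.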
     
   
 
All times are GMT -4.