Welcome to the MacNN Forums.

Regular Expressions...
Clinically Insane
Join Date: Nov 1999
Jan 8, 2002, 05:49 PM
 
I'm having some trouble with some of my regular expressions. Most of the ones I work with, I can get to do what I want, but I'm having problems with three of them that don't even seem to come close...

Take a chunk of text, likely to contain multiple paragraphs (like a messageboard entry). I want to do the following with it:

1) Get every chunk of text which is both preceded and followed by at least two newlines.
2) Get every chunk of text which is preceded by an arbitrary number of newlines, but followed by only one.
3) Get every URL in a document.

This would, I assume, take three expressions. The only problem is, I can't get anything I can think of to work. Anyone here have any ideas?
You are in Soviet Russia. It is dark. Grue is likely to be eaten by YOU!
     
Senior User
Join Date: Feb 2001
Location: Rochester, uk
Jan 9, 2002, 03:57 AM
 
Most of my regex experience was in Perl, which I haven't used in a year or two, but my guesses would be something like:

[code]
$var =~ m/\n\n([^\n]*)\n\n/;
$result = $1;  # the first capture group; $0 is the script name, not the match
[/code]

That will match any string that does not contain a newline, but which is preceded and followed by two newlines. I can't remember how you match all of them, rather than just one - figure that bit out yourself.
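For the "match all of them" part, here is a minimal sketch in Python (the thread never settles on a language, so that choice is an assumption, as are the function and variable names). It swaps the consumed newlines for lookarounds, which is a different technique from the pattern above: the blank line after one chunk is only checked, not consumed, so it can also count as the blank line before the next chunk.

```python
import re

# Lookarounds test for the two newlines on each side without consuming them,
# so one blank line can separate two consecutive matches.
BETWEEN_BLANK_LINES = re.compile(r"(?<=\n\n)[^\n]+(?=\n\n)")


def chunks_between_blank_lines(text):
    """Return every single-line chunk with two newlines on both sides."""
    return BETWEEN_BLANK_LINES.findall(text)


sample = "intro\n\nfirst\n\nsecond\n\nlast"
print(chunks_between_blank_lines(sample))       # ['first', 'second']

# The consuming version swallows the separator after "first",
# so "second" is no longer preceded by two newlines when the scan resumes.
print(re.findall(r"\n\n([^\n]*)\n\n", sample))  # ['first']
```

Like the original guess, this only handles chunks that contain no newline themselves. The same lookaround syntax should also work in Perl with the /g modifier.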

[code]
$var =~ m/\n*([^\n]*)\n[^\n]/;
$result = $1;  # again, the first capture group
[/code]

This matches anything preceded by any number of newlines, and followed by a newline and a non-newline character. It fails to match when the final newline is the last character in the string, but it's early morning and I can't think straight. None of this code has been tested, so it's probably all a big pile of schite.
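One possible way around the end-of-string problem, sketched in Python with made-up names: replace the trailing "non-newline character" with a negative lookahead, which is also satisfied when nothing follows the newline at all. This assumes that "an arbitrary number of newlines" before the chunk includes zero, so no leading condition is imposed.

```python
import re

# (?!\n) is a negative lookahead: the newline after the chunk must not be
# followed by a second newline. End of string satisfies it too.
BEFORE_SINGLE_NEWLINE = re.compile(r"([^\n]+)\n(?!\n)")


def chunks_before_single_newline(text):
    """Return every chunk that is followed by exactly one newline."""
    return BEFORE_SINGLE_NEWLINE.findall(text)


sample = "line one\nline two\n\npara\n"
print(chunks_before_single_newline(sample))  # ['line one', 'para']
```

Here "line two" is skipped because two newlines follow it, while "para" is picked up even though its newline is the last character of the string.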

The last one is more complicated - you have to match several known strings, including http:, mailto:, ftp:, www., *@*.*, etc.

Regular expressions can do an incredible amount of stuff, including everything you asked for, if you only know how to use them. What language are you using, anyway?
All words are lies. Including these ones.
     
Grizzled Veteran
Join Date: Feb 2001
Location: Germany
Jan 9, 2002, 04:26 AM
 
To add a little to the code above, "\n{2}" looks a bit nicer than "\n\n", but that's only cosmetic.
The result will be in $1; $0 is the name of your Perl program.

One approach to grabbing URLs might be something like (\w+:\/\/\S+), which would take everything that has "://" in it as a URL. Something like (((https?|ftp):\/\/|mailto:)\S+) might be a better approach, but that's just a guess...
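The known-prefix idea can be sketched roughly like this in Python. The prefix list, the function name, and the sample addresses are all assumptions for illustration, and this is nowhere near a complete URL grammar. It does not attempt bare addresses of the *@*.* kind.

```python
import re

# A URL here is one of a few known prefixes followed by a run of
# non-whitespace characters.
URL_PATTERN = re.compile(r"\b(?:(?:https?|ftp)://|mailto:|www\.)\S+")


def find_urls(text):
    """Return every prefixed URL in text, minus trailing punctuation."""
    # \S+ is greedy and also swallows sentence punctuation, so trim it.
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(text)]


sample = (
    "See http://www.macnn.com/ or ftp://ftp.example.org/pub, "
    "mail mailto:someone@example.com, or visit www.example.com."
)
print(find_urls(sample))
```

Trimming after the match is a deliberate shortcut: it keeps the pattern readable, but a URL that really does end in one of those characters would lose it.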

[edit: disabled smilies, didn't turn out the way i expected it to... ;-)]

[ 01-09-2002: Message edited by: seb2 ]
     
   