YAREQ (Yet Another Regular Expression Question)
Professional Poster
Join Date: Dec 2000
Location: Chicago, Illinois
Jun 27, 2002, 12:51 AM
 
So here's my question. I need to take a text file and parse out all the words. Using regular expressions, that's pretty easy:

code:
    "[a-zA-Z-]+"

Now what I want to do is find every pair of words. So for the phrase "I am a pretty little girl", I need to be able to get 'I am', 'am a', 'a pretty', 'pretty little', 'little girl'. I tried this:

code:
    "[a-zA-Z-]+[^a-zA-Z-]+[a-zA-Z-]+"

This sort of works, except that I end up getting 'I am', 'a pretty', 'little girl'. Does anyone know how I can set up the regular expression correctly to get the result I want? It has something to do with rewind marks, but I'm not much of a regular expression buff.

Thanks!
F-bacher
     
Mac Elite
Join Date: Sep 2000
Location: Tempe, AZ
Jun 27, 2002, 02:42 AM
 
I didn't think you could rewind regular expressions. BBEdit's got nice documentation for 'em in their help if you've got it. Or you could 'man regexp' in the terminal.
Geekspiff - generating spiffdiddlee software since before you began paying attention.
     
Grizzled Veteran
Join Date: Sep 2000
Location: Springfield, MA
Jun 27, 2002, 03:44 AM
 
At the risk of sounding terribly ignorant, is this really necessary?

I mean, once you have a list of the words, it's trivial to generate the word pairs. Maybe for some reason this won't work in your situation, but I'd wager that the code would be more maintainable than some messy regexp if you were able to do it.
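
A rough sketch of that list-based approach, in Python (an assumption; the thread never names a language). The function name `word_pairs` is made up for illustration, and the word pattern is the one from the original post:

```python
import re

# Same word pattern as in the original post.
WORD = re.compile(r"[a-zA-Z-]+")

def word_pairs(text):
    """Return every adjacent pair of words in text, in order."""
    words = WORD.findall(text)
    # Pair each word with its successor; zip stops at the shorter
    # sequence, so the last word gets no partner.
    return list(zip(words, words[1:]))

print(word_pairs("I am a pretty little girl"))
# [('I', 'am'), ('am', 'a'), ('a', 'pretty'), ('pretty', 'little'), ('little', 'girl')]
```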

That being said, I don't know how that would be done with just a regexp.
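
For what it's worth, one way this can probably be done with a single pattern is a lookahead, in engines that support one (Perl and Python do; whether the original poster's engine does is unknown). The original pattern skips pairs because each match consumes the second word, so the next search starts after it. The sketch below, in Python, consumes only the first word and captures the second inside `(?=...)`. The names `PLAIN`, `PAIR`, and `overlapping_pairs` are illustrative only:

```python
import re

# The pattern from the original post: each match consumes both words.
PLAIN = re.compile(r"[a-zA-Z-]+[^a-zA-Z-]+[a-zA-Z-]+")

# Consume only the first word; the separator and the second word are
# matched inside a lookahead, so they stay available for the next match.
PAIR = re.compile(r"([a-zA-Z-]+)(?=[^a-zA-Z-]+([a-zA-Z-]+))")

def overlapping_pairs(text):
    """Return every adjacent word pair, joined by a single space."""
    return [f"{m.group(1)} {m.group(2)}" for m in PAIR.finditer(text)]

phrase = "I am a pretty little girl"
print(PLAIN.findall(phrase))
# ['I am', 'a pretty', 'little girl']
print(overlapping_pairs(phrase))
# ['I am', 'am a', 'a pretty', 'pretty little', 'little girl']
```

Note that this sketch rejoins each pair with one space, so whatever punctuation or whitespace originally separated the two words is dropped.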
We hope your rules and wisdom choke you / Now we are one in everlasting peace
-- Radiohead, Exit Music (for a film)
     
Senior User
Join Date: Feb 2001
Location: Rochester, uk
Jun 27, 2002, 04:11 AM
 
Isn't there an option for getting *everything* that matches the criteria? I forget the syntax, and don't even know what form of regex you're using, but I'm pretty sure Perl could do this.
All words are lies. Including these ones.
     
Professional Poster
Join Date: Dec 2000
Location: Chicago, Illinois
Jun 27, 2002, 06:55 AM
 
I did do that for a while (working with the list), but for large texts it can be somewhat expensive. I'll probably end up doing that again.

Thanks,
F-bacher
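
On the cost concern: the thread doesn't say where the expense came from, but if it was holding the whole word list in memory, a lazy version that remembers only the previous word might help. This is a sketch in Python under that assumption; `iter_word_pairs` is a hypothetical name, and the input is taken to be any iterable of text lines (an open file would work).

```python
import re

# Same word pattern as in the original post.
WORD = re.compile(r"[a-zA-Z-]+")

def iter_word_pairs(lines):
    """Lazily yield adjacent word pairs from an iterable of text lines."""
    prev = None  # last word seen so far, carried across line boundaries
    for line in lines:
        for match in WORD.finditer(line):
            word = match.group()
            if prev is not None:
                yield prev, word
            prev = word

sample = ["I am a pretty\n", "little girl\n"]
for first, second in iter_word_pairs(sample):
    print(first, second)
```

Only one line and one previous word are kept at a time, so memory use stays flat regardless of file size. Pairs that span a line break (here 'pretty little') are still produced.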
     
   
All times are GMT -5.