YAREQ (Yet Another Regular Expression Question)
Professional Poster
Join Date: Dec 2000
Location: Chicago, Illinois
So here's my question. I need to take a text file and parse out all the words. Using regular expressions, that's pretty easy:

code:
    "[a-zA-Z-]+"

Now what I want to do is find every pair of words. So for the phrase "I am a pretty little girl", I need to be able to get 'I am', 'am a', 'a pretty', 'pretty little', 'little girl'. I tried this:

code:
    "[a-zA-Z-]+[^a-zA-Z-]+[a-zA-Z-]+"

This sort of works, except that I end up getting 'I am', 'a pretty', 'little girl'. Each match consumes both of its words, so the next match can only start after them. Does anyone know how I can set up the regular expression correctly to get the result I want? It has something to do with rewind marks, but I'm not much of a regular expression buff.
Thanks!
F-bacher
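
For anyone who wants to reproduce the behaviour described above, here is a small sketch. It assumes Python's `re` module, which is only a stand-in, since the post doesn't say which regex engine is in use. The two-word pattern is the one from the post; the variable names are made up for illustration.

```python
import re

# The sample phrase and the two-word pattern from the post.
text = "I am a pretty little girl"
pair = re.compile(r"[a-zA-Z-]+[^a-zA-Z-]+[a-zA-Z-]+")

# findall() returns non-overlapping matches: once "I am" is consumed,
# scanning resumes after "am", so "am a" is never considered.
matches = pair.findall(text)
print(matches)  # ['I am', 'a pretty', 'little girl']
```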
Mac Elite
Join Date: Sep 2000
Location: Tempe, AZ
I didn't think you could rewind regular expressions. BBEdit's got nice documentation for 'em in their help if you've got it. Or you could 'man regexp' in the terminal.
Geekspiff - generating spiffdiddlee software since before you began paying attention.
Grizzled Veteran
Join Date: Sep 2000
Location: Springfield, MA
At the risk of sounding terribly ignorant, is this really necessary?
I mean, once you have a list of the words, it's trivial to generate the word pairs. Maybe for some reason this won't work in your situation, but I'd wager that the code would be more maintainable than some messy regexp if you were able to do it.
That being said, I don't know how that would be done with just a regexp.
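
A minimal sketch of the list-based approach, written in Python purely as an assumption (the thread never names a language). The single-word pattern is the one from the original question; `word_pairs` is a hypothetical helper name. Note that it joins each pair with a single space rather than keeping whatever separated the words in the source text.

```python
import re

# Single-word pattern from the original question.
WORD = re.compile(r"[a-zA-Z-]+")

def word_pairs(text):
    """Return every adjacent pair of words in text, in order."""
    words = WORD.findall(text)
    # zip(words, words[1:]) lines each word up with its successor.
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

print(word_pairs("I am a pretty little girl"))
# ['I am', 'am a', 'a pretty', 'pretty little', 'little girl']
```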
We hope your rules and wisdom choke you / Now we are one in everlasting peace
-- Radiohead, Exit Music (for a film)
Senior User
Join Date: Feb 2001
Location: Rochester, uk
Isn't there an option for getting *everything* that matches the criteria? I forget the syntax, and don't even know what form of regex you're using, but I'm pretty sure Perl could do this.
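
One way to get overlapping matches in engines with Perl-style lookaround is to wrap the pair pattern in a zero-width lookahead with a capture group, so nothing is consumed and the next match can start at the very next word. The sketch below uses Python's `re` module as a stand-in (an assumption, since the engine in question is unknown). The negative lookbehind is my addition to stop matches from starting in the middle of a word, and `OVERLAPPING_PAIR` is a made-up name.

```python
import re

# (?<![a-zA-Z-])  only start where the previous character is not part of a word
# (?=( ... ))     look ahead and capture a word pair without consuming it
OVERLAPPING_PAIR = re.compile(
    r"(?<![a-zA-Z-])(?=([a-zA-Z-]+[^a-zA-Z-]+[a-zA-Z-]+))"
)

text = "I am a pretty little girl"
# With one capture group, findall() returns the captured text of each match.
pairs = OVERLAPPING_PAIR.findall(text)
print(pairs)
# ['I am', 'am a', 'a pretty', 'pretty little', 'little girl']
```

Unlike the list-based approach, each captured pair keeps the original separator between the two words.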
All words are lies. Including these ones.
Professional Poster
Join Date: Dec 2000
Location: Chicago, Illinois
I did do that for a while (working with the list), but for large texts it can be somewhat expensive. I'll probably end up doing that again.
Thanks,
F-bacher
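
If the cost with large texts came from holding the whole word list in memory (an assumption; the post doesn't say where the expense was), a streaming variant might help. This sketch, again in Python by assumption, remembers only the previous word and yields pairs lazily. `iter_word_pairs` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator, Tuple

# Single-word pattern from the original question.
WORD = re.compile(r"[a-zA-Z-]+")

def iter_word_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield adjacent word pairs one at a time, including across line breaks."""
    previous = None
    for line in lines:
        for match in WORD.finditer(line):
            word = match.group()
            if previous is not None:
                yield previous, word
            previous = word

# Any iterable of strings works, including an open file object.
sample = ["I am a", "pretty little girl"]
for first, second in iter_word_pairs(sample):
    print(first, second)
```

Because an open text file iterates line by line, passing one in means only a single line and one remembered word are held at a time.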