Regex Question
Grizzled Veteran
Join Date: Jun 2002
Sep 16, 2004, 03:12 PM
 
Hey guys,

Say I have a document that looks like this:

Code:
this is text<hello>text</hello>
What I would like to do with regex is find anything outside of the < and >. So basically, I'd like to find every piece of text outside the <> tags.

Any regex string appreciated

Oliver
     
qyn
Dedicated MacNNer
Join Date: Dec 2000
Location: sj ca
Sep 17, 2004, 05:56 AM
 
In perl, you can do something like this:

Code:
$text = "your full <html> text";
@chunks = split(/<.*?>/, $text);
Then @chunks will be an array of all text chunks in the document. In this case, "your full " and " text".
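For anyone not working in Perl, the same split idea can be sketched in Java, the other language used in this thread. This is only a rough equivalent, assuming the same non-greedy pattern; the class name SplitOnTags and the method textChunks are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitOnTags {
    // Same non-greedy tag pattern as the Perl example: '<', anything, first '>'.
    private static final Pattern TAG = Pattern.compile("<.*?>");

    // Hypothetical helper: returns the pieces of text found around the tags.
    static String[] textChunks(String text) {
        return TAG.split(text);
    }

    public static void main(String[] args) {
        // Prints each array so the chunk boundaries are visible.
        System.out.println(Arrays.toString(textChunks("your full <html> text")));
        System.out.println(Arrays.toString(textChunks("this is text<hello>text</hello>")));
    }
}
```

For the sample from the first post this yields the two chunks "this is text" and "text". Java's split drops trailing empty strings, so nothing is left over after the closing tag. Note that "." does not match a line break by default, so a tag that spans lines would need Pattern.DOTALL.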

There are many other ways to do it, of course, but it all depends on what you're planning to do with the text. What are you planning to do with it?
     
Forum Regular
Join Date: Jan 2001
Sep 17, 2004, 06:07 AM
 
In Java you could do something like this

Code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        StringBuffer buf = new StringBuffer("this is text<hello>text</hello>");
        StringBuffer parsed;
        // Matches "<" or "</", the tag contents (non-greedy), then the closing ">".
        String exp = "(<[/]*\\s*)(.*?)(\\s*>)";
        Pattern p = Pattern.compile(exp);
        Matcher m = p.matcher(buf);
        /* replace matches with " " */
        parsed = new StringBuffer(m.replaceAll(" "));
        System.out.println(parsed);
    }
}
I have not tried the code above (but it should work). I'm not at home, but I'll try it tonight and edit the code if it doesn't work as expected.

This piece of code will print "this is text text ".

Edit: fixed a small typo.
(Last edited by geran; Sep 17, 2004 at 11:21 AM. )
     
Mac Elite
Join Date: Sep 2000
Location: Tempe, AZ
Sep 17, 2004, 10:29 AM
 
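If you're using regex.h in code, you can just search for the HTML tags instead. Each match gives you the index into the string at which it was found and the length of the match. From that, you can grab the portions of the string that weren't matched.

Match a single HTML tag using this:
<[^>]+>

(A longer pattern such as <[^>]+>[^<]+<[^>]+> matches an opening tag, the text after it, and the following tag as one unit, so the text between the tags ends up inside the match instead of outside it.)

Depending on the flags you pass to regcomp (REG_NEWLINE in particular), that may or may not catch tags that span multiple lines.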
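The "find the tags, keep what wasn't matched" approach can be sketched as follows. This is not regex.h code: it uses Java's java.util.regex instead, where Matcher.start() and Matcher.end() stand in for the match index and length, and it assumes a single-tag pattern, <[^>]+>. The class name TextOutsideTags and the method outsideTags are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextOutsideTags {
    // Matches one tag: '<', one or more characters that are not '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Hypothetical helper: collects the stretches of input not covered by a tag match.
    static List<String> outsideTags(String input) {
        List<String> pieces = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        int last = 0; // index just past the previous match
        while (m.find()) {
            if (m.start() > last) {
                // Text between the previous tag (or the start) and this tag.
                pieces.add(input.substring(last, m.start()));
            }
            last = m.end();
        }
        if (last < input.length()) {
            pieces.add(input.substring(last)); // text after the final tag
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(outsideTags("this is text<hello>text</hello>"));
    }
}
```

In Java a negated class like [^>] also matches line breaks, so this version handles tags that span multiple lines without extra flags. Other regex libraries may behave differently.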
Geekspiff - generating spiffdiddlee software since before you began paying attention.
     
   