How to strip html down to bare minimum
Senior User
Join Date: Mar 2002
Location: Chicago, IL
Jan 31, 2004, 11:06 PM
 
I'm working on converting a site from old, HTML 3.2 style coding to XHTML 1, and I have about 100 pages that need to be converted. I have created a template for the new site, and now all I need to do is strip all the presentational markup from the old site files so I can copy/paste the content into new ones.

That is what I'm having trouble with. I've gotten close with HTML Tidy, but not close enough. I've gotten it to strip out the font tags and convert tags to lowercase, but it still leaves bold tags behind, and there are a lot of them.

What I want to do is basically be able to process a text file and have everything except paragraph and table tags removed. I'm wondering if anyone has any good ways of doing this, because I sure don't think that going through and editing by hand is a great use of my time, or anyone else's.
     
Mac Elite
Join Date: Mar 2002
Location: Clogland
Jan 31, 2004, 11:10 PM
 
Most text editors have a "search and replace" function. Simply search for "<B>" and replace it with "", then do the same for "</B>", etc.

(not sure what to use on a Mac platform *ducks and runs for cover*)
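For anyone who would rather script the search and replace than do it in an editor, here is a minimal Python sketch of the same idea. The function name `strip_bold` and the exact pattern are my own assumptions, not something from a particular tool. It removes opening and closing bold tags in either case, with or without attributes, and leaves the wrapped text alone.

```python
import re

# Matches <b>, </b>, <B>, and <b ...attributes...>.
# The "\s" after "b" keeps it from matching <br>, <body>, <blockquote>, etc.
BOLD_TAG = re.compile(r"</?b(?:\s[^>]*)?>", re.IGNORECASE)


def strip_bold(html: str) -> str:
    """Remove bold tags but keep the text they wrap (hypothetical helper)."""
    return BOLD_TAG.sub("", html)


if __name__ == "__main__":
    print(strip_bold('<P><B>Hello</B> world<BR></P>'))
```

A regex like this is fine for one simple tag. For "everything except a few tags" it gets fragile quickly, and a real parser is probably the safer route.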
     
Mac Elite
Join Date: Mar 2003
Jan 31, 2004, 11:42 PM
 
BBEdit. It Doesn't Suck (r)

http://www.barebones.com/products/bbedit/index.shtml

It has a search and replace feature that works over a bunch of pages within a folder.

It also has a function to remove markup. However, $179 is a lot to spend on a text editor and IDE, though it is well worth it over Dreamweaver for HTML. I am not sure whether the free version, or even their TextWrangler ($49), has the features you need.

It does a good job of removing markup, but it skips over JavaScript.
     
Mac Elite
Join Date: Mar 2002
Location: Clogland
Feb 1, 2004, 02:29 AM
 
Metapad is excellent freeware if you're on a Windows box, btw.
     
Grizzled Veteran
Join Date: Nov 2003
Location: Hebburn, UK
Feb 1, 2004, 07:27 AM
 
Originally posted by Truepop:

It does a good job of removing markup but it skips over JavaScript.

On a mildly related note, I was always annoyed that BBEdit didn't colour my JavaScript when it was embedded in an (X)HTML page. However, the other day I decided to make an XHTML page validate. I added type="text/javascript" and language="javascript" to the script tag, and BBEdit now colours the code, which is nice.

Anyhoo, I'm not sure which of the two attributes BBEdit was looking for in order to recognise the code as JavaScript. I presume it was language="javascript". It is most helpful, anyway.

So, my point is: I wonder if BBEdit is skipping over JavaScript for the same reason... (I'm presuming you don't want it to skip over it.)
     
Grizzled Veteran
Join Date: Jun 2001
Location: Melbourne, Australia
Feb 1, 2004, 03:15 PM
 
Are you running PHP? It would be trivial to write a script to traverse a folder and remove all tags (within the BODY tags) except for some nominated tags such as P, TABLE, TR, TD, etc...
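To make that concrete, here is a rough sketch of such a script, written in Python rather than PHP, using only the standard library's `html.parser`. Everything named here is an assumption for illustration: the allowlist in `KEEP`, the helper names `strip_to_allowlist` and `convert_folder`, the `*.html` file pattern, and the latin-1 encoding (a guess for an old HTML 3.2 site). It keeps text and allowlisted tags found inside BODY, drops every other tag, drops script and style contents, and also drops all attributes on the tags it keeps (so things like colspan would be lost too).

```python
from html.parser import HTMLParser
from pathlib import Path

# Assumed allowlist: tags that survive the clean-up.
KEEP = {"p", "table", "tr", "td", "th"}


class TagFilter(HTMLParser):
    """Collects body text plus allowlisted tags; drops everything else."""

    def __init__(self):
        # convert_charrefs=False so entities like &amp; are passed through as-is.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_body = False
        self.skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <P> arrives here as "p".
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip += 1
        elif self.in_body and tag in KEEP:
            self.out.append(f"<{tag}>")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif self.in_body and tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body and not self.skip:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.in_body and not self.skip:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.in_body and not self.skip:
            self.out.append(f"&#{name};")


def strip_to_allowlist(html: str) -> str:
    parser = TagFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


def convert_folder(src: Path, dst: Path) -> None:
    """Write a cleaned copy of every *.html file in src into dst."""
    dst.mkdir(parents=True, exist_ok=True)
    for path in src.glob("*.html"):
        text = path.read_text(encoding="latin-1")  # assumed encoding
        cleaned = strip_to_allowlist(text)
        (dst / path.name).write_text(cleaned, encoding="latin-1")
```

Calling something like `convert_folder(Path("old_site"), Path("stripped"))` (both folder names made up) would leave the originals untouched and put the cleaned copies in a separate folder, which seems safer for a 100-page conversion than editing in place.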
     
Dedicated MacNNer
Join Date: Aug 2002
Feb 1, 2004, 11:54 PM
 
That sounds like a job Perl was made for. Unfortunately, I do not know it well enough to give you a solution. Maybe you should ask this question in the Unix forum.
     
Occasionally Useful
Join Date: Jun 2001
Location: Liverpool, UK
Feb 2, 2004, 01:00 PM
 
Is HTML Tidy any good to you?
     
   
All times are GMT -5.
All contents of these forums © 1995-2011 MacNN. All rights reserved.