How to strip HTML down to the bare minimum
Senior User
Join Date: Mar 2002
Location: Chicago, IL
I'm converting a site from old HTML 3.2-style markup to XHTML 1, and I have about 100 pages that need converting. I've created a template for the new site, and now all I need to do is strip all the presentational markup from the old site files so I can copy and paste the content into the new ones.
That is what I'm having trouble with. I've gotten close with HTML Tidy, but not close enough. I've gotten it to strip out the font tags and convert tags to lowercase, but it still leaves the bold tags behind, and there are a lot of them.
What I want is to be able to process a text file and have everything except paragraph and table tags removed. Does anyone have a good way of doing this? Going through and editing by hand is not a good use of my time, or anyone else's.
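For reference, here is a minimal Python sketch of that "keep only a whitelist" approach, using the standard library's html.parser. It assumes the tags to keep are p, table, tr, td and th. It also drops every attribute, since things like align and bgcolor are presentational too, and it throws away the contents of script and style blocks. All of those choices are assumptions you would adjust.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist: paragraph and table structure tags. Adjust to taste.
KEEP = {"p", "table", "tr", "td", "th"}
# Elements whose text content should be dropped entirely, not just untagged.
DROP_CONTENT = {"script", "style"}


class TagStripper(HTMLParser):
    """Rebuild HTML keeping only whitelisted tags, with all attributes removed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag names; attrs are deliberately ignored.
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Entities were decoded by the parser, so re-escape <, > and &.
            self.out.append(escape(data, quote=False))


def strip_html(source: str) -> str:
    """Return source with every tag outside KEEP removed (text is kept)."""
    parser = TagStripper()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Because this goes through a real parser rather than a text pattern, uppercase tags, unquoted attributes and entities such as &amp; are all handled the same way.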
We need less Democrats and Republicans, and more people that think for themselves.
infinite expanse
Mac Elite
Join Date: Mar 2002
Location: Clogland
Most text editors have a search-and-replace function; simply search for "<B>" and replace it with "", and so on for the other tags.
(Not sure what to use on the Mac platform. *ducks and runs for cover*)
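A literal search for "<B>" misses "<b>", "</B>" and tags that carry attributes, such as <FONT FACE="Arial">. A case-insensitive regular expression catches all of those in one pass. The sketch below is Python. The list of tags to delete is an assumption, and because it is a pattern match rather than a parser, it can be fooled by a ">" inside an attribute value.

```python
import re

# Presentational tags to delete, opening or closing, in any letter case.
# This particular list is an assumption; extend it as needed.
# The \b stops "b" from also matching <br> or <blockquote>.
PRESENTATIONAL = re.compile(r"</?(?:b|i|u|font|center)\b[^>]*>", re.IGNORECASE)


def remove_presentational_tags(html: str) -> str:
    """Delete the matching tags but keep the text between them."""
    return PRESENTATIONAL.sub("", html)


print(remove_presentational_tags('<P><B>Bold</b> and <FONT color="red">red</FONT></P>'))
```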
Mac Elite
Join Date: Mar 2003
BBEdit. It Doesn't Suck.®
http://www.barebones.com/products/bbedit/index.shtml
It has a search-and-replace feature that works across all the pages in a folder.
It also has a function to remove markup. However, $179 is a lot to spend on a text editor and IDE, though it's well worth it as an alternative to Dreamweaver for HTML work. I'm not sure whether the free version, or even their TextWrangler ($49), has the features you need.
It does a good job of removing markup, but it skips over JavaScript.
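If buying an editor isn't an option, a folder-wide search and replace can also be approximated with a short script. This Python sketch is only an illustration, not how BBEdit works. It applies one pattern to every .htm/.html file in a folder and writes cleaned copies to a separate output folder. The pattern, the folder layout and the Latin-1 encoding (a guess for old HTML 3.2 pages) are all assumptions.

```python
import re
from pathlib import Path

# One pattern applied to every page, like a multi-file search and replace.
# The tag list is an assumption for illustration.
PATTERN = re.compile(r"</?(?:b|i|u|font)\b[^>]*>", re.IGNORECASE)


def replace_in_folder(src: Path, dst: Path, exts=(".htm", ".html")) -> int:
    """Write cleaned copies of every matching file in src to dst; return the count."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(src.iterdir()):
        if path.is_file() and path.suffix.lower() in exts:
            # Latin-1 is an assumed encoding; it round-trips any byte value.
            text = path.read_text(encoding="latin-1")
            (dst / path.name).write_text(PATTERN.sub("", text), encoding="latin-1")
            count += 1
    return count
```

Writing to a new folder rather than overwriting keeps the originals around in case the pattern removes more than intended.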
Mac Elite
Join Date: Mar 2002
Location: Clogland
Metapad is excellent freeware if you're on a Windows box, btw.
Grizzled Veteran
Join Date: Nov 2003
Location: Hebburn, UK
Originally posted by Truepop:
It does a good job of removing markup, but it skips over JavaScript.
On a mildly related note, I was always annoyed that BBEdit didn't colour my JavaScript when it was embedded in an (X)HTML page. However, the other day I decided to make an XHTML page validate. I added type="text/javascript" and language="javascript" to the script tag, and BBEdit now colours the code, which is nice.
Anyhoo, I'm not sure which of the two attributes BBEdit was looking for in order to recognise the code as JavaScript; I presume it was language="javascript". It is most helpful, anyway.
So, my point is: I wonder if BBEdit is skipping over JavaScript for the same reason (I'm presuming you don't want it to skip over it).
Grizzled Veteran
Join Date: Jun 2001
Location: Melbourne, Australia
Are you running PHP? It would be trivial to write a script that traverses a folder and removes all tags (within the BODY tags) except for some nominated tags such as P, TABLE, TR, TD, etc.
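For what it's worth, PHP's built-in strip_tags() takes an optional list of allowed tags, which covers most of this, although it keeps attributes on the allowed tags. The same idea is sketched below in Python. It keeps only what sits between the BODY tags, drops comments, re-emits the nominated tags in lowercase without attributes, and removes every other tag. The whitelist, the function names and the mirrored output folder are assumptions made for this illustration.

```python
import re
from pathlib import Path

# Nominated tags to keep; this exact set is an assumption.
ALLOWED = {"p", "table", "tr", "td", "th"}

BODY = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def _keep_allowed(match: re.Match) -> str:
    slash, name = match.group(1), match.group(2).lower()
    # Re-emit allowed tags in lowercase without attributes; drop everything else.
    return f"<{slash}{name}>" if name in ALLOWED else ""


def clean_body(html: str) -> str:
    """Return only the BODY contents, with every non-allowed tag removed."""
    m = BODY.search(html)
    body = m.group(1) if m else html  # fall back to the whole text if no <body>
    body = COMMENT.sub("", body)
    return TAG.sub(_keep_allowed, body)


def clean_tree(src: Path, dst: Path) -> None:
    """Mirror every .htm/.html file under src into dst with clean_body applied."""
    # Build the list first so files written into dst are never revisited.
    pages = [p for p in src.rglob("*") if p.is_file() and p.suffix.lower() in (".htm", ".html")]
    for page in pages:
        out = dst / page.relative_to(src)
        out.parent.mkdir(parents=True, exist_ok=True)
        # Latin-1 is an assumed encoding for the old pages.
        out.write_text(clean_body(page.read_text(encoding="latin-1")), encoding="latin-1")
```

A regex approach like this is usually fine for consistently generated legacy pages. The parser-based route is more robust if the markup is messy.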
Computer thez nohhh...
Dedicated MacNNer
Join Date: Aug 2002
That sounds like a job Perl was made for. Unfortunately, I do not know it well enough to give you a solution. Maybe you should ask this question in the Unix forum.
Occasionally Useful
Join Date: Jun 2001
Location: Liverpool, UK
"Have sharp knives. Be creative. Cook to music" ~ maxelson