
Complex Find and Replace BBEdit foolishness
Grizzled Veteran
Join Date: Oct 2002
Mar 17, 2005, 08:27 AM
 
So, I have 760 HTML documents, all of which look a little something like this:

Code:
<!--#include virtual="title.ssi" -->Livewires No. 1<!--#include virtual="top.ssi" -->
<div class="title">
<a href="../titles/lvwr_zzz.html">Livewires</a><br>
</div>
<br>
<br>
No. 1: Part 1 (of 6), "See These Eyes So Red"<br>
<br>
Writing: Adam Warren<br>
Pencils: Rick Mays<br>
Inks: Jason Martin<br>
Colours: Guru eFX<br>
Letters: Junemoon Studios<br>
Cover: Adam Warren<br>
Assistant Editors: Andy Schmidt, Nicole Wiley and Molly Lazer<br>
Editor: Tom Brevoort<br>
<br>
Marvel Comics<br>
<br>
32 Pages (Story: 22)<br>
<br>
Full Colour<br>
<br>
$2.99 USA / $4.25 CAN<br>
<br>
April 2005<br>
<br>
<br>
<div class="subtitle">
Synopsis<br>
</div>
<!--#include virtual="synopsis.ssi" -->
<!--#include virtual="middle.ssi" -->
<img width="120" height="185" border="1" src="../covers/lvwr_001.jpg" alt="Cover Image"><br>
<!--#include virtual="bottom.ssi" -->
As you can see, at the top of the code, I have the title Livewires No. 1. And towards the bottom of the page, I have alt="Cover Image".

What I'd like to do is have BBEdit find the title part and copy it to the alt tag part. In other words, overwrite alt="Cover Image" with alt="Livewires No. 1".

I know that searching for title.ssi" -->.+<!--#include, with Grep turned on, will find the chunk with the title in it. And searching for alt="Cover Image" is simplicity itself. What I do not know is how to tell BBEdit to isolate the title part and then plug that into the alt tag part. Can anyone help me?
BayBook (13" MacBook Pro, 2.4GHz Core 2 Duo, 4GB RAM, 1TB HD) // BayPhone (iPhone 4, 32GB, black)
     
Senior User
Join Date: Sep 2002
Location: Canastota, New York
Mar 17, 2005, 09:08 AM
 
Try using $1 in the replace field.

I'm not at a Mac right now, so I can't confirm that this will work.
     
Professional Poster
Join Date: Oct 1999
Location: :ИOITAↃO⅃
Mar 17, 2005, 09:22 AM
 
It's time you learned a little Perl.
Here's a one-liner that (I think) does what you want. Duplicate the folder, then try this in Terminal (in the duplicate):

perl -i.bak -pe 'if (m/"title.ssi" -->(.+)<!--/) { $title = $1; print STDERR "Processed: $title\n" } s/alt="Cover Image"/alt="$title"/g;' *.html

The -i option tells Perl to do an in-place substitution of the file, saving the original to <original-file>.bak.

The -p option tells Perl to print each line of the input file to the output file (after making any changes, by running the script against each line of the input).

The -e option tells Perl to execute the script that follows.
The script is run against each line of the input. The first part looks for the title regex, and if it finds it, saves the result ($1) to the variable $title. It also prints to STDERR so you can follow its progress. The second part, s///, substitutes any occurrence (in each line) of the text 'alt="Cover Image"' with 'alt="$title"', where $title has (hopefully) already been set.

The *.html is expanded by the shell into every file in the current folder whose name ends in .html, so Perl works through all of them in one run.

If you're satisfied that it worked okay, you can then
rm *.bak
to get rid of the backup files.
     
megasad  (op)
Grizzled Veteran
Join Date: Oct 2002
Mar 17, 2005, 06:27 PM
 
Originally posted by galarneau:
Try using $1 in the replace field.
I'm not at a Mac right now, so I can't confirm that this will work.
This was too vague for me. How do I tell BBEdit that I want the title to become $1 and then to use that to fill the alt tag?

Originally posted by Mithras:
It's time you learned a little Perl...
You are powerful! That there code, (minus the " between the m and title.ssi (on account of the actual code being ever so slightly different from the code I posted, altered to make these pages not too wide)) worked a charm. If you feel the urge, you can see my website, with shiny new alt tags and valid XHTML even, at comics.megasad.com.

[EDIT - If the alt tag has any kind of "special" characters in it, Safari won't display the alt text when an image doesn't load. As far as I can tell, these special characters include colons, semicolons and periods, and since a lot of the comic books on my website have those characters in the titles, that means no alt tags for Safari users when the image doesn't load. But it works fine in every other browser, and is valid XHTML, so hopefully Safari will catch up one day soon.]
(Last edited by megasad; Mar 17, 2005 at 07:32 PM. )
     
Professional Poster
Join Date: Oct 1999
Location: :ИOITAↃO⅃
Mar 18, 2005, 06:24 AM
 
Cool. I like to see a site with an obsession, and yours has that in spades.
You can always add in a
$title =~ s/[:;\.]/-/g;
right after the 'print "Processed"' command, to replace those 'special characters' with a dash.
     
megasad  (op)
Grizzled Veteran
Join Date: Oct 2002
Mar 18, 2005, 02:39 PM
 
Originally posted by Mithras:
Cool. I like to see a site with an obsession, and yours has that in spades.
You can always add in a
$title =~ s/[:;\.]/-/g;
right after the 'print "Processed"' command, to replace those 'special characters' with a dash.
Thank you. For now I shall not use that code; it seems silly to "fix" something so that it will work in Safari when nothing is broken in the first place.

Now I write lots about something that has nothing to do with BBEdit.

Whilst I was testing my site in every other browser (Camino, Firefox, iCab, Internet Explorer, Mozilla, Netscape, OmniWeb and Opera in OS X, Internet Explorer 6.0 in Windows 98) I found that they all seem to render alt text quite differently from each other.

OmniWeb is the prettiest: the text fits within a shaded box the same size as the image and is small enough to read easily, but it is cut off with an ellipsis. iCab and Opera do the same, only not as pretty. Internet Explorer for Mac centres the alt text within the image box, cutting off the beginning and end, so it is quite useless for the comics with long titles. The Mozilla-based browsers all do some weird thing where they put a border around each line of the alt text, which is damn ugly, but at least you can read the full title. IE for Windows wraps the text within the image placeholder box so that you can read it all, and so does Safari, but only when it feels like it.

After more investigation, it doesn't even seem to be special characters that are the problem for Safari; a lot of the time it just doesn't seem to want to display alt text when an image does not load. Which is when you kind of need it to... Oh well.
     
megasad  (op)
Grizzled Veteran
Join Date: Oct 2002
Aug 25, 2005, 03:49 AM
 
Hokey doke.

I've got me a new problem and I was hoping someone could help.

Currently, I've got a folder of 1007 HTML documents that look a little something like this:

Code:
<!--#include virtual="/resources/includes/title.ssi" -->The Authority: More Kev 1<!--#include virtual="/resources/includes/top.ssi" -->
xxx
<!--#include virtual="/resources/includes/middle.ssi" -->
<div class="title">
<a href="../titles/authority.html">The Authority</a><br />
</div>
<br />
<br />
More Kev 1: The Wonderful Thing About Tiggers Part One<br />
<br />
Writing: Garth Ennis<br />
Art: Glenn Fabry<br />
Colours: David Baron<br />
Letters: Phil Balsman<br />
Cover: Glenn Fabry<br />
Assistant Editor: Kristy Quinn<br />
Editor: Ben Abernathy<br />
<br />
WildStorm Productions<br />
<br />
32 Pages (Story: 22)<br />
<br />
Full Colour<br />
<br />
$2.95 USA / $4.50 CAN<br />
<br />
July 2004<br />
<br />
<br />
<!--#include virtual="/resources/includes/comments.ssi" -->
yyy<a href="../largecovers/authority-morekev1.jpg"><img width="160" height="245" src="../smallcovers/authority-morekev1.jpg" alt="The Authority: More Kev 1" /></a><br />
<!--#include virtual="/resources/includes/bottom.ssi" -->
All I want to do is take yyy.+ and copy it up to where xxx is.

The following line of code very nearly does what I want it to:

Code:
perl -i.bak -pe 'if (m/yyy(.+)/) { $cover = $1; print STDERR "Processed: $cover\n" } s/xxx/$cover/g;' *.html
However, I think because of the fact that the instance of yyy.+ comes after xxx in each file, what happens is that the yyy.+ from one file is copied into the next file. So, the very first file has no line where xxx was at all, and every subsequent file has the yyy.+ line from the file before it. All of which means that every page has the wrong cover image.

Does anyone know how I can change the Perl code, that it will search from the bottom of the file up, rather than the top of the file down? Thank you if you do.
     
megasad  (op)
Grizzled Veteran
Join Date: Oct 2002
Aug 26, 2005, 09:34 AM
 
Originally Posted by megasad
...Does anyone know how I can change the Perl code, that it will search from the bottom of the file up, rather than the top of the file down?
Seeing as it doesn't seem possible to do what I want using either BBEdit or Perl, I've developed a work-around of my own:

1 - Use this Perl to replace all instances of xxx with zzz followed by the yyy.+ text:
Code:
perl -i.bak -pe 'if (m/yyy(.+)/) { $cover = $1; print STDERR "Processed: $cover\n" } s/xxx/zzz$cover/g;' *.html
Each file has the previous file's yyy.+ in it, on account of the problem stated in my previous post.

2 - Use BBEdit to search for all instances of zzz and replace them with ppp\rzzz.

3 - Using Renamer4Mac, sort the files descending alphabetically and then number them with a three digit number.

4 - Use the following code to take a file's zzz.+ line and put it into the "next" file's ppp reference:
Code:
perl -i.bak -pe 'if (m/zzz(.+)/) { $cover = $1; print STDERR "Processed: $cover\n" } s/ppp/qqq$cover/g;' *.html
Because the files have had numbers put in front of their filenames, they are now processed in the reverse order to that in step 1, setting things right.

5 - Using BBEdit, delete all ppp.+, zzz.+, yyy.+ and qqq references.

6 - Using Renamer4Mac again, delete the first three characters of every file's filename.

7 - Fix the very first and last file by hand.

A little convoluted, to say the least, but it worked. See for yourself.
     
   
All contents of these forums © 1995-2011 MacNN. All rights reserved.