Welcome to the MacNN Forums.

Best way to bulk remove blank lines in a doc?
Senior User
Join Date: Sep 2002
Location: Canastota, New York
Jul 20, 2003, 12:23 AM
 
I have a bunch of plain text documents that are practice questions for an exam. We're talking thousands of questions here. After each question, there is a series of five choices:


A. Blah Blah Blah

B. Blah Blah Blah

...

E. Blah Blah Blah


Between each pair of choices there is a blank line. What I'm trying to figure out is the easiest way to remove the blank lines between these choices.

I'm guessing my options are Applescript, Perl, or maybe a shell script.

I can outline (in English) what I want the script to do:

If line begins with "A. ", "B. ", "C. ", "D. ", or "E. " then delete next line

Simple, no? I just have no idea where to start with this.

I'd appreciate any opinions on the best/most logical tool for the job, and perhaps a snippet of code or two. In the meantime, I really should get back to studying :-(

Thanks a bunch
     
Dedicated MacNNer
Join Date: Dec 2002
Location: someplace
Jul 20, 2003, 01:22 AM
 
BBEdit Lite can do it. Just run a multiple file search & replace: replace \r\r with \r.

BBEdit Lite is no longer available from BareBones, but it is still available elsewhere on the 'net:
http://fileavenue.com/index.php?q=bb....search=Search
     
Senior User
Join Date: Sep 2002
Location: Canastota, New York
Jul 20, 2003, 01:35 AM
 
Unfortunately, it's not that simple. You are correct that it would remove the empty lines between the answer choices, but it would also remove every other blank line, which messes up the formatting of the questions and answer explanations.

Thanks for the input though
     
Mac Enthusiast
Join Date: Nov 2001
Location: Adelaide, South Australia
Jul 20, 2003, 02:55 AM
 
Give this a go:

perl -pi.bak -e '$a=<> if /^[A-Z]\./' filename

where filename is the file containing your set of questions. The original is saved as filename.bak should it all go horribly wrong!

Cheers,
Paul
     
Addicted to MacNN
Join Date: Jun 1999
Location: Las Vegas, NV, USA
Jul 20, 2003, 08:28 AM
 
Are you sure you want the line after E. to be gone too?

Just do a simple search and replace as previously suggested, but instead of replacing \r\r with \r, replace \r\rB. with \rB. and do the same for C. D. and E.

It will be harder to delete the line after E. (again, are you sure you want this one gone?) but if each question starts with a number, you just search for \r\r[0-9].

Chris
     
Senior User
Join Date: Sep 2002
Location: Canastota, New York
Jul 20, 2003, 08:42 AM
 
Hey Paul,

Once again you come through. Works like a charm. I'm still using that eBroadcast.com TV guide extraction script you made last year.

Guess it's time to learn those regular expressions or whatever they're called.

Thanks again bubba,
-J
     
Mac Enthusiast
Join Date: Nov 2001
Location: Adelaide, South Australia
Jul 20, 2003, 10:12 PM
 

Quote: "Guess it's time to learn those regular expressions or whatever they're called."
Learning regexes will always stand you in good stead, whether you end up applying them in grep, awk, perl, python, sed or any of the myriad other apps that now embed the capability to munge text in this way.

Your problem was rather attractive in that it let me use a nice trick that had been waiting for an application. The "$a=<>" piece just throws away the line after the one that matches the regular expression (i.e. after any line beginning with a capital letter and then a literal period). Not very defensive, but given that you'd guaranteed the next line to be blank I didn't think it was worth checking!

(Oh yeah: You're welcome)

Cheers,
Paul
     
Addicted to MacNN
Join Date: Jun 1999
Location: Las Vegas, NV, USA
Jul 21, 2003, 12:43 AM
 
How does $a=<> mean "the line after the one I just found"? I think I see that it replaces the line with a null, but what is $a?

Chris
     
Mac Enthusiast
Join Date: Jun 2000
Location: New Jersey, USA
Jul 21, 2003, 05:06 PM
 
The -p option tells perl to read an input line (assigning it to the variable $_), execute the given program, and then print $_. If the program leaves $_ unaltered, that simply prints the input line. Perl repeats this for every line of the input files.

So Paul's script basically says:

For each input line, see if it starts with a capital letter and a dot. If so, read the next input line into a garbage variable $a (which will be discarded). Then print the original input line.

If you only wanted to delete the lines between the options (and not the one after E.) you could replace A-Z with A-D.
     
Addicted to MacNN
Join Date: Jun 1999
Location: Las Vegas, NV, USA
Status: Offline
Reply With Quote
Jul 21, 2003, 06:24 PM
 
Thanks for the explanation. That was interesting.

Chris
     
   
All times are GMT -5.
All contents of these forums © 1995-2011 MacNN. All rights reserved.