
subtracting one text list from another
Junior Member
Join Date: May 2000
Location: Hollywood, CA, USA
Jul 27, 2009, 04:42 PM
 
I am cleaning some mailing lists that are simple text files with carriage returns. I would like to subtract an old list from a newer, larger list that contains many of the items in the old list. I don't just want to remove duplicates; I want to subtract B from A, so that every entry that appears in BOTH A and B is removed from A. Can anyone think of a way to do that?

Thanks very much.
     
Professional Poster
Join Date: Dec 2001
Location: somewhere
Jul 27, 2009, 06:55 PM
 
Depends on the available tools. For me, the simplest way (because I have and am familiar with the tools) would be to pull it into a SQL table and then select the distinct set of records back out. There's probably a way to put it in Excel, sort and do something to eliminate duplicates (a really crude way might be to use subtotals, then collapse the view to the subtotals, then copy & paste it out). Someone can likely give you a simple shell script, but that's not something I'm familiar with.
     
Clinically Insane
Join Date: Oct 2001
Location: San Diego, CA, USA
Jul 27, 2009, 07:12 PM
 
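Ruby one-liner (pass the newer list first, then the list to subtract; the file names here are placeholders). Note that split breaks on any whitespace, which is fine for one address per line:

Code:
ruby -e 'puts ARGV.map {|f| File.read(f).split}.reduce {|a,b| a-b}.join("\n")' new_list.txt old_list.txt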
Chuck
___
"Instead of either 'multi-talented' or 'multitalented' use 'bisexual'."
     
Clinically Insane
Join Date: Mar 2001
Location: yes
Jul 28, 2009, 03:01 AM
 
I would bet you could do something along the lines of:

cat file1 file2 | uniq > newfile
     
Clinically Insane
Join Date: Oct 2001
Location: San Diego, CA, USA
Jul 28, 2009, 09:05 AM
 
uniq only compares adjacent lines, so it only works on sorted data. You could run the combined lists through sort first, but then of course you would lose the ordering of the original file. I don't know whether that's OK in this particular case. (Also, it needs to be uniq -u, or else it will add the unique items from file2 to file1 rather than subtracting them.)
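The sort-then-uniq -u idea might look roughly like the sketch below. It assumes the files use LF line endings (CR-only files could be converted first with tr '\r' '\n'), that neither list repeats a line, and that every line of the old list also appears in the new one. The file names and addresses are placeholders. The second variant, using grep, is not something suggested in this thread; it is included only as one possible way to keep the original ordering.

```shell
#!/bin/sh
# Placeholder data: new.txt is the larger list, old.txt is the list to subtract.
tmp=$(mktemp -d)
printf 'carol@example.com\nalice@example.com\nbob@example.com\n' > "$tmp/new.txt"
printf 'alice@example.com\n' > "$tmp/old.txt"

# Variant 1: sort both lists together, then keep only lines that occur once.
# Loses the original order and relies on the assumptions stated above.
sorted_result=$(sort "$tmp/old.txt" "$tmp/new.txt" | uniq -u)

# Variant 2: keep each line of new.txt that does not exactly match a line
# of old.txt. -F: fixed strings, -x: whole-line match, -v: invert,
# -f: read the patterns from a file. Preserves the order of new.txt.
ordered_result=$(grep -Fxv -f "$tmp/old.txt" "$tmp/new.txt")

printf '%s\n' "$sorted_result"
printf '%s\n' "$ordered_result"
rm -rf "$tmp"
```

If the old list contains addresses that are not in the new list, variant 1 will let them through into the result, while variant 2 will not.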
Chuck
     
Clinically Insane
Join Date: Mar 2001
Location: yes
Jul 28, 2009, 11:06 AM
 
Also, if the original poster is using proper mailing list software such as Mailman (and he should be if his list is large enough), it will prevent duplicate subscriptions for the same email address anyway.
     
Junior Member
Join Date: Aug 2005
Jul 28, 2009, 07:21 PM
 
Chuckit, I'm pretty sure uniq -u is the same as uniq. Are you thinking of sort -u? "cat file1 file2 | sort -u" will give you all lines that are only in one file, and only there once. I'm also a little confused about the adding and subtracting: assuming the older list is a subset of the newer one, the result will be those lines which only appear in the newer list (assuming the new list has no duplicate entries).
     
Moderator
Join Date: Apr 2001
Location: Wasilla, Alaska
Jul 28, 2009, 07:35 PM
 
What about diff?
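One way diff could be pressed into service is sketched below. It assumes both lists are sorted first, since diff compares sequences of lines and would otherwise report reordered entries as changes. In diff's default output format, lines found only in the second file are prefixed with "> ". The file names and addresses are placeholders.

```shell
#!/bin/sh
# Placeholder data: new.txt is the larger list, old.txt is the list to subtract.
tmp=$(mktemp -d)
printf 'carol@example.com\nalice@example.com\nbob@example.com\n' > "$tmp/new.txt"
printf 'alice@example.com\ndave@example.com\n' > "$tmp/old.txt"

sort "$tmp/old.txt" > "$tmp/old.sorted"
sort "$tmp/new.txt" > "$tmp/new.sorted"

# diff exits with status 1 when the files differ, which is expected here,
# so that status is ignored. sed keeps only the "> " lines (present in the
# new list but not the old one) and strips the prefix.
only_in_new=$({ diff "$tmp/old.sorted" "$tmp/new.sorted" || true; } | sed -n 's/^> //p')

printf '%s\n' "$only_in_new"
rm -rf "$tmp"
```

Unlike sort piped into uniq -u, this drops entries that exist only in the old list (dave@example.com above), but the result comes out in sorted order rather than the original order.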
     
Clinically Insane
Join Date: Oct 2001
Location: San Diego, CA, USA
Jul 28, 2009, 08:14 PM
 
Originally Posted by davidbk1:
Chuckit, I'm pretty sure uniq -u is the same as uniq.
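Nope. Plain uniq keeps the first line of each run of adjacent duplicates and deletes the rest; with the -u flag, it deletes the first as well, so only lines that are not repeated survive. Plain printf "a\na\nb\n" | uniq prints "a" and "b" on separate lines, whereas printf "a\na\nb\n" | uniq -u will just print "b".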
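For comparison, here is a small sketch that runs the commands discussed so far on the same kind of toy input. The variable names are only for illustration.

```shell
#!/bin/sh
# uniq: collapses each run of adjacent duplicates to one line -> a, b
plain=$(printf 'a\na\nb\n' | uniq)

# uniq -u: drops every line that has an adjacent duplicate -> b
unique_only=$(printf 'a\na\nb\n' | uniq -u)

# sort -u: sorts, then keeps one copy of each distinct line -> a, b
sorted_union=$(printf 'b\na\na\n' | sort -u)

# uniq on unsorted input: the repeats are not adjacent, so nothing is
# removed -> a, b, a
unsorted=$(printf 'a\nb\na\n' | uniq)

printf 'uniq:\n%s\n' "$plain"
printf 'uniq -u:\n%s\n' "$unique_only"
printf 'sort -u:\n%s\n' "$sorted_union"
printf 'uniq (unsorted input):\n%s\n' "$unsorted"
```

So sort -u behaves like sort piped into plain uniq: it returns every distinct line from both files once, which merges the lists rather than subtracting one from the other.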
Chuck
     
Junior Member
Join Date: Aug 2005
Jul 28, 2009, 08:35 PM
 
Thanks Chuckit, the man page really doesn't do a good job of explaining that.

I like how only one person went to the utility specifically designed for this (diff). My first thought was definitely uniq as well.
     
   
All contents of these forums © 1995-2011 MacNN. All rights reserved.