subtracting one text list from another
Junior Member
Join Date: May 2000
Location: Hollywood, CA, USA
I am cleaning some mailing lists that are simple text files with carriage returns. I would like to subtract an old list from a newer, larger list that contains many of the items in the old list. I don't just want to remove duplicates; I want to remove B from A, so that entries that appear in BOTH A and B are removed. Can anyone think of a way to do that?
Thanks very much.
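One thing worth checking before trying any line-based tool: if "carriage returns" here means literal CR characters (classic Mac line endings, or the CR half of Windows CRLF), most Unix utilities will not split the lines the way you expect. Here is a minimal sketch of a cleanup step, assuming that is the situation. The function name normalize_newlines is made up for illustration.

```shell
# normalize_newlines FILE
# Hypothetical helper: write FILE to stdout with CR and CRLF line endings
# converted to plain LF.
normalize_newlines() {
    # Turn every carriage return into a newline, then drop the empty lines
    # that CRLF pairs leave behind (this also drops genuinely blank lines,
    # which is usually fine for a list of addresses).
    tr '\r' '\n' < "$1" | sed '/^$/d'
}
```

Something like normalize_newlines old_list.txt > old_clean.txt (file names are placeholders) would then give a file the other tools can read line by line.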
Professional Poster
Join Date: Dec 2001
Location: somewhere
Depends on the available tools. For me, the simplest way (because I have and am familiar with the tools) would be to pull it into a SQL table and then select the distinct set of records back out. There's probably a way to put it in Excel, sort and do something to eliminate duplicates (a really crude way might be to use subtotals, then collapse the view to the subtotals, then copy & paste it out). Someone can likely give you a simple shell script, but that's not something I'm familiar with.
Clinically Insane
Join Date: Oct 2001
Location: San Diego, CA, USA
Ruby one-liner:
Code:
# Pass the newer list first, then the old list; entries are split on any whitespace.
puts ARGV.map {|f| File.read(f).split}.reduce {|a,b| a-b}.join("\n")
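For anyone without Ruby handy, a rough shell counterpart is sketched below. It is not the same program: it compares whole lines rather than whitespace-separated words, and it assumes both files already use Unix newlines. The function name subtract_keep_order is illustrative only.

```shell
# subtract_keep_order NEW OLD
# Print every line of NEW that does not occur as a whole line in OLD,
# keeping NEW's original order.
subtract_keep_order() {
    new_list=$1
    old_list=$2
    # -v  keep only lines that match none of the patterns
    # -x  a pattern must match the entire line
    # -F  treat patterns as fixed strings, so "." in an address is literal
    # -f  read the patterns, one per line, from the old list
    grep -vxFf "$old_list" "$new_list"
}
```

Because of -x, an old entry such as a@x.com would not knock out a longer line such as aa@x.com. Note that grep exits with status 1 when nothing is left to print, which is not an error here.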
Chuck
___
"Instead of either 'multi-talented' or 'multitalented' use 'bisexual'."
Clinically Insane
Join Date: Mar 2001
Location: yes
I would bet you could do something along the lines of:
cat file1 file2 | uniq > newfile
Clinically Insane
Join Date: Oct 2001
Location: San Diego, CA, USA
uniq only compares adjacent lines, so it only works on sorted data. You could run the files through sort first, but then of course you would lose the ordering of the original file. I don't know whether that's OK in this particular case. (Also, it needs to be uniq -u, or else it will add the unique items from file2 to file1 rather than subtracting them.)
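Spelled out, the sort-then-uniq -u idea looks roughly like this. It is only a sketch, and it leans on two assumptions the thread has not confirmed: every line of the old list also appears in the new list, and neither list repeats a line internally. The function name is illustrative.

```shell
# subtract_sorted NEW OLD
# Print the lines that occur exactly once across both files, in sorted order.
# Under the assumptions above, those are the lines found only in NEW.
subtract_sorted() {
    # sort puts identical lines next to each other; uniq -u then drops
    # every line that has an identical neighbour, first copy included.
    sort "$1" "$2" | uniq -u
}
```

If the old list holds an address that is missing from the new list, that address would show up in the output too, so this is worth checking on real data before trusting it.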
Chuck
Clinically Insane
Join Date: Mar 2001
Location: yes
Also, if the original poster is using proper mailing list software such as Mailman (and he should be if his list is large enough), it will prevent double email address subscriptions anyway.
Junior Member
Join Date: Aug 2005
Chuckit, I'm pretty sure uniq -u is the same as uniq. Are you thinking of sort -u? "cat file1 file2 | sort -u" will give you all lines that are only in one file, and only there once. I'm also a little confused about adding and subtracting. Assuming the older list is a subset of the newer one, the result will be those lines which only appear in the newer list (assuming the new list has no duplicate entries).
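For what it's worth, sort -u on its own keeps one copy of every distinct line, shared lines included, so it yields a merged list rather than a difference. A sketch that does not depend on the subset or no-duplicates assumptions is below, using comm. The function name and the temporary directory layout are illustrative, and mktemp is assumed to be available.

```shell
# subtract_with_comm NEW OLD
# Print the distinct lines of NEW that do not appear in OLD, in sorted order.
subtract_with_comm() {
    workdir=$(mktemp -d) || return 1
    # comm needs both inputs sorted the same way; LC_ALL=C pins the ordering.
    # sort -u also removes repeats inside each list.
    LC_ALL=C sort -u "$1" > "$workdir/new"
    LC_ALL=C sort -u "$2" > "$workdir/old"
    # -2 hides lines found only in OLD, -3 hides lines common to both,
    # leaving just the lines found only in NEW.
    LC_ALL=C comm -23 "$workdir/new" "$workdir/old"
    rm -rf "$workdir"
}
```

Entries that exist only in the old list are simply ignored here, and a line repeated in the new list comes out once.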
Moderator 
Join Date: Apr 2001
Location: Wasilla, Alaska
Clinically Insane
Join Date: Oct 2001
Location: San Diego, CA, USA
Originally Posted by davidbk1
Chuckit, I'm pretty sure uniq -u is the same as uniq.
Nope. Plain uniq deletes adjacent repeats after the first occurrence, but with the -u flag it deletes the first as well. printf "a\na\nb\n" | uniq prints "a" and "b" on separate lines, whereas printf "a\na\nb\n" | uniq -u prints just "b".
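To make the difference easy to try at a prompt, here is a small sketch that feeds the same input to uniq in each mode. The helper name sample is made up for the example.

```shell
# sample: emit three lines, the first two identical.
sample() { printf 'a\na\nb\n'; }

sample | uniq       # first line of each run of repeats: a, b
sample | uniq -u    # only lines that were never repeated: b
sample | uniq -d    # only lines that were repeated: a

# uniq compares neighbouring lines only, so a repeat that is not
# adjacent survives when the input is unsorted:
printf 'a\nb\na\n' | uniq   # a, b, a
```

The last command is the reason the input has to go through sort first.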
Chuck
Junior Member
Join Date: Aug 2005
Thanks Chuckit, the man page really doesn't do a good job of explaining that.
I like how only one person went to the utility specifically designed for this (diff). My first thought was definitely uniq as well.
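The diff suggestion itself is not quoted above, so the following is only a guess at how it could be applied, not the original poster's command. It sorts both lists first and keeps the lines that diff marks as present only in the second file. The function name is illustrative, and mktemp is assumed to be available.

```shell
# subtract_with_diff NEW OLD
# Print the lines that diff reports as present in NEW but not in OLD.
subtract_with_diff() {
    workdir=$(mktemp -d) || return 1
    # Sorting first lets diff line the two lists up entry by entry.
    LC_ALL=C sort -u "$1" > "$workdir/new"
    LC_ALL=C sort -u "$2" > "$workdir/old"
    # In diff's default output, lines taken from the second file start
    # with "> "; sed keeps those and strips the marker.
    # diff exits 1 when the files differ, which is expected here.
    { diff "$workdir/old" "$workdir/new" || [ "$?" -eq 1 ]; } | sed -n 's/^> //p'
    rm -rf "$workdir"
}
```

On very large inputs diff is not guaranteed to report the smallest possible set of changes, so comparing its result against a comm-based run on the same sorted files is a sensible cross-check.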