Mail.app: matching junk using regular expressions - my solution
Senior User
Join Date: Nov 2002
Location: US
Status:
Offline
(Edit: a new thread, http://forums.macnn.com/showthread.p...hreadid=188887, points to the site for the tool described here.)
Read this if you
1. are annoyed that spammers nowadays intentionally garble their messages to defeat Mail.app's junk filter (e.g., viagra becomes vi!agra)
2. are going nuts because some spam comes in HTML with keywords encoded as numeric HTML entities, such as '&#32;' for the space character
My solution involves an AppleScript script, a Python script, and 3 plain-text configuration files containing the patterns you want to match (one for the subject line, one for the sender line, and one for the raw message source). Everything required comes built into Panther.
I'll post the 5 files in the following posts, but here are the steps to follow:
1. Create a folder: /Users/<yourname>/Library/Scripts
2. Put all 5 files, copied and pasted from my next 5 posts, in that folder.
3. In Mail.app, open Preferences > Rules.
4. Add a rule to the end of your rule list and call it JunkMatcher (or whatever). Let the rule match "every message", set "performing the following actions:" to "Run AppleScript", and then choose /Users/<yourname>/Library/Scripts/junkMatcher.scpt.
5. Click OK to add the rule, and you should be all set.
What does this do?
Whenever a new message comes in, if its subject, sender, or raw source matches ANY of the patterns in junkSubj.txt, junkSender.txt, or junkContent.txt respectively, the message is moved to the junk folder. Note that this is a DISJUNCTION: a single pattern match is enough to make the message junk!
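To make the "any match means junk" rule concrete, here is a rough sketch in Python 3. It is not one of the 5 posted files: the function name is_junk and the sample addresses are made up for illustration, and the two patterns are borrowed from the pattern files posted further down.

```python
import re

def is_junk(subject, sender, content, subj_pats, sender_pats, content_pats):
    """Return True as soon as any pattern matches its own field (a disjunction)."""
    checks = (
        (subj_pats, subject),
        (sender_pats, sender),
        (content_pats, content),
    )
    return any(p.search(text) for pats, text in checks for p in pats)

# Hypothetical sample data for illustration only.
subj_pats = [re.compile(r"(?i)prescription")]
sender_pats = []  # no sender patterns yet
content_pats = [re.compile(r"(?i)manhood")]

print(is_junk("Cheap PRESCRIPTION drugs", "a@example.com", "hello",
              subj_pats, sender_pats, content_pats))
print(is_junk("Lunch on Friday?", "b@example.com", "see you then",
              subj_pats, sender_pats, content_pats))
```

The first call hits on the subject alone, which is enough; the second matches nothing and is left in the inbox.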
(Last edited by fortepianissimo; Nov 15, 2003 at 09:12 PM.)
junkMatcher.scpt
Use Script Editor to create the file by copy-and-paste:
Code:
using terms from application "Mail"
    on perform mail action with messages theMessages for rule theRule
        tell application "Mail"
            repeat with theMsg in theMessages
                set theSubject to subject of theMsg
                set theSender to sender of theMsg
                set theContent to source of theMsg
                -- Hand subject, sender, and raw source to the Python script as shell-quoted arguments.
                -- The answer goes in "verdict" rather than AppleScript's built-in "result" variable.
                set verdict to do shell script "python ~/Library/Scripts/junkMatcher.py " & quoted form of theSubject & ¬
                    " " & quoted form of theSender & " " & quoted form of theContent
                if verdict is equal to "yes" then
                    move theMsg to the junk mailbox
                end if
            end repeat
        end tell
    end perform mail action with messages
end using terms from
(Last edited by fortepianissimo; Nov 14, 2003 at 01:46 PM.)
junkMatcher.py
Use any text editor to create this file:
Code:
#!/usr/bin/env python
# Written in Python 2 syntax, for the Python that ships with Panther.
import re, sys, os

ROOT = os.environ["HOME"] + '/Library/Scripts/'

# Decode numeric HTML entities (e.g. &#32;) so the patterns see plain text.
entityPat = re.compile(r'&#\d+;')
content = sys.argv[3].replace('\n', ' ')
idx = 0
while 1:
    m = entityPat.search(content, idx)
    if m is None: break
    code = int(content[m.start(0)+2:m.end(0)-1])
    if code < 256 and code >= 0:
        content = content[:m.start(0)] + chr(code) + content[m.end(0):]
        idx = m.start(0) + 1
    else: idx = m.end(0)
#print content

def makePat (fn):
    # Strip the surrounding quotes and join all patterns into one alternation.
    # Blank lines are skipped: an empty alternative would match every message.
    patStr = '|'.join([line.strip()[1:-1] for line in open(ROOT+fn, 'r') if line.strip()])
    if len(patStr) == 0: return None
    else: return re.compile(patStr)

junkSubjPat = makePat('junkSubj.txt')
junkSenderPat = makePat('junkSender.txt')
junkContentPat = makePat('junkContent.txt')

# Any single match (subject, sender, or content) marks the message as junk.
if ((junkSubjPat and junkSubjPat.search(sys.argv[1])) or
    (junkSenderPat and junkSenderPat.search(sys.argv[2])) or
    (junkContentPat and junkContentPat.search(content))): print 'yes'
else: print 'no'
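For anyone trying this on Python 3, the entity-decoding loop can probably be collapsed into a single re.sub call. This is only a sketch and not part of the posted script; decode_entities is a made-up name, and like the loop it only decodes codes below 256 and leaves everything else untouched.

```python
import re

ENTITY = re.compile(r"&#(\d+);")

def decode_entities(text):
    """Replace numeric entities below 256 with their character; keep the rest as-is."""
    def repl(m):
        code = int(m.group(1))
        return chr(code) if code < 256 else m.group(0)
    return ENTITY.sub(repl, text)

# "&#105;" is "i" and "&#32;" is a space; "&#8364;" is out of range and is kept.
print(decode_entities("v&#105;agra&#32;now &#8364;5"))
```

After decoding, a pattern such as (?i)viagra sees the plain word instead of the entity soup.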
(Last edited by fortepianissimo; Nov 14, 2003 at 01:26 PM.)
junkSubj.txt
Use any text editor to create this file. It is a list of regex patterns, one per line, each enclosed in a pair of double quotes and written in Python regex syntax (see, for example, http://www.amk.ca/python/howto/regex/ ; other places document this too).
Code:
"(?i)v\W?i\W?a\W?g\W?r\W?a"
"(?i)p\W?e\W?n\W?i\W?s"
"(?i)prescription"
(Last edited by fortepianissimo; Nov 14, 2003 at 01:26 PM.)
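To see what the \W? trick in the first junkSubj.txt pattern buys you, here is a small Python 3 check. The sample strings and the helper name matches are made up for illustration. Each \W? allows at most one non-word character between letters, so single-character garbling is caught, while two characters in a row, or an underscore (which counts as a word character), slip through.

```python
import re

VIAGRA = re.compile(r"(?i)v\W?i\W?a\W?g\W?r\W?a")

def matches(text):
    """True if the garbled-keyword pattern is found anywhere in text."""
    return VIAGRA.search(text) is not None

for sample in ["viagra", "Vi!agra", "V.I.A.G.R.A", "v i a g r a",
               "vi!!agra", "vi_agra"]:
    print(sample, matches(sample))
```

If the double-character variants start showing up in your inbox, loosening \W? to something like [\W_]{0,2} would be one way to widen the net, at the cost of more false positives.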
junkSender.txt
Use any text editor to create this file. It is a list of regex patterns, one per line, each enclosed in a pair of double quotes and written in Python regex syntax (see, for example, http://www.amk.ca/python/howto/regex/ ; other places document this too).
(That's right, the file is empty: so far I haven't tried to match against senders. Create it anyway, since the Python script opens all three files.)
(Last edited by fortepianissimo; Nov 14, 2003 at 01:27 PM.)
junkContent.txt
Use any text editor to create this file. It is a list of regex patterns, one per line, each enclosed in a pair of double quotes and written in Python regex syntax (see, for example, http://www.amk.ca/python/howto/regex/ ; other places document this too).
Code:
"(?i)v\W?i\W?a\W?g\W?r\W?a"
"(?i)p\W?e\W?n\W?i\W?s"
"(?i)prescription"
"(?i)manhood"
(Last edited by fortepianissimo; Nov 14, 2003 at 01:27 PM.)
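If you want to read a pattern file in this format from Python 3, something along these lines might do. It is a sketch, not the posted script: load_patterns is a made-up name, and the sample text stands in for a real file. Each pattern is compiled on its own rather than joined with |, which keeps a blank line from turning into an empty alternative that matches everything, and keeps inline flags such as (?i) at the start of each expression, which recent Python 3 releases insist on.

```python
import io
import re

def load_patterns(lines):
    """Compile one regex per quoted line; skip blank, unquoted, or empty entries."""
    pats = []
    for line in lines:
        line = line.strip()
        # Need an opening quote, a closing quote, and at least one character between.
        if len(line) < 3 or not (line.startswith('"') and line.endswith('"')):
            continue
        pats.append(re.compile(line[1:-1]))
    return pats

# Stand-in for open("junkContent.txt"); note the blank line in the middle.
sample = io.StringIO('"(?i)prescription"\n\n"(?i)manhood"\n')
pats = load_patterns(sample)
print([p.pattern for p in pats])
```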
Mac Elite
Join Date: Mar 2001
Location: Provo, UT
Status:
Offline
You should send this hint to OSXHints.
These are new patterns I added to junkContent.txt:
Code:
"(?i)<\s*i(?:=\s*)?m(?:=\s*)?g(?:=\s*)?[^>]+(?:l(?:=\s*)?o(?:=\s*)?w(?:=\s*)?)?s(?:=\s*)?r(?:=\s*)?c(?:=\s*)?\s*(?:=|=3d)\s*(?:'|")\s*h(?:=\s*)?t(?:=\s*)?t(?:=\s*)?p(?:=\s*)?:"
"(?i)pill(?:s)?"
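The first pattern throws any message that references an image via http into the junk folder. It also allows for "=" plus whitespace between letters and for "=3d" in place of "=", as found in quoted-printable message source.
The 2nd is an obvious addition, and can be added to junkSubj.txt as well.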
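Here is a quick Python 3 check of the external-image pattern, for anyone who wants to try it before trusting it with real mail. The sample tags and the helper name has_remote_image are made up for illustration; the pattern itself is copied from the post, just split across two raw strings.

```python
import re

IMG_HTTP = re.compile(
    r'''(?i)<\s*i(?:=\s*)?m(?:=\s*)?g(?:=\s*)?[^>]+(?:l(?:=\s*)?o(?:=\s*)?w(?:=\s*)?)?'''
    r'''s(?:=\s*)?r(?:=\s*)?c(?:=\s*)?\s*(?:=|=3d)\s*(?:'|")\s*h(?:=\s*)?t(?:=\s*)?t(?:=\s*)?p(?:=\s*)?:'''
)

def has_remote_image(source):
    """True if the raw message source contains an <img> whose src starts with http:."""
    return IMG_HTTP.search(source) is not None

samples = [
    '<img src="http://spam.example/t.gif">',        # plain HTML
    '<IMG SRC=3D"http://spam.example/t.gif">',      # quoted-printable "=" as "=3D"
    '<im=\ng src=3D"http://spam.example/t.gif">',   # soft line break inside the tag name
    '<img src="cid:logo123">',                      # embedded image, not http
    '<a href="http://example.com">link</a>',        # a link, not an image
]
for s in samples:
    print(repr(s), has_remote_image(s))
```

Only the first three should be flagged. Keep in mind that legitimate newsletters also load images over http, so this one is aggressive by design.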
Junk mails, die die die!
(edit: updated the pattern for external images, and removed the "microsoft" pattern, which was too aggressive)
(edit: updated the img pattern for more coverage)
(Last edited by fortepianissimo; Nov 15, 2003 at 02:33 PM.)
Originally posted by clarkgoble:
You should send this hint to OSXHints.
Done that - thx for the suggestion.
My secret hope is that once everyone is using this, those sons of b*tches will find their messages have all gone down the drain.
Of course, my 2nd secret hope is that everyone not using a Mac will still get loads of spam as before.