MFilter - DMail Spam Filtering

MFilter is a new feature in DMail 2.9. It is an extension to DSMTP, which allows administrators to filter incoming mail. With MFilter, you can block messages, modify them, or give them 'spam scores' which are passed on to the client.

All of this is controlled by a single Rule File, mfilter.rul, which is placed by the administrator in the DMail work path.



Creating the Rule File

Simply create a file called mfilter.rul in the DMail work path. Then, restart DSMTP by issuing a 'tellsmtp reload' command, or by using DMAdmin.


Syntax of the Rule File

There are 3 valid types of statement in a rule file:

  1. assignment
  2. action
  3. conditional expression

Assignment

Assignment statements are used to assign values to variables. The syntax is in the form:

$variable_name = "quoted string" [+ "quoted string"[+ $variable ...]]

For example:

	$foo = "bar"

Action

Actions tell MFilter what to do with the message. The action syntax is:

accept "reason" | bounce "reason" | drop "reason" | forward "user@domain" | then | setflag("flagname") | clearflag("flagname") | spamdetect(spam_score, "reason")

Conditional Expressions

Conditional expressions control the actions taken by MFilter, based on one or more comparisons. The conditional expression syntax is:

if (Conditional_Expression) [and (Conditional_Expression)...] action

The Conditional_Expression may be any predefined function, e.g.:

	isin("Subject","Free")

or any numerical comparison, e.g.:

	lines()>100

or a simple NOT operator, e.g.:

Note that calculations are not permitted, so lines()+10 would fail.


Miscellaneous

Line Continuation

Lines can be continued by ending the line in a '\' character.

Quoting Strings

All strings and header names should be enclosed in double quotes. Unquoted names sometimes work, but we don't guarantee this will keep working in future. For example, use exists("Supersedes"), not exists(Supersedes). Quotes inside a string can be escaped in the usual way, e.g. "This \"Word\" has quotes around it"

Assignments

Assignments are processed at compile time; variables DO NOT exist at run time. Don't think of this as a programming language, but rather as a list of rules that are processed with each incoming message. Real run-time variables only exist in the form of the isflag("xxx") function and the setflag("xxx") action.

For example, the following is NOT VALID, as the assignment is processed before the rules are run. The rejection would always read "big message".

$fred = "small message"
if (lines()>100) then
	$fred = "big message"    (this will not work as expected)
end if
reject $fred
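The compile-time behaviour can be modelled outside MFilter. The Python sketch below is an illustration only, not MFilter's actual implementation: `expand_variables` is a hypothetical helper that applies every assignment in file order, ignoring any surrounding if block, and substitutes the value held at that point into later lines.

```python
import re

def expand_variables(rule_lines):
    """Model compile-time expansion: each assignment is applied in file
    order, whether or not it sits inside an if block."""
    variables = {}
    expanded = []
    for line in rule_lines:
        stripped = line.strip()
        assignment = re.fullmatch(r'(\$\w+)\s*=\s*(".*")', stripped)
        if assignment:
            # Remember the latest value; the assignment itself emits no rule.
            variables[assignment.group(1)] = assignment.group(2)
            continue
        # Substitute the value each variable holds at this point in the file.
        expanded.append(
            re.sub(r"\$\w+",
                   lambda m: variables.get(m.group(0), m.group(0)),
                   stripped)
        )
    return expanded

rules = [
    '$fred = "small message"',
    'if (lines()>100) then',
    '    $fred = "big message"',
    'end if',
    'reject $fred',
]
expanded_rules = expand_variables(rules)
for rule in expanded_rules:
    print(rule)
```

In this model the if block ends up empty and the final rule always carries the last value assigned to $fred, whatever the size of the message.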

Actions & Commands

Actions

Builtin Functions


New TellSMTP Commands

tellsmtp mfilter_test d:\test.msg d:\test.rul

Function Descriptions

allmod("header")

This returns true if all the newsgroups in the specified header are moderated.

exists("header")

This is true if the header exists in the message and is non-zero in length, e.g.

if (exists("supersedes")) reject "We don't like supersedes headers"

head_len("header")

Returns the length of the named header, e.g. if (head_len("date")>60) bounce "Naughty message"

isbase64()

This is true if the message appears to contain base64 binary encoded data.

isbinary()

This is true if the message has binary data, either base64 encoding or uuencoded data.

isencodedhtml()

This is true if the message appears to contain MIME-encoded or uuencoded HTML instead of plain text data.

isencodedtext()

This is true if the message appears to contain MIME-encoded or uuencoded text data. This will always be true if isencodedhtml() returns true.

isencodedurl()

This is true if the message appears to contain a uuencoded URL reference.

isflag("flag-name")

Used to check whether a flag variable is currently set to true. Flags are set with the setflag("flag-name") action, e.g.

if (size()>100000) setflag("bigitem")
if (isimage()) setflag("bigitem")
if (isflag("bigitem")) reject "It was a big item or had a picture in it"

ishtml()

This is true if the message appears to contain HTML instead of plain text data.

isimage()

This is true if the message appears to contain a picture (either MIME-encoded or uuencoded).

isin("header","string-not-case-sensitive")

This is a simple content-searching function. It is true if the named header contains the string (the match is not case sensitive), e.g.

if (isin("Subject","Free")) reject "Probably a spammer selling something"

This would reject a message with a subject of "Get your Free pictures here". It would also reject a message with a subject of "Is there any real freedom in the world?", so it's probably not a good rule :-)
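The false positive is easy to reproduce. The Python sketch below models isin() as a case-insensitive substring test, which is an assumption based on the description above rather than MFilter's code, and contrasts it with a regular expression using the \b word boundary from the syntax summary later in this document.

```python
import re

def isin(header_value, needle):
    """Case-insensitive substring test, modelled on the isin() description."""
    return needle.lower() in header_value.lower()

def has_word(header_value, word):
    """Case-insensitive whole-word test using \\b word boundaries."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, header_value, re.IGNORECASE) is not None

spam_subject = "Get your Free pictures here"
innocent_subject = "Is there any real freedom in the world?"

substring_hits = (isin(spam_subject, "Free"), isin(innocent_subject, "Free"))
whole_word_hits = (has_word(spam_subject, "free"), has_word(innocent_subject, "free"))
print(substring_hits, whole_word_hits)
```

The substring test flags both subjects, while the whole-word test flags only the first, which is why a rexp() rule is often a better fit than isin() for short keywords.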

lines()

This returns the number of lines in the message.

match("header","wildcard")

This function applies a simple wildcard matching algorithm of the kind typically used to match file names, e.g. match("From","*@netwin.co.nz*") would match a message from that domain.

matchall("header","wildcardlist")

Used for matching a single wildcard against a header which contains a list of values, such as Newsgroups: or Path:. The match is TRUE only if all entries in the list match, e.g.

if (matchall("Newsgroups","news.filters.*")) accept "It is only in the filters list so we will accept it"

matchone("header","wildcardlist")

Identical to the above function, but returns 'TRUE' if any match occurs.
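The difference between the two list functions can be sketched in Python with the standard fnmatch module. This is a model, not MFilter's implementation: it assumes a comma-separated header such as Newsgroups:, case-sensitive matching, and that an empty list never satisfies matchall, none of which is stated above.

```python
from fnmatch import fnmatchcase

def split_list(header_value):
    """Split a comma-separated header value into trimmed entries (assumed format)."""
    return [entry.strip() for entry in header_value.split(",") if entry.strip()]

def matchall(header_value, wildcard):
    """True only if every entry in the list matches the wildcard."""
    entries = split_list(header_value)
    return bool(entries) and all(fnmatchcase(e, wildcard) for e in entries)

def matchone(header_value, wildcard):
    """True if at least one entry in the list matches the wildcard."""
    return any(fnmatchcase(e, wildcard) for e in split_list(header_value))

only_filters = "news.filters.spam, news.filters.misc"
crossposted = "news.filters.spam, alt.test"

all_only = matchall(only_filters, "news.filters.*")
all_cross = matchall(crossposted, "news.filters.*")
one_cross = matchone(crossposted, "news.filters.*")
print(all_only, all_cross, one_cross)
```

A crossposted message fails matchall but passes matchone, so matchall suits "accept only if confined to these groups" rules and matchone suits "act if any of these groups is present" rules.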

rexp("header","regular-expression")

This function searches the named header for a regular expression. The matching is not case sensitive; use rexp_case() for a case sensitive version.

size()

Returns the size in bytes of the current message. It can be used with the > and < operators.


Actions

accept "reason"

Accepts the current message, reporting the specified "reason" in the log files.

clearflag("flag-name")

Used to set the specified flag variable to the false state.

forward_cc("new@email.address")

Sends the current message to this new email address in addition to any existing destination users.

reject "reason"

Rejects the current message, reporting the specified "reason" in the log files.

replace("header_name","wildcard_match_pattern","replacement_pattern")

If the named header matches the 'wildcard_match_pattern' then the replacement pattern is applied. The text matched by each * can be reused in the replacement as $1, $2 and so on, e.g.

replace("from","*@*.domain.name","BOB_$1@$2.other.name")

From: "joe@this.domain.name"

Would be translated to:

From: "BOB_joe@this.other.name"
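The substitution in this example can be reproduced with a short Python sketch. It is a model under stated assumptions, not MFilter's code: `wildcard_replace` is a hypothetical helper that treats each * as a capture group, requires the whole value to match, matches case sensitively, and leaves the value unchanged when the pattern does not match.

```python
import re

def wildcard_replace(value, match_pattern, replacement):
    """Apply a '*' wildcard pattern to value and expand $1, $2, ... in replacement."""
    # Turn each '*' into a capture group and escape everything else literally.
    regex = "(.*)".join(re.escape(part) for part in match_pattern.split("*"))
    found = re.fullmatch(regex, value)
    if found is None:
        return value  # pattern did not match: leave the header untouched
    # Replace each $n with the text captured by the n-th '*'.
    return re.sub(r"\$(\d+)", lambda m: found.group(int(m.group(1))), replacement)

rewritten = wildcard_replace(
    "joe@this.domain.name", "*@*.domain.name", "BOB_$1@$2.other.name"
)
untouched = wildcard_replace(
    "joe@elsewhere.org", "*@*.domain.name", "BOB_$1@$2.other.name"
)
print(rewritten)
print(untouched)
```

Here $1 picks up the text before the @ sign and $2 the host part before .domain.name, matching the translation shown above.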

report("manager@email.address","subject of message")

Sends an email containing the top part of the offending message to the specified person, with the specified subject. Use this when you want to be alerted to something but don't want to forward the message itself, which could be confusing because it would look as though the message had been sent to the manager directly.

setflag("flag-name")

Used to set the specified flag variable to the true state.

spamdetect(spam_score, "reason")

Adds the value spam_score to the total spam score for the message, and adds reason to the list of reasons. The total spam score and complete list of reasons is added to the message in an X-SpamDetect: header. The client can then check for the header, and the total score, in order to determine the 'spaminess' of the message. Note that this header is only added if at least one call to spamdetect is made.
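The accumulation described here can be modelled in a few lines of Python. This is only a sketch: `SpamTally` is a hypothetical name, and the exact layout of the X-SpamDetect: header is not specified in this document, so the sketch tracks the totals without formatting a header.

```python
class SpamTally:
    """Running spam score and reasons for one message (illustrative model)."""

    def __init__(self):
        self.total_score = 0
        self.reasons = []

    def spamdetect(self, spam_score, reason):
        """Add spam_score to the total and record the reason."""
        self.total_score += spam_score
        self.reasons.append(reason)

    def header_needed(self):
        """The header is only added if spamdetect was called at least once."""
        return len(self.reasons) > 0

clean_message = SpamTally()

suspect_message = SpamTally()
suspect_message.spamdetect(20, "possible money-making scam")
suspect_message.spamdetect(15, "HTML only")  # hypothetical second rule

print(suspect_message.total_score, suspect_message.reasons)
print(clean_message.header_needed(), suspect_message.header_needed())
```

Because scores from separate rules add up, several weak indicators can together push a message past whatever threshold the client chooses.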


Regular Expression Syntax Summary

\s = white space
\S = not white space
\d = digit
\D = not digit
\b = word boundary
\B = not word boundary
\x00 = Hex character

. (period) represents any one character.
[] (brackets) contain a set of characters from which a match can be made. It corresponds to one character in the search string.
\ (backslash) is an escape character which means that the next character will not have a special meaning.
* (asterisk) is a multiplier. It will match zero or more of the previous character. (Note that it's not a wildcard character as in file names.)
? (question mark) is a multiplier. It will match zero or one of the previous character. (Note that it's not a wildcard character as in file names.)
+ (plus) is a multiplier. It will match one or more of the previous character.
{} (squiggly brackets) contain a number which specifies an exact number of the previous character, or a range such as {2,3}.
[^] (brackets containing caret and other characters) means any character except the character(s) after the caret symbol in the brackets.
^ (caret) is the start of the line.
$ (dollar) is the end of the line.
\< represents the start of a word.
\> represents the end of a word.

[:alpha:] represents any alphabetic letter.
[:digit:] represents any single-digit number.
[:blank:] represents a space or tab.

Lookahead operator
Free(?!dom|bsd) matches freesex but not freedom or freebsd

OR operator
| (pipe) is OR. It requires that the joined expressions have parentheses around them.

Examples:

e.a matches eta, eda, e1a, but not Eta
[eE].a matches eta and Eta
E.*a matches Eudora, Etcetera, Ea
ho+p matches hop, hoop, hoooop, but not hp
etc\. matches etc. but not etc
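
These examples can be checked with Python's re module, as sketched below. Python uses a different regular expression engine from MFilter, so this only confirms the constructs the two have in common; in particular, Python's re does not support \< and \> or the [:alpha:] style classes. The lookahead check uses a case-insensitive search, as rexp() does, and "freeware" is an extra test word that does not appear above.

```python
import re

def matches(pattern, text, ignore_case=False):
    """Return True if the pattern is found anywhere in the text."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None

results = {
    "e.a in eta": matches(r"e.a", "eta"),
    "e.a in Eta": matches(r"e.a", "Eta"),
    "[eE].a in Eta": matches(r"[eE].a", "Eta"),
    "E.*a in Eudora": matches(r"E.*a", "Eudora"),
    "ho+p in hoooop": matches(r"ho+p", "hoooop"),
    "ho+p in hp": matches(r"ho+p", "hp"),
    "etc\\. in 'etc.'": matches(r"etc\.", "etc."),
    "etc\\. in 'etc'": matches(r"etc\.", "etc"),
    "lookahead on freeware": matches(r"Free(?!dom|bsd)", "freeware", ignore_case=True),
    "lookahead on freedom": matches(r"Free(?!dom|bsd)", "freedom", ignore_case=True),
}

for label, outcome in results.items():
    print(f"{label}: {outcome}")
```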


Example Rule File

$sex = "fuck|xxx|sex"
$free = "free(?!dom|bsd|nix|serve)"
$pics = "pi[cx]"
$free_pictures = $free + $pics
$bad_guys = "freepictures|jus.?.?\.doi.?.?\.to|great\.site|webbinaries" \
          + "|yad.?.?.?\.ion.?.?\.org|freehidden|joy.?.?\.to.?.?\.al|from.?behind" \
          + "|love(youhon|ergirl|chatting|stofuck)|forever\.yours|\@ju.?.?\.sex|town\.girl|beachbums"
# Do some processing which is specific to individual recipients
recipients
	if (isin("recipient","manager@this.domain")) accept "Always accept for me so spammers can talk to me"
	if (isin("recipient","sales@your.domain")) then
		if (isin("subject","order")) then
			# Make a Duplicate of sale order
			call forward_cc("sales_copy@your.domain")
		endif
	endif
end recipients
# add 20 to the spam score if the subject contains "make money"
if (isin("subject","make money")) then
	call spamdetect(20,"possible money-making scam");
endif

# Check for some known spammers and naughty subjects
if (rexp("subject",$free_pictures)) bounce "No emails about free pictures"
if (rexp("from",$bad_guys)) bounce "No emails from black listed people thanks"

# Strip local node names from from addresses:
call replace("From","*@*.parts.co.nz","$1@parts.co.nz")
accept "Great, we liked the message"