Pattern-Escape Spam

Tobias Oetiker
Published in OP Blog
Jan 26, 2022

Recently a phishing spam sailed right through our heavily fortified SpamAssassin setup. We have enhanced SpamAssassin with many custom rules, and I am pretty proud of how well the setup works. Therefore, I take such break-through spam pretty personally.

My company is running Zimbra for our own email and also for several customers. We have therefore created numerous custom SpamAssassin rules to catch Zimbra-phishers. Nevertheless, this one slipped through.

A bad attempt at faking an email from the local mail admin.

Analysis of the phishing email shows an interesting mix of incompetence and ingenuity, hinting at the existence of a Zimbra-phishing toolkit of some sort. The headers of the email had been crafted to create the impression that the email came from within the organization. They got it almost right, except for the Message-Id header which read:

Message-Id: <BJM3JOL0-7OLY-22KE-7NJQ-O4U4FCMF7BG8@[Company]>

Also, the From header looked rather sloppily done. What sort of sender is synchronization@xy.com supposed to be?

What caused the email to sail right through the spam filters was the extensive use of BOM markers inside the email body. Let me explain: Instead of writing

Zimbra

they wrote

&#65279;&#65279;Z&#65279;&#65279;i&#65279;&#65279;m&#65279;&#65279;b&#65279;&#65279;r&#65279;&#65279;a

The character combination &#65279; is a so-called HTML entity (a numeric character reference). It translates into the Unicode character at position 65279, U+FEFF (or &#xFEFF; written as a hex entity). This character is also known as the “Byte Order Mark” or BOM for short. The BOM often appears as the first character in UTF-8 encoded text files, to point out that these files are actually encoded in UTF-8. But more importantly, this Unicode character is also a “zero width no-break space”, visually indistinguishable from “nothing”.
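To make this concrete, here is a minimal Python sketch (Python is only used for illustration here, it is not part of the SpamAssassin setup) that decodes the entity with the standard library and inspects the resulting character. The variable names are made up for this example.

```python
import html
import unicodedata

# Decode the HTML entity from the phishing mail into the actual character.
entity = "&#65279;"
char = html.unescape(entity)

code_point = hex(ord(char))             # '0xfeff'
name = unicodedata.name(char)           # 'ZERO WIDTH NO-BREAK SPACE'
utf8_bytes = char.encode("utf-8").hex() # 'efbbbf'

# The character takes up a position in the string,
# even though it shows up as "nothing" on screen.
padded = "Z" + char + "i"

print(code_point, name, utf8_bytes, len(padded))
```

The string `padded` looks like "Zi" when rendered, but it is three characters long.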

This is especially problematic for regular expressions, which do all the heavy lifting in SpamAssassin. They see these BOM markers as regular characters while they are invisible on screen.

If a spam message contains Z<BOM>i<BOM>m<BOM>b<BOM>r<BOM>a we will see Zimbra on screen and might ask SpamAssassin to look for this word. But SpamAssassin will not find it, as its regular expression engine does not know that the <BOM> character renders to “nothing” on screen.
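The effect is easy to reproduce outside of SpamAssassin. The following Python sketch decodes the obfuscated word from the example above and shows that a plain search for Zimbra fails until the zero-width characters are removed. The helper `strip_zero_width` is a hypothetical name for this illustration, not something SpamAssassin provides.

```python
import html
import re

# U+FEFF (BOM) and U+2060 (word joiner): both render as "nothing".
ZERO_WIDTH = "\ufeff\u2060"


def strip_zero_width(text: str) -> str:
    """Return text with BOM and word-joiner characters removed."""
    return text.translate({ord(c): None for c in ZERO_WIDTH})


# The obfuscated word as it appeared in the HTML body of the mail.
raw = ("&#65279;&#65279;Z&#65279;&#65279;i&#65279;&#65279;m"
       "&#65279;&#65279;b&#65279;&#65279;r&#65279;&#65279;a")
decoded = html.unescape(raw)

# A naive keyword search misses the word ...
naive_hit = re.search(r"Zimbra", decoded) is not None
# ... while the same search on the cleaned text finds it.
cleaned_hit = re.search(r"Zimbra", strip_zero_width(decoded)) is not None

print(naive_hit, cleaned_hit)
```

Cleaning the text first is one possible countermeasure. The approach taken in this article is different: it treats the mere presence of many such characters as the spam signal.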

After figuring out why the message was slipping through, I was determined to come up with a new SpamAssassin rule that would catch this (ab-)use of <BOM> as an indicator for the message to be spam.

I did a little investigation and found that there is a second Unicode character that could be of interest for Pattern-Escape-Spam: U+2060 the “word joiner”. There are several ways to embed such characters into email messages.

First, there are two ways of writing them as HTML entities: &#<decimal>; and &#x<hex>;, as shown above.

Second, the email could also be encoded in base64 or quoted-printable, which would allow adding these Unicode characters directly to the email without resorting to HTML entities.

name  html-dec   html-hex   quoted-printable  perl-regexp
---------------------------------------------------------
BOM   &#65279;   &#xfeff;   =EF=BB=BF         \x{feff}
WJ    &#8288;    &#x2060;   =E2=81=A0         \x{2060}
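The different spellings can be derived from the code point alone. As a sanity check, this small Python sketch computes them; the function name `encodings` is an assumption made for this example. Note that the UTF-8 bytes of U+FEFF are EF BB BF, which is the quoted-printable form used in the regular expression that follows.

```python
def encodings(ch: str) -> dict:
    """Return the spellings of a single character used in the table."""
    cp = ord(ch)
    utf8 = ch.encode("utf-8")
    return {
        "html-dec": f"&#{cp};",
        "html-hex": f"&#x{cp:x};",
        # Quoted-printable writes each UTF-8 byte as =XX in upper-case hex.
        "quoted-printable": "".join(f"={b:02X}" for b in utf8),
        "perl-regexp": f"\\x{{{cp:x}}}",
    }


bom = encodings("\ufeff")
wj = encodings("\u2060")

for label, row in (("BOM", bom), ("WJ", wj)):
    print(label, *row.values())
```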

Translated into a regular expression we get

/(?:&\#(?:65279|xfeff|8288|x2060);|=EF=BB=BF|=E2=81=A0|\x{feff}|\x{2060})/

Since the presence of a single zero-width character is not a cause for concern, the actual rule I deployed is a bit more complex. Our custom SpamAssassin filter looks for cases where six such “zero-width” characters appear in a row, each followed by at most five other characters. Why six and five, you might ask. These numbers were just my guess. I wanted to make sure that the rule would not react to the odd appearance of such a character, which might be used innocently somewhere in copied and pasted text, for example.

describe OP_BOMORGY Fight the good fight against BOM/WJ abuse
rawbody __OP_BOMORGY_RAW /(?:(?:&\#(?:65279|xfeff|8288|x2060);|=EF=BB=BF|=E2=81=A0|\x{feff}|\x{2060}).{0,5}?){6}/i
full __OP_BOMORGY_FUL /(?:(?:&\#(?:65279|xfeff|8288|x2060);|=EF=BB=BF|=E2=81=A0|\x{feff}|\x{2060}).{0,5}?){6}/i
meta OP_BOMORGY (__OP_BOMORGY_RAW || __OP_BOMORGY_FUL)
score OP_BOMORGY 4.0
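To experiment with the thresholds before touching a live mail server, the pattern can be ported to Python. This is only a rough sketch: the names `ZERO_WIDTH` and `BOMORGY` are made up for the example, and Python's `re` module is not SpamAssassin, which applies its rules to the rawbody and the full message rather than to an arbitrary string.

```python
import re

# One zero-width character in any of its spellings:
# HTML entity, quoted-printable bytes, or the literal character.
ZERO_WIDTH = (r"(?:&\#(?:65279|xfeff|8288|x2060);"
              r"|=EF=BB=BF|=E2=81=A0|\ufeff|\u2060)")

# Six of them in a row, each followed by at most five other characters.
BOMORGY = re.compile(rf"(?:{ZERO_WIDTH}.{{0,5}}?){{6}}", re.IGNORECASE)

# The obfuscated word from the phishing mail, as HTML entities.
phish = ("&#65279;&#65279;Z&#65279;&#65279;i&#65279;&#65279;m"
         "&#65279;&#65279;b&#65279;&#65279;r&#65279;&#65279;a")
# The same trick in a quoted-printable body.
phish_qp = "=EF=BB=BFZ=EF=BB=BFi=EF=BB=BFm=EF=BB=BFb=EF=BB=BFr=EF=BB=BFa"
# A single stray BOM, as might come from copied and pasted text.
innocent = "\ufeffHello, this is a perfectly normal message."
# Six word joiners, but spread too far apart to count.
spread_out = "\u2060abcdefgh" * 6

for sample in (phish, phish_qp, innocent, spread_out):
    print(bool(BOMORGY.search(sample)))
```

The first two samples match and the last two do not, which is the behaviour the six-and-five thresholds are meant to produce.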

If you feel adventurous, you can add this snippet to your SpamAssassin local.cf file, and enjoy how this new trick in the arsenal of the spammers gets them into trouble.

Let me know how it works for you.
