

How to effectively stop spam and junk mail
Identifying the most common spam characteristics
by Mike Spykerman, CEO Red Earth Software
This article discusses how spam messages can be distinguished from legitimate messages by looking at email headers and message content. It also mentions how spam can be blocked effectively by taking these typical spam characteristics into account.
Spam is not only offensive and annoying; it also reduces productivity, consumes bandwidth and costs companies a lot of money. Every company that uses email should therefore take measures to keep spam out of its email systems. Although it might not be possible to block all spam, blocking a large proportion of it will greatly reduce its harmful effects.
To filter out spam and junk mail effectively, we need to be able to distinguish spam from legitimate messages, which means identifying typical spam characteristics and practices. Once these are known, suitable measures can be put in place to block such messages. Spammers continually refine their tactics, so it is important to review new spam practices regularly to ensure that spam is still being blocked effectively.
Spam characteristics appear in two parts of a message: email headers and message content.
1. Email headers
Email headers show the route an email has taken in order to arrive at its
destination. They also contain other information about the email, such as the
sender and recipient, the message ID, date and time of transmission, subject
and several other email characteristics. Most spammers try to hide their
identity by forging email headers or by relaying mail to hide the real source
of the message. Since they need to send mails to a large number of recipients,
spammers use certain methods for mass mailing that can be classified as pure
spam practices and can therefore be identified in the email headers. Although
newsletters and legitimate mailings are also sent to a large number of
recipients, these will generally not display the same characteristics since the
message source does not need to be concealed.
Headers can also be used to trace back the origin of the spam message. However, in this article we are mainly focusing on how to distinguish a spam message from a legitimate message by looking at the email headers, rather than actually tracing the sender of the spam message.
Typical email header characteristics in spam messages:
Recipient's email address is not
in the To: or Cc: fields: The reason for this is that the recipient's
email address is hidden in the Bcc: field or X-Receiver field, along with a
substantial number of other email addresses. Spammers do this in order to
conceal the fact that the mail was sent to a large number of recipients, and
presumably so as not to publish their email list. Some people might add
recipients to the Bcc: field for sending out 'legitimate' mailings, but these
will tend to be of a more personal nature (which you might wish to block
anyway) since most professional companies do not use this method for sending
newsletters or mailings. Note, however, that if you block emails without a
local recipient in the To: or Cc: field, you will be blocking all Bcc:
messages.
Empty To: field: This is
also typical for spam messages. Because spammers send out bulk emails by
entering all recipients in the Bcc: field or X-Receiver header, the To: field
is often empty. According to
RFC 822 (Appendix A.3.1), the worldwide standard for the format of
email messages, every message must have at least one destination field (To:,
Cc: or Bcc:), and a To: field that is present must contain at least one
address. An empty To: field is therefore malformed and indicates 'shady
practices'.
To: field contains invalid email
address: Instead of being empty or containing someone else's email
address, the To: field can also contain a bogus email address, e.g. one without
an @ sign or a nonexistent one.
Missing To: field: Emails
that have no To: field at all can very likely be considered spam, since
this normally happens only when it is done on purpose for spamming reasons.
From: field is the same as the
To: field: This is another common practice. Instead of entering a bogus
or empty To: field, the email address in the From: field is also used in the
To: field. Both email addresses are most probably fake email addresses.
Missing From: field:
Again the reasoning behind this is to disguise the actual sender of the
message.
Missing or malformed Message ID:
Since the Message ID includes information about where the message is coming
from, it is often missing or malformed (i.e. no @ sign or an empty string) in
spam messages. The Message ID is in the form of xxx@domain.com. The first part
can be anything and the second part is the name of the machine that assigned
the ID. Although Message IDs are not strictly required, one can safely assume
that they would only be missing or malformed if done deliberately to disguise
the source of the message.
More than 10 recipients in To:
and/or Cc: fields: Many spam messages contain more than 10 recipients
in the To: and/or Cc: fields. Although this can also occur for 'legitimate
mailings', these will tend to be of a personal nature (which you might wish to
block anyway) since most professional companies do not use this method for
sending newsletters or mailings.
Bcc: header exists: In
normal email messages, a Bcc: header does not exist since this is stripped from
the mail.
X-Mailer field contains name of
popular spamware: The X-Mailer field includes the name of the mailing
software that was used to send the mail. If this header contains the name of
popular spam software, this could indicate that it is a spam message. However,
many spam mails do not contain an X-Mailer header, or name mail software
that is widely used, such as Microsoft Outlook or Eudora. Since you might also
be blocking legitimate mails if you do not filter on the right names, this
header is probably not worth filtering on.
X-Distribution = bulk:
Spammers using Pegasus mail will have the X-header 'X-Distribution: bulk' added
to their mail if it is addressed to a large number of recipients. This header
occurs quite rarely, so you will not be able to catch large amounts of spam by
filtering on this header. Moreover, many newsletters also contain this header.
X-UIDL header exists:
Incoming messages should not have an X-UIDL header, since it is only intended
for the mail server, to stop it downloading messages more than once, for
instance when 'leave messages on server' is checked. This header would normally
be stripped when the message is received. Spammers add an X-UIDL header to try
to get the recipient's mail server to download multiple copies of their message
and therefore increase the chance that the message will be read.
Code and space sequence exists:
Many spam mails include a certain code for identification in the subject of the
message. To hide the code from the recipient, a large number of spaces are
usually placed before it. As a result, the recipient either does not notice
the code or does not see it in the mail client before opening the
message.
Illegal HTML exists: Some
spam messages include a code for identification in the text of the message. The
text is entered outside the HTML tags so as to hide the code from the
recipient. There is no reason to add text outside HTML tags, so the mere
presence of illegal HTML can be treated as suspicious.
Comment tags to avoid detection
by email filters: Some spammers try to circumvent content filters by
placing lots of HTML comment tags within the email body text. In this way,
content filters will not recognize the spam words since they are separated by
comment tags. The recipient, however, will not see the comment tags since these
are not displayed when viewing the message in HTML. Therefore it is important
to use an email filter that can filter emails by removing HTML tags first.
HTML message without plain text
body part: HTML messages usually include a plain text version of the
email so that recipients with email clients that cannot read HTML can still
view the message in plain text. However, many spammers tend to send HTML
messages without this plain text body part, not only to save on size but also
to force recipients to read the HTML version. This enables spammers to embed
links and unique IDs in the HTML code. For instance, many spammers include an
image link that connects to a site when the message is opened. Since each
message contains a unique ID, the spammer will know exactly which recipient has
viewed the mail. In this way, spammers know how many people have viewed their
message and which email addresses are still 'live'. When spammers know that
your email address is 'live' this will entice them to send you even more spam,
so it is important to put a stop to these kinds of spam messages by using a
spam filter that is capable of checking this. Newsletters also tend to
send messages without a plain text body part, so it is important to use a white
list of allowed newsletters so as not to catch any false positives.
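Several of the header characteristics above can be tested mechanically. The following is a minimal sketch using Python's standard email package, not a complete filter. The function name header_flags, the flag labels and the local_recipient parameter are hypothetical names chosen for illustration, and the limit of 10 recipients simply mirrors the figure used above.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def _addresses(values):
    """Extract bare addresses from header values, dropping empty results."""
    return [addr.lower() for _name, addr in getaddresses(values or []) if addr]


def header_flags(raw_message, local_recipient=None, max_recipients=10):
    """Return labels for the spam-like header characteristics that are present."""
    msg = message_from_string(raw_message)
    flags = []

    to_values = msg.get_all("To")  # None when the header is absent
    to_addrs = _addresses(to_values)
    cc_addrs = _addresses(msg.get_all("Cc"))

    if to_values is None:
        flags.append("missing-to")
    elif not to_addrs:
        flags.append("empty-to")
    elif any("@" not in addr for addr in to_addrs):
        flags.append("invalid-to")

    from_value = msg.get("From")
    if from_value is None:
        flags.append("missing-from")
    else:
        from_addr = parseaddr(from_value)[1].lower()
        if from_addr and to_addrs == [from_addr]:
            flags.append("from-equals-to")

    message_id = msg.get("Message-ID")
    if message_id is None:
        flags.append("missing-message-id")
    else:
        # Expect something of the form <local@domain>
        local, _, domain = message_id.strip().strip("<>").partition("@")
        if not local or not domain:
            flags.append("malformed-message-id")

    if local_recipient and local_recipient.lower() not in to_addrs + cc_addrs:
        flags.append("recipient-not-in-to-or-cc")
    if len(to_addrs) + len(cc_addrs) > max_recipients:
        flags.append("many-recipients")
    if "Bcc" in msg:
        flags.append("bcc-header")
    if "X-UIDL" in msg:
        flags.append("x-uidl-header")
    if msg.get("X-Distribution", "").strip().lower() == "bulk":
        flags.append("x-distribution-bulk")
    return flags


sample = (
    "From: offers@example.com\n"
    "To: offers@example.com\n"
    "Subject: hello\n"
    "X-UIDL: 12345\n"
    "\n"
    "body\n"
)
print(header_flags(sample))
```

The sketch cannot tell whether an address actually exists, so the 'nonexistent address' case is left out, and the weaker signals (such as the X-Distribution header) would in practice be weighted rather than treated as proof.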
2. Message contents
Apart from headers, spammers tend to use certain language in their emails that
companies can use to distinguish spam messages from others. Typical words are
free, limited offer, click here, act now, risk free, lose weight, earn money,
get rich, and (over) use of exclamation marks and capitals in the text. Spam
can be blocked by checking for words in the email body and subject, but it is
important that you filter words accurately since otherwise you might be
blocking legitimate mails as well.
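As a rough illustration of accurate word filtering, the sketch below first removes HTML comments and tags (so that words split by comment tags are rejoined) and then looks for whole phrases only, so that 'free' does not match 'freedom'. The function names, the regular expressions and the choice of signals are assumptions made for this example, and the regular expressions are a simplification rather than a full HTML parser.

```python
import re

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
# Phrases taken from the list above
SPAM_PHRASES = ("free", "limited offer", "click here", "act now", "risk free",
                "lose weight", "earn money", "get rich")


def visible_text(html):
    """Approximate the text a recipient sees in an HTML message."""
    without_comments = COMMENT_RE.sub("", html)        # rejoin split words
    without_tags = TAG_RE.sub(" ", without_comments)   # keep words separated
    return " ".join(without_tags.split())


def content_signals(html):
    """Collect phrase hits, exclamation marks and the share of capital letters."""
    text = visible_text(html)
    lowered = text.lower()
    letters = [ch for ch in text if ch.isalpha()]
    phrases = [p for p in SPAM_PHRASES
               if re.search(r"\b" + re.escape(p) + r"\b", lowered)]
    upper_ratio = sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0
    return {
        "phrases": phrases,
        "exclamations": text.count("!"),
        "upper_ratio": upper_ratio,
    }


disguised = "<p>CL<!-- a -->ICK HE<!-- b -->RE to get ri<!--c-->ch!!!</p>"
print(visible_text(disguised))
print(content_signals(disguised))
```

Comments are replaced with nothing and tags with a space. This is a deliberate trade-off: it rejoins words broken up by comments without gluing together words that sit in neighbouring HTML elements.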
How to stop spam
Now that we know the typical spam characteristics, how can we use these to stop
spam?
Firstly, a mail filtering mechanism must be put in place to block most of the spam and hoaxes coming into your organization. The email filtering system must be able to analyze email characteristics, classify a mail as spam, and either delete it, flag it (for instance by adding the word 'SPAM' to the subject), or quarantine it. Preferably, you will be able to set up multiple filters with decreasing levels of certainty that a mail is spam. The more certain the filter, the more drastic the action can be, for instance deletion of the message. If the filter can only indicate that a message is possibly spam, you could flag the mail or quarantine it. To avoid false positives, the email filtering system should support white lists that exempt, for instance, allowed newsletters.
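The idea of matching the action to the filter's certainty can be sketched as follows. The numeric certainty scale, the three cut-off values and the function names are hypothetical; a real system would tune them to its own filters.

```python
def choose_action(certainty, sender, whitelist=frozenset()):
    """Map a spam certainty between 0.0 and 1.0 to an action.

    The cut-off values are illustrative assumptions, not recommendations.
    """
    if sender.lower() in whitelist:
        return "deliver"      # white-listed senders are never filtered
    if certainty >= 0.95:
        return "delete"       # most drastic action for the most certain filter
    if certainty >= 0.70:
        return "quarantine"
    if certainty >= 0.40:
        return "flag"
    return "deliver"


def flag_subject(subject):
    """Mark a message as suspected spam by prefixing its subject."""
    return subject if subject.startswith("SPAM: ") else "SPAM: " + subject


allowed = frozenset({"news@example.org"})
print(choose_action(0.99, "offers@example.com", allowed))
print(choose_action(0.99, "news@example.org", allowed))
print(flag_subject("You have won"))
```

Checking the white list before anything else keeps allowed newsletters out of every later filter, which is what prevents them from becoming false positives.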
The email filtering system should filter out spam messages in three ways (in order of 'spam certainty'):
1. Block spam at the gateway by checking domains in real-time black hole lists: A number of 'black hole lists' contain the IP addresses and domains of known spammers. By using these lists you can filter out a large amount of spam. Not only does this stop a large proportion of spam messages from reaching your users, it also saves bandwidth, since the message is blocked at the gateway before it is even downloaded. There are two types of lists: (a) lists of known spammers' domains, for example the Spamhaus Block List (SBL), and (b) lists of mail servers that are open to relaying and therefore allow spammers to send mail through them. An example of the latter is the Open Relay Database (ORDB). Whilst lists of the first type should be fairly accurate, open relay lists can result in more false positives, because genuine senders who wish to contact your organization might not be aware that their mail server is being used for relaying. It is therefore important to treat each list differently. For instance, you could choose not to download any messages from domains listed on the Spamhaus Block List, and to quarantine or delete (with the possibility to undelete) mails from servers listed in the Open Relay Database.
2. Filter out spam based on email header characteristics: Most of the email header characteristics mentioned above can safely be used to classify a mail as spam. You could therefore decide to delete messages that show one or more of the above-mentioned header characteristics. Since checking email headers is a fast process, it is good to check these before checking the actual email message content.
3. Identify junk mail content: Some spam messages will still get through both filters mentioned above. The last way to catch these mails is to check for spam message content. Depending on the words you select to filter on, this can be very accurate. For instance, messages that contain phrases such as CLICK HERE, FREE!!, EARN MONEY, FAST CASH, BUY NOW, $$$, fast bucks and huge savings are almost certainly spam. Then there are words and phrases that could also appear in legitimate mails, such as money back, accept credit cards, credit profile, cash back and FREE. It is therefore important either to perform different actions on the different sets of phrases, or to use textual analysis software that minimizes the chance of catching legitimate messages. For instance, by giving each word or phrase a score and specifying a score threshold per email, you can specify quite precisely which messages should be blocked and so reduce the number of wrongly blocked messages. It is also important to apply case sensitivity to words, since spammers often use capitals in their messages.
Finally, you will need to educate your users. They must know that spam should be deleted straight away, and that they should never reply to a spam mail. Replying just confirms that the email address is 'live' and enables the spammers to sell the address to other companies for further abuse. If the mail is a hoax, for instance a message about fake viruses, a pyramid scheme promising lots of fast-earned cash or a plea from supposed victims asking for support by forwarding the mail, users should delete the message and not forward it. If users are educated in this way, you will be able to limit the negative impact of any spam or hoax message that has been able to pass your filters.
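The first and third stages above can be sketched in a few lines. The list lookup assumes the common DNS-based convention, in which the sending server's IPv4 address is reversed, the list's zone name is appended, and any DNS answer is taken to mean 'listed'. The zone dnsbl.example.org, the function names, the scores and the threshold are all placeholders for illustration; real lists publish their own zone names, return codes and usage terms, which this sketch ignores.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up, e.g. 192.0.2.10 -> 10.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Treat any successful DNS answer as 'listed' (a simplifying assumption)."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # no record: the address is not on the list
    return True


def word_score(text, scores):
    """Add up the scores of all phrases found; matching is case-sensitive."""
    return sum(points * text.count(phrase) for phrase, points in scores.items())


# Illustrative scores: capitalized phrases weigh more than ambiguous ones.
SCORES = {"CLICK HERE": 5, "FREE!!": 5, "FAST CASH": 5,
          "money back": 2, "cash back": 2}
THRESHOLD = 8

spam_text = "CLICK HERE for FREE!! samples, money back guaranteed"
legit_text = "Our money back policy applies to all orders"
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
print(word_score(spam_text, SCORES) >= THRESHOLD)
print(word_score(legit_text, SCORES) >= THRESHOLD)
```

Because the scores are added per message, a single ambiguous phrase such as 'money back' stays below the threshold, while a combination of strong phrases exceeds it. The resolve parameter lets the lookup be replaced by a stub, so the function can be tried out without querying a real list.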
*****
About the author
Mike Spykerman is CEO of Red Earth Software, a software development company that specializes in email policy enforcement software. The company's current products include Policy Patrol, an Exchange server and Lotus Notes add-on for blocking spam, viruses, offensive content, attachment quarantining, adding disclaimers and much more. Red Earth Software are Microsoft Certified Partners.
Disclaimer
This article is in no way meant to provide spammers with tips on how to send out spam or bypass spam filters. Rather, it is meant to provide information for companies so that they can effectively block spam. By discussing spam characteristics openly, the author recognizes that spammers might be able to use this information in order to avoid email filters. However, in many cases spammers are already aware of the identifiable headers in their messages, whereas many companies trying to block spam are not. Therefore the author considers it

