Spam Filtering: August 2004 Archives
Came across a very interesting entry on Justin Mason's blog:
Open Source v Closed Source spam filtering
which explains how spammers test closed-source email filters. I suppose it makes sense, but it's still quite scary that these people invest so much time and effort in circumventing companies' best efforts to protect their clients' inboxes.
Ross pasted a link to me this morning which had me in stitches:
Spammers Sending Messages from the Future
Just reading a few of the lists this morning and noticed the usual problems with using an RBL to block mail at the MTA level (name removed to protect the original poster's identity):
"But the problem is, some of my users also are unable to send their emails using SMTP server as their "dynamic" IP is banned because some of the ips are listed in spamhaus. They keep getting the error above. How can I rectify this? Is there a command for me to add to allow user based on their IP address or email address?"
Solution available: none, if you insist on using Spamhaus to block mail at the MTA level.
Denying access to your MTA based on RBLs is demented and wrong. Why? Because you cannot rely 100% on an RBL's accuracy.
Does this mean that RBLs are inaccurate?
No, of course not. You just need to understand how they work and how to use them.
If you score against an RBL rather than block on it, you will get better results, because the listing is only one of a number of criteria, i.e. there isn't a "single point of failure".
The root of the problem lies not with the RBL maintainers, some of whom even state on their sites that blocking is a bad idea, but with misinformed sysadmins.
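To make the difference between blocking and scoring concrete, here is a minimal sketch in Python. The signal names, weights and threshold are invented for illustration and are not taken from any real filter's configuration; the only point is that an RBL hit contributes to a total instead of deciding the outcome on its own.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    name: str      # hypothetical label for one test, e.g. "rbl_listed"
    weight: float  # hypothetical score added when the test fires
    hit: bool      # whether the test fired for this message


THRESHOLD = 5.0  # assumed cut-off; a real filter would tune this


def spam_score(signals):
    """Sum the weights of every test that fired."""
    return sum(s.weight for s in signals if s.hit)


def classify(signals, threshold=THRESHOLD):
    """Flag as spam only when the combined score reaches the threshold."""
    return "spam" if spam_score(signals) >= threshold else "ham"


# A dial-up user whose dynamic IP happens to be listed, but whose
# mail looks normal in every other respect:
legit_user = [
    Signal("rbl_listed", 2.0, True),
    Signal("suspicious_subject", 2.5, False),
    Signal("forged_headers", 1.5, False),
]

# A message that trips the RBL test and two content tests:
junk = [
    Signal("rbl_listed", 2.0, True),
    Signal("suspicious_subject", 2.5, True),
    Signal("forged_headers", 1.5, True),
]
```

With these made-up weights the listed but otherwise clean sender stays below the threshold, while the message that trips several tests goes over it. Blocking at the MTA would have rejected both.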
If you are running a mail server for personal use you can do pretty much what you like, as you are the only person who is going to suffer if/when things go wrong. However, if you start implementing blocking in a business environment you are simply asking for trouble. Of course you are going to see a noticeable reduction in spam, but only because you'll have blocked a large portion of the internet.
Spamhaus is a fantastic resource and can help to significantly reduce the amount of spam arriving in your users' mailboxes, but it is not a good idea to block all mail emanating from IP ranges listed by it.
Some discussion recently on the SURBL list has centred around the length of time an IP is listed in Spamhaus. Although it makes interesting reading from a theoretical point of view, its practical implications are not going to bring any significant change to usage. The idea that an IP may be listed for a brief period and then delisted once the issue is addressed is not unique to Spamhaus. In reality the only thing that matters is whether the IP is listed at the moment the mail arrives on your scanning server, i.e. whether it will be flagged or not.
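The reason only the moment of arrival matters is that a DNS-based list is queried live for each connecting IP: the IPv4 octets are reversed, the list's zone is appended, and the resulting name either resolves (listed) or does not (not listed). A rough Python sketch follows; the zone name dnsbl.example.org and both function names are placeholders of mine, not a real list or a real API.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets + list zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed_now(ip: str, zone: str) -> bool:
    """True if the list returns an answer for this IP right now.

    Any resolution failure (including a timeout) is treated here as
    "not listed", which is a simplification for the sketch.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# Placeholder zone; substitute the zone of the list you actually use.
name = dnsbl_query_name("192.0.2.1", "dnsbl.example.org")
```

Here `name` is `1.2.0.192.dnsbl.example.org`. Whatever the list said about that IP yesterday or will say tomorrow plays no part in the answer your server gets at scan time.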
A couple of people were asking me where they could find rpms for SA 2.64, so here's a link to help you:
DAG rpm archive
Personally I prefer to do it from CPAN or source, as the rpms have a "charming" tendency to install all sorts of things that I really do not want.
Background
We (Blacknight Solutions) have been offering email filtering to our clients since early 2002. We first began "experimenting" with spam filtering when we saw that the problem of spam/UCE was growing exponentially, and neither we nor our clients wanted our inboxes taken over by rubbish.
For the first 10-12 months after implementing server-side filtering we did not block email, as we preferred to merely tag it and deliver it. By tagging the subject line of emails in a consistent manner our clients were able to filter potential spam into another "folder" for examination.
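As an illustration of the tag-and-deliver approach described above, here is a small Python sketch using the standard library's email package. The "[SPAM]" prefix and the function name are my own choices for the example and are not the tag or code used in our production setup.

```python
from email import message_from_string

TAG = "[SPAM]"  # hypothetical marker; any consistent prefix works


def tag_subject(msg, tag=TAG):
    """Prefix the Subject header with the tag, once, and return the message."""
    subject = msg.get("Subject", "")
    if not subject.startswith(tag):
        # Deleting first avoids ending up with two Subject headers;
        # deleting a header that is absent does not raise.
        del msg["Subject"]
        msg["Subject"] = f"{tag} {subject}".strip()
    return msg


raw = "From: someone@example.com\nSubject: Cheap pills\n\nBuy now\n"
tagged = tag_subject(message_from_string(raw))
```

Because the prefix is always the same, a client-side rule as simple as "subject begins with [SPAM]" is enough to move the message into a separate folder. Nothing is discarded, so a false positive costs the user a look in that folder and not a lost email.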
After our initial tagging period, which involved constant tweaking of the scoring criteria, we moved from tagging to storing.
Currently we offer email filtering at different levels to our clients. At the lower end of the scale the clients' email is scanned and stored by us without any user intervention, i.e. no customised black/white listing etc., while at the higher end customisable rules and criteria are implemented.
Scope and motivation of this article
Over the past 6 to 12 months the subject of email filtering has begun to attract more publicity, both in "techie" circles and amongst the general public. One of the reasons for writing this article is to address some of the common misconceptions about email filtering and best practices. After following many of the discussions on technical mailing lists and bulletin boards over the last few months, the author feels strongly that some people's approach to email filtering is both misinformed and dangerous.
Due to the scope of the subject matter this article will probably be split into a number of shorter parts. Comments from readers are welcome.
This article will address some of the issues involved in implementing email filtering for business and discuss some of the methods currently being used both in industry in general and by the author.
Due to the nature of our service the finer details of our setup will not be revealed, but general criteria and methodology will be discussed.
Any opinions expressed in this article are the author's and are based on the author's experiences.
Definitions
In order to avoid confusion a number of terms should be defined for the purposes of this article.
UCE: unsolicited commercial email
For many people there is no clear difference between UCE and spam. However, a number of things may give some indication. It is helpful if the sender of the email makes it clear where they obtained the email address and how you may be removed from the list, although there is a very valid argument against having to unsubscribe from lists to which one never subscribed. Why should the onus be on the recipient? Unsubscribing also informs the sender that the email address is valid. In my case I can usually tell whether an email address has been scraped based purely on the address. A number of my older email aliases have not been used for at least two years due to the volume of spam that they were receiving. As a result I can safely say that any mail received to info@ is spam, as the address has not appeared on our website for at least two years, nor have I used it in that period. This is not a matter of a spam trap but a simple case of applied logic. The only way you could get that address is through a spammer's database.
spam: If you look at the variety of definitions offered by Google for this term you should immediately see part of the problem. Depending on who you talk to, the scope of the definition can change quite dramatically. In simplest terms it may be best to refer to "spam" as unwanted commercial email, i.e. mail sent in bulk offering you commercial services that you do not want. Even that definition is not very clear, but it may help as a starting point. The type of spam that causes most problems for business is adult in nature and may vary from the extreme hardcore porn variety through to adverts for sexual aids, whether herbal, chemical or physical.
Tools
There is an ever increasing number of tools and services on the market to help you block spam/UCE. These can be divided into two groups:
Client-side: The software resides on the user's PC. It may be an independent piece of software or an add-on to an email client. For example, email clients such as Outlook 2003 and Eudora include spam filtering tools. Although client-side tools have their merits, they do not address the primary issue with spam, which is the cost in both time and resources of downloading unwanted email. For this reason I believe that we should focus on server-side solutions. Another issue with client-side applications is that they do not update often enough, so they cannot address the issues that each new wave of spam brings.
Server-side: As the name suggests, these are tools that work directly on the mail server. The advantages of using server-side tools are numerous. By blocking/filtering mail on the server you move the administrative responsibility away from the user to the server admin and their choice of tools. ISPs' and hosting companies' mail servers are connected to the 'net 24/7 via high bandwidth connections, so although unwanted email will consume some resources at the server level, this will have significantly less impact than the same load at the client level.
Unlike client-side tools, those used server-side can update not only in realtime but also through collaboration with other servers and through the usage patterns of the users being served.
Common Problems and misconceptions
There are a number of problems facing any provider of email filtering.
- Technology
- Client expectations
- Accuracy
- Contractual issues

