Developing a Realistic Spammer Component for AntispamLab - LCA2 - EPFL

We are developing a novel antispam system based on the workings of the human immune system [3,4]. Because our system performs collaborative filtering, testing it properly and efficiently requires a testing environment that has all of the following features:

A. Usage of multiple interconnected email servers and clients,
B. Inclusion of standard/optional email-user behavior,
C. Usage of important spamming techniques.

Moreover, there is currently a strong need within the antispam community for such a testing environment. 1) On one side, many existing and newly proposed antispam techniques decide whether a received email is spam based not solely on the email itself, but also on information from the emailing network about spam bulkiness, users' actions, and social relations among users [5]. 2) On the other side, the existing tools for testing spam filters [6] evaluate a filter instance by simply feeding it a stream of emails, possibly also giving the filter feedback about the correctness of its detection. Because the tested filter is disconnected from any emailing network, many collaborative antispam techniques and filters cannot be tested this way, or the results of their evaluation may not be representative.

AntispamLab [1,2] is a new tool for testing spam filters, and the first one designed to support all of the features listed above. The current implementation fully supports feature A, but uses very simple models of users and spammers (features B and C). The tool is modular, so the email-user model and spammer model components can be developed further without changing other parts of the tool.

Goals of the project: Learn the state of the art in spamming and antispam techniques. Develop a new spammer model that includes the main known spamming techniques, and implement it (in Python or C) as a new spammer component within the AntispamLab tool.
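
To make the idea of a pluggable spammer component concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names Email, SpammerModel and ObfuscatingBulkSpammer are hypothetical and are not part of AntispamLab's actual interface, and the single technique shown (one bulk template, lightly varied per copy) stands in for the broader set of techniques the project would cover.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """Hypothetical message record; not AntispamLab's real data type."""
    sender: str
    recipient: str
    body: str


class SpammerModel(ABC):
    """Hypothetical plug-in interface a spammer component could implement."""

    @abstractmethod
    def generate(self, recipients):
        """Return the list of Email objects for one spam campaign."""


class ObfuscatingBulkSpammer(SpammerModel):
    """Sends one template to many recipients, lightly mutated per copy."""

    def __init__(self, sender, template, filler_words, seed=0):
        self.sender = sender
        self.template = template
        self.filler_words = list(filler_words)
        # Seeded generator so that test runs are repeatable.
        self.rng = random.Random(seed)

    def _obfuscate(self, text):
        # Append two random filler words so copies are similar, not identical.
        extra = self.rng.sample(self.filler_words, k=2)
        return text + " " + " ".join(extra)

    def generate(self, recipients):
        return [
            Email(self.sender, recipient, self._obfuscate(self.template))
            for recipient in recipients
        ]


if __name__ == "__main__":
    spammer = ObfuscatingBulkSpammer(
        sender="spammer@example.org",
        template="Buy now",
        filler_words=["alpha", "beta", "gamma", "delta"],
    )
    for message in spammer.generate(["a@example.org", "b@example.org"]):
        print(message.recipient, "<-", message.body)
```

Keeping the model behind one small interface is one way to let new spamming techniques be added as further subclasses without touching the rest of the tool, which matches the modularity described above.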

Required skills: University-course-level networking knowledge, from EPFL or another university (preferably having passed Computer Networking I and TCP/IP, or equivalent, with good grades).

References:
[1] AntispamLab project web site: http://lcawww.epfl.ch/ssarafij/antispamlab
[2] Slavisa Sarafijanovic, Luis Hernandez, Raphael Naefen and Jean-Yves Le Boudec: AntispamLab – A Tool for Realistic Evaluation of Email Spam Filters. Accepted for publication at the Fourth Conference on Email and Anti-Spam (CEAS 2007), Mountain View, 2-3 Aug 2007 (a major antispam conference; paper draft available upon request).
[3] MICS networked-software-systems project: collaborative spam filtering based on artificial immune systems approach. Web site: http://www.mics.org/micsCluster.php?groupName=CL3&action=projects#P4
[4] Slavisa Sarafijanovic and Jean-Yves Le Boudec: Method to Filter Electronic Messages in a Message Processing System. US patent No 11/515,063, filed on September 5, 2006 (printed copy available)
[5] Mathias Grossglauser and coauthors: TrustMyMail. Web page: http://www.trustmymail.com/
[6] TREC testing tool: http://plg.uwaterloo.ca/~gvcormac/spam/

Benefits: A chance to become a recognized contributor to a state-of-the-art open-source tool that has good potential to become popular in the antispam community (and to be visible in the broader Internet community, since spam is a topic of general interest).

Domain: Formal analysis, methods, frameworks; Network performance analysis; Other; Security

Student info: Sabrina Perez