Developing a Realistic Email-User Component for AntispamLab - LCA2 - EPFL

We are developing a novel antispam system based on the workings of the human immune system [3,4]. Because our system does collaborative filtering, testing it properly and efficiently requires a testing environment that has all of the following features:

A) Use of multiple interconnected email servers and clients;
B) Inclusion of standard/optional email-user behavior;
C) Use of important spamming techniques.

Moreover, there is currently a strong need within the antispam community for such a testing environment, for two reasons. 1) Many existing and newly proposed antispam techniques decide whether a received email is spam based not solely on the email itself, but also on information from the emailing network about spam bulkiness, users' actions and social relations among the users [5]. 2) The existing tools for testing spam filters [6] evaluate a filter instance by simply feeding it a stream of emails, possibly also giving the filter feedback about the correctness of its detection. As the tested filter is disconnected from any emailing network, many collaborative antispam techniques and filters either cannot be tested in this way, or the results of their evaluation may not be representative.

AntispamLab [1,2] is the first tool for testing spam filters that is designed to support all of the features listed above. The current implementation fully supports feature A, but uses very simple models of users and spammers (features B and C). The tool is modular, so the email-user model and spammer model components can be developed further without changing other parts of the tool.

Goals of the project: Develop a new email-user model that realistically represents the user's behavior: the users' email contacts, and the content and timing of sent, replied and forwarded emails. Implement the model (in Python or C) as a new email-user component within the AntispamLab tool.
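
To make the goal more concrete, here is a minimal Python sketch of what such an email-user model might look like. It is only an illustration under stated assumptions: the class name EmailUser, its parameters (mean_gap_s, reply_prob) and the choice of exponentially distributed delays are hypothetical and are not part of AntispamLab or its existing component interface. Message content and forwarding are left out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class EmailUser:
    """Hypothetical user model; names and parameters are illustrative only."""
    address: str
    contacts: list                 # addresses this user writes to
    mean_gap_s: float = 3600.0     # assumed mean time between sent emails
    reply_prob: float = 0.3        # assumed chance of replying to an email
    rng: random.Random = field(default_factory=random.Random)

    def next_send(self, now):
        """Return (time, recipient) of the next spontaneous email."""
        # Assumption: gaps between sends are exponentially distributed.
        gap = self.rng.expovariate(1.0 / self.mean_gap_s)
        return now + gap, self.rng.choice(self.contacts)

    def on_receive(self, sender, now):
        """Return (time, recipient) of a reply, or None if there is no reply."""
        if self.rng.random() < self.reply_prob:
            delay = self.rng.expovariate(1.0 / self.mean_gap_s)
            return now + delay, sender
        return None


# Example: one simulated user with two contacts and a fixed random seed.
user = EmailUser("alice@example.org",
                 ["bob@example.org", "carol@example.org"],
                 rng=random.Random(1))
send_time, recipient = user.next_send(now=0.0)
```

A realistic model would replace these assumed distributions with ones fitted to observed email behavior, which is the actual subject of the project.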

Required skills: University-level networking knowledge from EPFL or another university (preferably having passed Computer Networking I and TCP/IP, or equivalent, with good grades). Having passed the course "Networks out of control: Models and methods for large scale random networks" given by Prof. M. Grossglauser and Prof. P. Thiran is a plus but not a requirement.

References:
[1] AntispamLab project web site: http://lcawww.epfl.ch/ssarafij/antispamlab
[2] Slavisa Sarafijanovic, Luis Hernandez, Raphael Naefen and Jean-Yves Le Boudec: AntispamLab – A Tool for Realistic Evaluation of Email Spam Filters. Accepted for publication at Fourth Conference on Email and Anti-Spam (CEAS 2007), Mountain View, 2-3 Aug 2007. (a major antispam conference; paper draft available upon request)
[3] MICS networked-software-systems project: collaborative spam filtering based on artificial immune systems approach. Web site: http://www.mics.org/micsCluster.php?groupName=CL3&action=projects#P4
[4] Slavisa Sarafijanovic and Jean-Yves Le Boudec: Method to Filter Electronic Messages in a Message Processing System. US patent No 11/515,063, filed on September 5, 2006 (printed copy available)
[5] Mathias Grossglauser and coauthors: TrustMyMail. Web page: http://www.trustmymail.com/
[6] TREC testing tool: http://plg.uwaterloo.ca/~gvcormac/spam/

Benefits: A chance to become a recognized contributor to a state-of-the-art open source tool that has good potential to become popular in the antispam community (and to be visible in the broader Internet community, as spam is a topic of general interest); work in the exciting field of email-social-network modeling and apply your findings to a real and usable software tool.

Domain:

Formal analysis, methods, frameworks; Network performance analysis; Other; Security

Student info:

Vincent Etter