[Bio-matrix] ECML-PKDD Discovery Challenge 2006: Call for Participation

Steffen Bickel bickel at informatik.hu-berlin.de
Fri Mar 3 12:49:19 EST 2006


                         CALL FOR PARTICIPATION

                   ECML-PKDD Discovery Challenge 2006

                   Personalized Spam Filtering and
             Generalization Across Related Learning Tasks



The Discovery Challenge 2006 will be held in conjunction with the
ECML-PKDD Conference.

This year's competition is about personalized spam filtering and
generalization across related learning tasks.  People spend an
increasing amount of time reading messages and deciding whether
they are spam or non-spam.  Some users spend additional time labeling
the spam they receive in order to train local spam filters running
on their desktop machines.  Email service providers want to relieve
users of this burden by installing server-based spam filters.
Training such filters cannot rely on labeled messages from the
individual users; it has to rely on publicly available sources, such
as newsgroup messages or emails received through "spam traps" (email
addresses that are published in a way that is invisible to human
readers but that are collected by the web crawlers of spammers).

The distribution of this combined training data differs from the
distributions of the emails received by individual users.  When
learning spam filters for individual users from this type of data,
one has to cope with a discrepancy between the distributions governing
training and test data, and one needs to balance generalization
and adaptation.  Both can draw on the large amounts of unlabeled
emails in the users' inboxes, which are accessible to server-based
spam filters.  Using this unlabeled data, a spam filter can be adapted
to the properties of a specific user's inbox; when little unlabeled
data is available for a user, generalizing over multiple users is
advisable.

We provide labeled training data collected from publicly available
sources.  The unlabeled inboxes of several users serve as test data.
The inboxes differ in the distribution of emails.  The goal is to
construct a spam filter for each user that correctly classifies
that user's emails as spam or non-spam.  A clever way of utilizing the
available sets of unlabeled emails from different users is required.
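
As a rough illustration of what using a user's unlabeled inbox can
mean, the following is a minimal sketch of self-training with a naive
Bayes filter.  It is not the organizers' method or an official
baseline, and self-training is only one of many possible approaches.
The function names (train_nb, spam_probability, adapt_to_inbox), the
inbox_weight parameter, and the toy messages are all made up for this
example.

```python
import math
from collections import Counter

def train_nb(docs, labels, weights=None):
    """Fit a multinomial naive Bayes model (1 = spam, 0 = non-spam)."""
    weights = weights or [1.0] * len(docs)
    counts = {0: Counter(), 1: Counter()}  # weighted token counts per class
    prior = {0: 0.0, 1: 0.0}               # weighted message counts per class
    for doc, y, w in zip(docs, labels, weights):
        prior[y] += w
        for tok in doc.split():
            counts[y][tok] += w
    vocab = set(counts[0]) | set(counts[1])
    return counts, prior, vocab

def spam_probability(model, doc):
    """Posterior probability of spam, using add-one smoothing."""
    counts, prior, vocab = model
    log_score = {}
    for y in (0, 1):
        total = sum(counts[y].values()) + len(vocab)
        s = math.log((prior[y] + 1.0) / (prior[0] + prior[1] + 2.0))
        for tok in doc.split():
            if tok in vocab:  # tokens never seen in training are ignored
                s += math.log((counts[y][tok] + 1.0) / total)
        log_score[y] = s
    m = max(log_score.values())  # subtract the max for numerical stability
    e0 = math.exp(log_score[0] - m)
    e1 = math.exp(log_score[1] - m)
    return e1 / (e0 + e1)

def adapt_to_inbox(train_docs, train_labels, inbox, rounds=3, inbox_weight=1.0):
    """Self-training: pseudo-label the unlabeled inbox, then retrain on both."""
    model = train_nb(train_docs, train_labels)
    for _ in range(rounds):
        pseudo = [1 if spam_probability(model, d) >= 0.5 else 0 for d in inbox]
        weights = [1.0] * len(train_docs) + [inbox_weight] * len(inbox)
        model = train_nb(train_docs + inbox, train_labels + pseudo, weights)
    return model

# Toy data, invented for illustration only.
train_docs = ["cheap pills offer", "win money offer",
              "meeting agenda notes", "project report draft"]
train_labels = [1, 1, 0, 0]
inbox = ["cheap pills viagra", "viagra offer money",
         "agenda for seminar", "seminar report draft"]

baseline = train_nb(train_docs, train_labels)
adapted = adapt_to_inbox(train_docs, train_labels, inbox)
```

In this toy setting the word "viagra" never occurs in the labeled
data, so the baseline filter has no evidence about it, whereas the
adapted filter picks it up from the user's unlabeled inbox.  A lower
inbox_weight keeps the filter closer to the general training data,
which may be preferable when a user's inbox is small.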

There will be a Discovery Challenge workshop at ECML-PKDD 2006 in
Berlin, where we will discuss the results, different approaches, and
other issues related to the problem setting.


March 1, 2006: Tasks and datasets available online.
June  7, 2006: Submission of spam filtering results due.
June 12, 2006: Notification of winners.
June 26, 2006: Workshop paper submission deadline.
September 18-22, 2006: ECML-PKDD Conference / Discovery Challenge Workshop.

For more info about the Challenge, visit

We are looking forward to an interesting competition and encourage
your participation.

Steffen Bickel
Discovery Challenge Chair
