An Adaptive, Semi-Structured Language Model Approach to Spam Filtering on a New Corpus

by: B. Medlock
In CEAS 2006 - Third Conference on Email and Anti-Spam (27 July 2006)

Abstract

Motivated by current efforts to construct more realistic spam filtering experimental corpora, we present a newly assembled, publicly available corpus of genuine and unsolicited (spam) email, dubbed GenSpam. We also propose an adaptive model for semi-structured document classification based on language model component interpolation. We compare this with a number of alternative classification models, and report promising results on the spam filtering task using a specifically assembled test set to be released as part of the GenSpam corpus.
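
To illustrate the general idea of interpolating language model components over a semi-structured document, here is a minimal sketch. It is not the paper's model: it assumes add-one smoothed unigram models per field, a uniform class prior, and fixed interpolation weights, whereas an adaptive model would presumably tune them. The field names, weights, helper functions, and toy training data are all hypothetical.

```python
import math
from collections import Counter

def train_unigram(texts):
    """Return (token counts, total token count) for a list of strings."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    return counts, sum(counts.values())

def log_likelihood(text, model, vocab_size):
    """Add-one smoothed unigram log-likelihood of text under one model."""
    counts, total = model
    return sum(math.log((counts[tok] + 1) / (total + vocab_size))
               for tok in text.lower().split())

def field_posterior(text, models, vocab_size):
    """Posterior over classes for one field, assuming a uniform class prior."""
    logs = {c: log_likelihood(text, m, vocab_size) for c, m in models.items()}
    peak = max(logs.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - peak) for c, v in logs.items()}
    norm = sum(exps.values())
    return {c: v / norm for c, v in exps.items()}

def classify(doc, field_models, weights, vocab_size):
    """Linearly interpolate per-field posteriors; weights should sum to 1."""
    scores = Counter()
    for field, lam in weights.items():
        post = field_posterior(doc.get(field, ""), field_models[field], vocab_size)
        for c, p in post.items():
            scores[c] += lam * p
    return dict(scores)

# Toy training data, invented for illustration only.
train = {
    "spam": [{"subject": "cheap pills now",
              "body": "buy cheap pills online today"}],
    "genuine": [{"subject": "meeting notes",
                 "body": "attached are the notes from the meeting"}],
}
fields = ["subject", "body"]

# One unigram model per (field, class) pair.
field_models = {
    f: {c: train_unigram([d[f] for d in docs]) for c, docs in train.items()}
    for f in fields
}
vocab = {tok for docs in train.values() for d in docs
         for f in fields for tok in d[f].lower().split()}

# Hypothetical fixed interpolation weights, not values from the paper.
weights = {"subject": 0.4, "body": 0.6}

result = classify({"subject": "cheap pills", "body": "buy pills today"},
                  field_models, weights, len(vocab))
print(result)
```

Because each field yields a normalised posterior and the weights sum to 1, the interpolated scores also sum to 1 across classes, so the result can be read directly as a class distribution.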

