Spam filters: bayes vs. chi-squared; letters vs. words

In ISICT '03: Proceedings of the 1st international symposium on Information and communication technologies (2003), pp. 291-296.

craigtalbert's tags for this article

bayes machinelearning

Abstract

We compare two statistical methods for identifying spam, or junk electronic mail. Spam filters are classifiers that determine whether an email is junk. The proliferation of spam email has made electronic filtering vitally important, and we discuss the magnitude of the problem. We examine the Naive Bayesian method in relation to the 'Chi by degrees of Freedom' approach, the latter used in the field of authorship identification. Both methods produce very promising results. However, 'Chi by degrees of Freedom' has the advantage of providing significance measures, which help to reduce false positives. Statistics based on character-level tokenization prove more effective than those based on word-level tokenization.
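
The abstract does not spell out how the 'Chi by degrees of Freedom' score is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the score is the ordinary chi-squared homogeneity statistic between the token counts of two text samples, divided by its degrees of freedom (vocabulary size minus one), and that a message is assigned to whichever reference corpus it differs from least. The function names (`char_tokens`, `word_tokens`, `chi_by_dof`, `classify`) and the toy corpora are hypothetical.

```python
from collections import Counter


def char_tokens(text, n=1):
    """Character-level tokens: overlapping n-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_tokens(text):
    """Word-level tokens: lowercase, split on whitespace."""
    return text.lower().split()


def chi_by_dof(tokens_a, tokens_b):
    """Chi-squared statistic between two token samples, divided by
    its degrees of freedom (assumed here to be vocabulary size - 1).
    Lower values mean the two samples look more alike."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    if n_a == 0 or n_b == 0 or len(vocab) < 2:
        raise ValueError("need two non-empty samples and at least two token types")
    chi2 = 0.0
    for tok in vocab:
        total = counts_a[tok] + counts_b[tok]
        for observed, n in ((counts_a[tok], n_a), (counts_b[tok], n_b)):
            # Expected count if both samples shared one distribution.
            expected = total * n / (n_a + n_b)
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / (len(vocab) - 1)


def classify(message, spam_corpus, ham_corpus, tokenize=char_tokens):
    """Label a message by the reference corpus it is closer to
    (an assumed decision rule, chosen for illustration)."""
    msg = tokenize(message)
    spam_score = chi_by_dof(msg, tokenize(spam_corpus))
    ham_score = chi_by_dof(msg, tokenize(ham_corpus))
    return "spam" if spam_score < ham_score else "ham"


if __name__ == "__main__":
    # Toy strings only; real use would need large labelled corpora.
    print(classify("buy now", "buy now", "see you at lunch"))
```

Passing `tokenize=word_tokens` to `classify` switches the same comparison from letters to words, which is the contrast the abstract draws. The sketch does not attempt the significance measures the abstract mentions.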

