CiteULike is a free online bibliography manager. Register and you can start organising your references online.

Looking into the past to better classify web spam Export

In AIRWeb '09: Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web (2009), pp. 1-8.

Citation Format

[Posts]

View FullText article


ssn's tags for this article

related spam temporal webir

X Reviews [Write a review of this article]

X Find related articles from these CiteULike users

X Find related articles with these CiteULike tags

X Posting History

X Abstract

Web spamming techniques aim to achieve undeserved rankings in search results. Research has been widely conducted on identifying such spam and neutralizing its influence. However, existing spam detection work only considers current information. We argue that historical web page information may also be important in spam classification. In this paper, we use content features from historical versions of web pages to improve spam classification. We use supervised learning techniques to combine classifiers based on current page content with classifiers based on temporal features. Experiments on the WEBSPAM-UK2007 dataset show that our approach improves spam classification F-measure performance by 30% compared to a baseline classifier which only considers current page content.


X BibTeX record

X RIS record


Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.