Parallel crawlers

In Proc. of the 11th International World Wide Web Conference (2002)


egh's tags for this article

distributed web_crawling

Abstract

In this paper we study how we can design an effective parallel crawler. As the size of the Web grows, it becomes imperative to parallelize a crawling process, in order to finish downloading pages in a reasonable amount of time. We first propose multiple architectures for a parallel crawler and identify fundamental issues related to parallel crawling. Based on this understanding, we then propose metrics to evaluate a parallel crawler, and compare the proposed architectures using 40 million pages ...

