![]() |
CiteULike | ![]() |
pool007's CiteULike | ![]() |
![]() |
|
![]() |
Register | ![]() |
Log in | ![]() |
Crawling a country: better strategies than breadth-first for web page orderingIn WWW '05: Special interest tracks and posters of the 14th international conference on World Wide Web (2005), pp. 864-872.
|
Reviews
[Write a review of this article]
Find related articles from these CiteULike users
Find related articles with these CiteULike tags
Posting History
AbstractThis article compares several page ordering strategies for Web crawling under several metrics. The objective of these strategies is to download the most "important" pages "early" during the crawl. As the coverage of modern search engines is small compared to the size of the Web, and it is impossible to index all of the Web for both theoretical and practical reasons, it is relevant to index at least the most important pages.We use data from actual Web pages to build Web graphs and execute a crawler simulator on those graphs. As the Web is very dynamic, crawling simulation is the only way to ensure that all the strategies considered are compared under the same conditions. We propose several page ordering strategies that are more efficient than breadth- first search and strategies based on partial Pagerank calculations.
BibTeX record
RIS record