On the peninsula phenomenon in web graph and its implications on web search
Computer Networks, Vol. 51, No. 1 (January 2007), pp. 177-189.
Notes for this article
Peninsula = a set of pages with a single entry point, called the tache. Those pages are reachable only through the tache.
Peninsula sizes follow a power law (surprise!)
Provides an exact and an approximate algorithm for finding peninsulas
About 9% of pages are taches of some peninsula; 4% are taches of a peninsula larger than 10 nodes.
A 2% loss in link extraction results in losing roughly 50% of the coverage.
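To make the definition in these notes concrete, here is a minimal Python sketch of one reading of it: the peninsula of a tache is the largest set of pages reachable from the tache in which every page other than the tache receives links only from inside the set. This is an illustration only, not the exact or approximate algorithm from the paper; the function name `peninsula` and the edge-list input format are assumptions made for the example.

```python
from collections import defaultdict, deque


def peninsula(edges, tache):
    """Return the set of pages whose only entry point is `tache`.

    `edges` is an iterable of (source, target) links. Hypothetical helper:
    a naive fixed-point computation, not the paper's algorithm.
    """
    succ = defaultdict(set)  # out-links of each page
    pred = defaultdict(set)  # in-links of each page
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    # Candidate set: every page reachable from the tache (breadth-first).
    members = {tache}
    queue = deque([tache])
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in members:
                members.add(v)
                queue.append(v)

    # Repeatedly drop pages (other than the tache) that have an in-link
    # from outside the candidate set, until nothing changes.
    changed = True
    while changed:
        changed = False
        for v in list(members):
            if v != tache and not pred[v] <= members:
                members.remove(v)
                changed = True
    return members


links = [
    ("home", "t"), ("t", "a"), ("t", "b"),
    ("a", "b"), ("b", "c"), ("home", "c"),
]
print(sorted(peninsula(links, "t")))
```

In this toy graph, page c is linked directly from home, so it can be reached without passing through t and is excluded. The pruning loop is quadratic in the worst case, which is acceptable for a small example but not for a web-scale graph.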
Abstract
Web masters usually place certain web pages such as home pages and index pages in front of others. Under such a design, it is necessary to go through some pages to reach the destination pages, which is similar to the scenario of reaching an inner town of a peninsula through other towns at the edge of the peninsula. In this paper, we try to validate that peninsulas are a universal phenomenon in the World-Wide Web, and clarify how this phenomenon can be used to enhance web search and study web connectivity problems. For this purpose, we model the web as a directed graph, and give a proper definition of peninsulas based on this graph. We also present an efficient algorithm to find web peninsulas. Using data collected from the Chinese web by Tianwang search engine, we perform an experiment on the distribution of sizes of peninsulas and their correlations with PageRank values, outdegrees, or indegrees of the ties with other outside vertices. The results show that the peninsula structure on a web graph can greatly expedite the computation of PageRank values; and it can also significantly affect the link extraction capability and information coverage of web crawlers.