On the peninsula phenomenon in web graph and its implications on web search

by: Tao Meng, Hong-Fei Yan
Computer Networks, Vol. 51, No. 1. (January 2007), pp. 177-189.


Notes for this article

Peninsula = set of pages that have a single entry-point or tache. Those pages are only reachable from the tache.

Peninsula sizes follow a power law (surprise!)

Provides an exact and an approximate algorithm for finding peninsulas

About 9% of pages are taches of some peninsula; 4% are taches of a peninsula larger than 10 nodes.

A 2% loss in link extraction results in losing roughly 50% of the coverage.

ChaTo (public note) - 2008-04-24 01:45:10

Abstract

Web masters usually place certain web pages such as home pages and index pages in front of others. Under such a design, it is necessary to go through some pages to reach the destination pages, which is similar to the scenario of reaching an inner town of a peninsula through other towns at the edge of the peninsula. In this paper, we try to validate that peninsulas are a universal phenomenon in the World-Wide Web, and clarify how this phenomenon can be used to enhance web search and study web connectivity problems. For this purpose, we model the web as a directed graph, and give a proper definition of peninsulas based on this graph. We also present an efficient algorithm to find web peninsulas. Using data collected from the Chinese web by the Tianwang search engine, we perform an experiment on the distribution of sizes of peninsulas and their correlations with PageRank values, outdegrees, or indegrees of the ties with other outside vertices. The results show that the peninsula structure on a web graph can greatly expedite the computation of PageRank values, and that it can also significantly affect the link extraction capability and information coverage of web crawlers.
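To make the peninsula idea concrete, here is a minimal Python sketch. It is not the paper's exact or approximate algorithm, neither of which is described on this page. It assumes one reading of the informal definition in the note: for a given tache, the peninsula is the largest set of pages reachable from the tache such that every in-link of a page in the set comes from the tache or from another page in the set. The function name `peninsula`, the adjacency-dict graph format, and the sample page names are all hypothetical.

```python
from collections import deque

def peninsula(graph, tache):
    """Return the pages reachable from `tache` whose in-links all come
    from `tache` or from other pages in the returned set (assumed definition)."""
    # Collect every node and build reverse adjacency to inspect in-links.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    in_links = {n: set() for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            in_links[dst].add(src)

    # Candidates: everything reachable from the tache, excluding the tache.
    candidates = set()
    queue = deque([tache])
    while queue:
        page = queue.popleft()
        for nxt in graph.get(page, ()):
            if nxt != tache and nxt not in candidates:
                candidates.add(nxt)
                queue.append(nxt)

    # Repeatedly discard candidates that have an in-link from outside.
    allowed = candidates | {tache}
    to_check = deque(candidates)
    while to_check:
        page = to_check.popleft()
        if page not in candidates:
            continue
        if in_links[page] - allowed:
            candidates.discard(page)
            allowed.discard(page)
            # Successors now have an outside in-link, so re-check them.
            to_check.extend(n for n in graph.get(page, ()) if n in candidates)
    return candidates

# Hypothetical toy site: "item2" is also linked from "news",
# so it falls outside the peninsula whose tache is "home".
site = {
    "portal": ["home", "news"],
    "home": ["about", "products"],
    "products": ["item1", "item2", "home"],
    "news": ["item2"],
}
home_peninsula = peninsula(site, "home")
```

In this toy graph, `home_peninsula` contains "about", "products" and "item1". The pruning loop matters because removing one page can expose its successors to an outside in-link, so the check has to propagate until nothing changes.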

