ParaSite: Mining Structural Information on the Web
Web information retrieval tools typically use only the text on pages and ignore valuable information implicitly contained in links. At the other extreme, viewing the Web as a traditional hypertext system would also be a mistake. Its heterogeneity, cross-domain links, and dynamic nature mean that many assumptions of typical hypertext systems do not apply. The novelty of the Web leads to new problems in information access, so it is necessary to exploit the new kinds of information it makes available, such as the multiple, independent categorizations, names, and indexes of pages. This paper discusses the varieties of link information on the Web (not just hyperlinks), how the Web differs from conventional hypertext, and how links can be exploited to build useful applications. Specific applications presented as part of the ParaSite system find individuals' homepages, new locations of moved pages, and unindexed information.