The anatomy of a large-scale hypertextual Web search engine

Computer Networks and ISDN Systems, Vol. 30, No. 1-7. (April 1998), pp. 107-117.

Abstract

In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype, with a full text and hyperlink database of at least 24 million pages, is available at http://google.stanford.edu/. Engineering a search engine is a challenging task. Search engines index tens to hundreds of millions of Web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the Web, very little academic research has been done on them. Furthermore, due to rapid advances in technology and Web proliferation, creating a Web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale Web search engine, the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses the question of how to build a practical large-scale system which can exploit the additional information present in hypertext. We also look at the problem of how to deal effectively with uncontrolled hypertext collections where anyone can publish anything they want.

This article has been bookmarked 24 times, initially on 2004-11-08.