Authoritative sources in a hyperlinked environment

J. ACM, Vol. 46, No. 5. (1999), pp. 604-632.

Abstract

The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of “authoritative” information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.

This article has been bookmarked 24 times, initially on 2004-11-08.

2009-11-03 User inbeom
2009-08-25 User cn_honjyo
2009-03-04 User senseable-urb
2009-02-28 User mpotamias
2008-11-04 User vlee
2008-10-28 User AbnerCYH
2008-10-05 User Passerby
2008-09-09 User macle
2008-07-14 User bemike
2007-07-16 User merazzle
2007-07-12 User dave_d
2007-03-23 User michaelmampaey
Group ADMiRes
2007-01-10 User gcalda
2006-11-06 User atbrew
2006-10-02 User fwkroon
2006-05-14 User anon_pl
2006-03-16 User mapio
2005-10-30 User korakot
Group Philosophy_of_Information
Group Blog_and_Wiki_Research
2005-10-25 User ssn
2004-11-08 User camster, 1 note

Kleinberg talks about how to extract "authorities" and "hubs" from a bunch of interlinked web pages. Consider, for example, the set of web pages returned by a search engine for a particular query. We're interested in finding the one original and authoritative source for our search term. Google's PageRank is quite good at doing precisely this job (and, perhaps, PigeonRank (http://www.google.com/technology/pigeonrank.html) is even better?). Search for "apple computer" and you'd expect to get www.apple.com back as the first hit, rather than some other page that merely contains the words "apple computer". In Kleinberg's terminology, we're seeking an authoritative page.

The dual problem (dual means "reverse all the links between the pages" and then look for the authorities in this upside-down world) is to find a set of hubs. These are not exactly the hubs that Barabási talks about, but rather the Yahoos and the endless lists of "my favourite links" on personal home pages. The symmetry is that hubs link to lots of pages (including authorities), whereas authorities are linked from a lot of pages (including hubs).
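To make the hub/authority symmetry concrete, here is a minimal Python sketch of the mutually reinforcing iteration usually associated with this paper: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. The function names (`hits`, `normalise`), the dict-of-lists graph format, the fixed iteration count and the toy page names are all assumptions made for illustration; none of them comes from the paper or from CiteULike.

```python
import math


def normalise(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a small link graph.

    `links` maps each page to the list of pages it links to. This input
    format is hypothetical, chosen only to keep the sketch short.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority update: add up the hub scores of every page linking in.
        new_auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_auth[target] += hub[source]

        # Hub update: add up the (new) authority scores of every page linked to.
        new_hub = {
            page: sum(new_auth[target] for target in links.get(page, ()))
            for page in pages
        }

        auth = normalise(new_auth)
        hub = normalise(new_hub)

    return hub, auth


# Toy graph (made-up names): three "favourite links" lists and two targets.
example_links = {
    "list_a": ["apple", "ibm"],
    "list_b": ["apple", "ibm"],
    "list_c": ["apple"],
}
hub, auth = hits(example_links)
best_authority = max(auth, key=auth.get)
```

In this toy graph the link lists end up with all the hub weight and the pages they point at end up with all the authority weight. Reversing every link and rerunning swaps the two score vectors, which is the duality described above.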

We'll probably see something similar on CiteULike. The authorities will be the real citation classics that everyone's reading in a particular field. The hubs should hopefully be the people who read widely and in different fields and can produce a good overview of how to "glue" a couple of related subjects together. Both are probably worth mining from our dataset.

2004-11-17 21:07:50
Group dbk-lab