Information genealogy: uncovering the flow of ideas in non-hyperlinked document databases

In KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (2007), pp. 619-628.

Abstract

We now have incrementally grown databases of text documents reaching back over a decade, in areas ranging from personal email to news articles and conference proceedings. While accessing individual documents is easy, methods for overviewing and understanding these collections as a whole are lacking in number and in scope. In this paper, we address one such global analysis task, namely the problem of automatically uncovering how ideas spread through the collection over time. We refer to this problem as Information Genealogy. In contrast to bibliometric methods that are limited to collections with explicit citation structure, we investigate content-based methods requiring only the text and timestamps of the documents. In particular, we propose a language-modeling approach and a likelihood ratio test to detect influence between documents in a statistically well-founded way. Furthermore, we show how this method can be used to infer citation graphs and to identify the most influential documents in the collection. Experiments on the NIPS conference proceedings and the Physics ArXiv show that our method is more effective than methods based on document similarity.
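
The abstract does not give the paper's exact model, so the following is only an illustrative sketch of what a language-model likelihood ratio test for influence could look like. It assumes a unigram model with additive smoothing, a two-component mixture of a background model and the older document's model, and a coarse grid search for the mixing weight. All of these choices, and every name used here (`unigram_lm`, `influence_statistic`, `lam`, `alpha`, `grid`), are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def log_likelihood(tokens, background, source, lam):
    """Log-likelihood of tokens under (1 - lam) * background + lam * source."""
    return sum(
        math.log((1.0 - lam) * background[w] + lam * source[w]) for w in tokens
    )


def influence_statistic(new_doc, old_doc, background_docs, grid=51):
    """Return (likelihood-ratio statistic, best mixing weight) for old -> new."""
    background_tokens = [w for doc in background_docs for w in doc]
    vocab = set(new_doc) | set(old_doc) | set(background_tokens)
    background = unigram_lm(background_tokens, vocab)
    source = unigram_lm(old_doc, vocab)

    # Null hypothesis (assumed): lam = 0, the new document ignores the old one.
    null_ll = log_likelihood(new_doc, background, source, 0.0)

    # Alternative (assumed): pick lam on a coarse grid over [0, 1].
    # The grid contains lam = 0, so the statistic is never negative.
    candidates = [i / (grid - 1) for i in range(grid)]
    best_lam = max(
        candidates,
        key=lambda lam: log_likelihood(new_doc, background, source, lam),
    )
    best_ll = log_likelihood(new_doc, background, source, best_lam)
    return 2.0 * (best_ll - null_ll), best_lam


# Toy data, invented purely for this example.
background_docs = [
    "the model is trained on data".split(),
    "we use the data for the model".split(),
]
new_doc = "the kernel margin model uses support vector data".split()
related_old = "kernel margin support vector kernel margin".split()
unrelated_old = "bayesian prior posterior sampling".split()

print(influence_statistic(new_doc, related_old, background_docs))
print(influence_statistic(new_doc, unrelated_old, background_docs))
```

A larger statistic means the older document's vocabulary explains the newer document better than the background model alone. Choosing a significance threshold, and restricting candidates to documents with earlier timestamps, are left out of this sketch.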