CiteULike is a free online bibliography manager. Register and you can start organising your references online.

On single-pass indexing with MapReduce Export

In SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval (2009), pp. 742-743.

Citation Format

[Posts]

View FullText article


X Reviews [Write a review of this article]

X Find related articles from these CiteULike users

X Find related articles with these CiteULike tags

X Posting History

X Abstract

Indexing is an important Information Retrieval (IR) operation, which must be parallelised to support large-scale document corpora. We propose a novel adaptation of the state-of-the-art single-pass indexing algorithm in terms of the MapReduce programming model. We then experiment with this adaptation, in the context of the Hadoop MapReduce implementation. In particular, we explore the scale of improvements that can be achieved when using firstly more processing hardware and secondly larger corpora. Our results show that indexing speed increases in a close to linear fashion when scaling corpus size or number of processing machines. This suggests that the proposed indexing implementation is viable to support upcoming large-scale corpora.


X BibTeX record

X RIS record


Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.