Grid-based indexing of a newswire corpus

In Proceedings of the Fifth IEEE/ACM International Workshop on Grid Computing (2004), pp. 320-327.


Abstract

In this paper we report our experience using computational grids in the domain of natural language processing, particularly in the area of information extraction, to create query indices for information retrieval tasks. Given the prevalence of large corpora in natural language processing, computational grids offer significant utility to researchers in the domain who are reaching the bounds of computational efficiency. We leverage the affinities between the segmented data sources prevalent in natural language processing and the parallelisation model from the grid domain. The experiment reported here is a large-scale newswire corpus indexing task, with the goal of efficiently creating a queryable index of the entire corpus. By parallelising the indexing task and executing it on an Australian computational grid, we observe an overall 2.26x speedup over the same experiment on a single computational node. In addition to reporting the raw performance impact, we reflect on a number of interesting points discovered during the execution of the experiments and propose a number of new requirements for grid middleware.
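
The abstract's core idea, indexing independent corpus segments in parallel and then merging the partial results, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy documents, and the use of a local thread pool as a stand-in for grid nodes are all assumptions made for illustration. The `speedup` helper simply expresses the ratio behind a figure such as 2.26x (single-node time divided by parallel time).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_segment(segment):
    """Build a partial inverted index (term -> set of doc ids) for one segment."""
    partial = defaultdict(set)
    for doc_id, text in segment:
        for term in text.lower().split():
            partial[term].add(doc_id)
    return partial


def merge_indices(partials):
    """Union the per-segment postings into one queryable index."""
    merged = defaultdict(set)
    for partial in partials:
        for term, doc_ids in partial.items():
            merged[term] |= doc_ids
    return merged


def build_index(segments, workers=4):
    """Index each segment independently, then merge.

    Assumption: a thread pool stands in for the grid's worker nodes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(index_segment, segments))
    return merge_indices(partials)


def speedup(serial_seconds, parallel_seconds):
    """Speedup is the single-node time divided by the parallel time."""
    return serial_seconds / parallel_seconds


# Hypothetical toy corpus, already split into two segments.
segments = [
    [("d1", "grid computing for language"), ("d2", "newswire corpus indexing")],
    [("d3", "indexing on the grid")],
]
index = build_index(segments)
```

Because each segment is indexed without reference to the others, the only coordination point is the final merge, which is what makes segmented corpora a natural fit for this style of parallelisation.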

