CiteULike is a free online bibliography manager. Register and you can start organising your references online.

Automatic Language-Independent Induction of Gazetteer Lists Export

In Proceedings of 4th Language Resources and Evaluation Conference (LREC?04 (2004), pp. 2004-2004.

Citation Format

[Posts]

View FullText article


lisah2u's tags for this article

arabic bootstrapping chinese gazetteer hindi information-extraction

X Reviews [Write a review of this article]

X Find related articles from these CiteULike users

X Find related articles with these CiteULike tags

X Posting History

X Abstract

Adaptation of existing Information Extraction (IE) systems to new languages and domains is the focus of much current research, but progress is often hindered by the lack of available resources to enable developers to get a new system up and running fast. It has previously been shown that a good set of gazetteer lists can have a vital role here, but creation of lists for a new language or domain can be time-consuming and laborious. In this paper we demonstrate a tool for inducing gazetteer lists from a small set of annotated corpora and creating a baseline IE system. We also describe an extension to this, using bootstrapping techniques in order to generate much larger volumes of noisy training texts. High quality results have been achieved in this way on Hindi, Chinese and Arabic.


X BibTeX record

X RIS record


Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.