
Unknown Word Extraction for Chinese Documents

Proceedings of Coling 2002 (2002), pp. 169-175.


Phanix's tags for this article

chinese detection extraction sinica word

Abstract

Chinese text has no blanks to mark word boundaries. As a result, identifying words is difficult because of segmentation ambiguities and occurrences of unknown words. Most previous work focuses only on resolving ambiguous segmentation. The problem of unknown word identification is considered more difficult and needs further investigation. Conventionally, unknown words have been extracted by statistical methods, because these methods are simple and efficient. However, statistical methods that do not use linguistic knowledge suffer from low precision and low recall, because character strings with statistical significance may be phrases or partial phrases rather than words, and low-frequency new words are hard to identify by statistical methods. In addition to statistical information, we try to use as much information as possible, such as morphology, syntax, semantics, and world knowledge. The identification system fully utilizes the context and content information of unknown words in the detection, extraction, and verification processes. A practical unknown word extraction system was implemented that identifies new words online, including low-frequency new words, with high precision and high recall.
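
The drawback of purely statistical extraction mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's system: it is a plain character n-gram frequency counter, and the function name `frequent_ngrams`, its parameters, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter

def frequent_ngrams(sentences, max_len=3, min_count=2):
    """Count character n-grams of length 2..max_len within each sentence
    and keep those seen at least min_count times (hypothetical helper)."""
    counts = Counter()
    for sentence in sentences:
        for n in range(2, max_len + 1):
            # Slide a window of n characters across the sentence.
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy corpus, invented for this example.
sentences = ["我喜欢自然语言处理", "自然语言处理很有趣", "他研究语言学"]
candidates = frequent_ngrams(sentences)
```

On this toy input, a fragment such as "然语" passes the frequency threshold even though it is not a word, while "语言学" occurs only once and is missed. That mirrors the low precision and low recall the abstract attributes to statistical methods used without linguistic knowledge.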

