Language-independent text learning with statistical n-gram language models
by Fuchun Peng
Abstract

In this thesis, we attempt to build language-independent text learning systems that do not require significant human intervention. Our solution is based on statistical n-gram language modeling and unsupervised machine learning. Statistical language modeling is concerned with estimating the probability of word sequences, which provides a natural and principled approach to text learning. Statistical n-gram language models treat text as a sequence of characters or words and offer the advantage of language independence. Unsupervised machine learning offers the advantage of significantly reducing human labor. We focus on improving performance on three text learning problems by building statistical n-gram language models and by exploiting the value of unlabeled data. These tasks are language- and task-independent text classification, language-independent lexical learning and unsupervised word segmentation, and Chinese text retrieval.
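The abstract's core idea, scoring text with an n-gram model over characters so that no language-specific tokenizer is needed, can be illustrated with a small sketch. The code below is a minimal, hypothetical example and not the thesis's implementation: it assumes character trigrams, add-one (Laplace) smoothing, and one model per category, with a text assigned to the category whose model gives it the highest probability. The names `CharNgramLM`, `classify`, and `PAD` and the toy training sentences are illustrative inventions.

```python
import math
from collections import Counter

PAD = "\x02"  # hypothetical start-of-text padding symbol, assumed absent from real text


class CharNgramLM:
    """Character-level n-gram language model with add-one (Laplace) smoothing.

    Add-one smoothing is a simplifying assumption for this sketch; it is not
    necessarily the smoothing method used in the thesis.
    """

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()    # counts of (context, next_char) pairs
        self.context_counts = Counter()  # counts of (n-1)-character contexts
        self.vocab = set()               # characters seen in training

    def _padded(self, text):
        # Prefix n-1 pad symbols so the first characters also have a full context.
        return PAD * (self.n - 1) + text

    def train(self, texts):
        for text in texts:
            padded = self._padded(text)
            for i in range(self.n - 1, len(padded)):
                context, char = padded[i - self.n + 1:i], padded[i]
                self.ngram_counts[(context, char)] += 1
                self.context_counts[context] += 1
                self.vocab.add(char)
        return self

    def log_prob(self, text):
        """Return log P(text) = sum_i log P(c_i | previous n-1 characters)."""
        # One extra outcome is reserved for any character not seen in training,
        # so each conditional distribution sums to 1.
        v = len(self.vocab) + 1
        padded = self._padded(text)
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            context, char = padded[i - self.n + 1:i], padded[i]
            count = self.ngram_counts[(context, char)]
            total += math.log((count + 1) / (self.context_counts[context] + v))
        return total


def classify(text, models):
    """Return the label whose language model gives `text` the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(text))


# Toy usage with made-up training sentences (illustrative only, not data from the thesis).
models = {
    "en": CharNgramLM(n=3).train(["the cat sat on the mat",
                                  "this is a short english sentence"]),
    "de": CharNgramLM(n=3).train(["die katze sitzt auf der matte",
                                  "das ist ein kurzer deutscher satz"]),
}
print(classify("the dog sat on the mat", models))
```

Because the model works on raw characters, the same code applies unchanged to any language or script, which is the language-independence property the abstract refers to. A practical system would likely use a stronger smoothing scheme and far more training text than this toy setup.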