![]() |
CiteULike | ![]() |
pprett's CiteULike | ![]() |
![]() |
|
![]() |
Register | ![]() |
Log in | ![]() |
Bilingual topic aspect classification with a few training examplesby: Yejun Wu, Douglas W. Oard
In SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (2008), pp. 203-210.
|
Reviews
[Write a review of this article]
Find related articles from these CiteULike users
Find related articles with these CiteULike tags
Posting History
AbstractThis paper explores topic aspect (i.e., subtopic or facet) classification for English and Chinese collections. The evaluation model assumes a bilingual user who has found documents on a topic and identified a few passages in each language on aspects of that topic. Additional passages are then automatically labeled using a k-Nearest-Neighbor classifier and local (i.e., result set) Latent Semantic Analysis. Experiments show that when few training examples are available in either language, classification using training examples from both languages can often achieve higher effectiveness than using training examples from just one language. When the total number of training examples is held constant, classification effectiveness correlates positively with the fraction of same-language training examples in the training set. These results suggest that supervised classification can benefit from hand-annotating a few same-language examples, and that when performing classification in bilingual collections it is useful to label some examples in each language.
BibTeX record
RIS record