
Context dependent class language model based on word co-occurrence matrix in LSA framework for speech recognition

In ACS'08: Proceedings of the 8th conference on Applied computer science (2008), pp. 275-280.



zzb3886's tags for this article

class lm lsa


Abstract

We address the data sparseness problem in language models (LMs). A class LM is one way to mitigate this problem: infrequent words are supported by more frequent words in the same class. This paper investigates a class LM based on latent semantic analysis (LSA). In the LSA framework, a corpus is usually represented by a word-document matrix, but this matrix ignores word order within a sentence. We propose several word co-occurrence matrices that preserve word order. Using these matrices, we define a context dependent class (CDC) LM, which distinguishes classes according to their context in the sentence. Experiments on the Wall Street Journal (WSJ) corpus show that the word co-occurrence matrices work better than the word-document matrix. Furthermore, the CDC LM achieves better perplexity than the traditional class LM based on LSA.
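
The contrast between the two matrix types can be illustrated with a small sketch. The abstract does not specify how the proposed co-occurrence matrices are built, so the code below assumes one simple order-preserving variant (counts of the word immediately to the right) and uses a plain truncated SVD as the LSA step. The function names and the toy corpus are hypothetical and are not taken from the paper.

```python
import numpy as np

def build_vocab(sentences):
    # Map each distinct word to a row/column index (alphabetical order).
    words = sorted({w for s in sentences for w in s})
    return {w: i for i, w in enumerate(words)}

def word_document_matrix(docs, vocab):
    # Rows are words, columns are documents. Only counts are kept,
    # so any information about word order is lost.
    m = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc:
            m[vocab[w], j] += 1
    return m

def right_cooccurrence_matrix(sentences, vocab):
    # ASSUMPTION: one possible order-preserving matrix, not the paper's
    # definition. Entry (i, j) counts how often word j immediately
    # follows word i, so the matrix is generally asymmetric.
    m = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for left, right in zip(s, s[1:]):
            m[vocab[left], vocab[right]] += 1
    return m

def lsa_word_vectors(matrix, rank):
    # Truncated SVD: keep the top `rank` singular directions and
    # return one low-dimensional vector per word (per row).
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank] * s[:rank]

# Toy corpus: two sentences with the same words in a different order.
sentences = [
    "the dog chased the cat".split(),
    "the cat chased the dog".split(),
]
vocab = build_vocab(sentences)
wd = word_document_matrix(sentences, vocab)
co = right_cooccurrence_matrix(sentences, vocab)
vectors = lsa_word_vectors(co, rank=2)
```

In this toy corpus the two columns of `wd` are identical, because both sentences contain the same words, while `co` records that "dog" follows "the" but "the" never follows "dog". Word vectors such as `vectors` could then be clustered to form word classes; the clustering method and the context dependent class definition used in the paper are not described in the abstract and are not sketched here.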

