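LDA is a generative model of text. Each word in a document is generated by first drawing a topic and then drawing the word from that topic's topic-to-word emission distribution. The topic is drawn from the document's topic proportions, which are themselves drawn from a Dirichlet distribution with a given prior parameter. The inferred (posterior) topic proportions of a document can be used as a feature of that document.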
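The generative process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `alpha` (Dirichlet parameter), `beta` (topic-to-word probabilities), `theta` (per-document topic proportions), the toy vocabulary, and all helper functions are hypothetical and chosen for this example. It uses only the standard library and covers sampling, not inference.

```python
import random

def sample_dirichlet(alpha, rng):
    # A Dirichlet sample can be obtained by normalizing independent Gamma draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    # Draw one index with probability proportional to probs.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(alpha, beta, vocab, n_words, rng):
    # Per-document topic proportions, drawn once per document.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)    # topic for this word
        w = sample_categorical(beta[z], rng)  # word emitted by that topic
        words.append(vocab[w])
    return theta, words

# Toy setup (assumed values): 2 topics over a 4-word vocabulary.
rng = random.Random(0)
vocab = ["gene", "dna", "ball", "team"]
alpha = [0.5, 0.5]
beta = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0 emits only "gene" and "dna"
    [0.0, 0.0, 0.5, 0.5],  # topic 1 emits only "ball" and "team"
]
theta, words = generate_document(alpha, beta, vocab, 8, rng)
print(theta, words)
```

Here `theta` is sampled from the prior. When LDA is used to produce document features, the corresponding quantity would be estimated from the observed words instead.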
Compared to pLSA, LDA needs fewer parameters to model the document-to-topic probabilities, because they are governed by a single Dirichlet prior shared across all documents. In pLSA, by contrast, these probabilities form a matrix whose size grows linearly with the number of documents. As a result, LDA is less prone to overfitting than pLSA.
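A rough parameter count makes the comparison concrete. The sketch below assumes one common counting convention: a topic-to-word table with one entry per topic and vocabulary word in both models, plus either one topic weight per document and topic (pLSA) or one Dirichlet parameter per topic (LDA). The function names and the example sizes are hypothetical.

```python
def plsa_param_count(n_topics, vocab_size, n_docs):
    # Topic-to-word table plus a document-to-topic matrix that grows with n_docs.
    return n_topics * vocab_size + n_topics * n_docs

def lda_param_count(n_topics, vocab_size):
    # Topic-to-word table plus one Dirichlet parameter per topic.
    return n_topics * vocab_size + n_topics

# Assumed sizes: 50 topics, 10,000-word vocabulary, growing corpus.
for n_docs in (1_000, 10_000, 100_000):
    print(n_docs, plsa_param_count(50, 10_000, n_docs), lda_param_count(50, 10_000))
```

Under this convention the pLSA count grows with the corpus, while the LDA count does not depend on the number of documents.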
Reviewed by zzb3886, 2009-04-21 21:04:23