Dirichlet mixtures in text modeling

No. CS-TR-05-1. (2005)


bsilverthorn's tags for this article

dcm_distribution dirichlet text_modeling

Abstract

Word rates in text vary according to global factors such as genre, topic, author, and expected readership (Church and Gale 1995). Models that summarize such global factors in text, or at the document level, are called ‘text models.’ A finite mixture of Dirichlet distributions (Dirichlet mixture, or DM for short) was investigated as a new text model. When the parameters of a multinomial are drawn from a DM, the compound distribution for discrete outcomes is a finite mixture of Dirichlet-multinomial distributions. A Dirichlet-multinomial can be regarded as a multivariate version of the Poisson mixture, a reliable univariate model for global factors (Church and Gale 1995). In the present paper, the DM and its compounds are introduced, together with parameter estimation methods derived from Minka’s fixed-point methods (Minka 2003) and the EM algorithm. The method can estimate the parameters of a large DM, on the order of a few hundred thousand parameters. After a discussion of the relationships between the DM and probabilistic latent semantic analysis (PLSA) (Hofmann 1999), the mixture of unigrams (Nigam et al. 2000), and latent Dirichlet allocation (LDA) (Blei et al. 2001, 2003), the models' applications to statistical language modeling are discussed and their performance in perplexity is compared. The DM model achieves the lowest perplexity despite its unitopic nature.
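To make the abstract's ingredients concrete, here is a minimal Python sketch of a Dirichlet mixture used as a text model: the Dirichlet-multinomial (DCM) likelihood of a document's word counts, the finite-mixture compound, one EM iteration whose M-step applies a responsibility-weighted version of Minka's fixed-point update for the Dirichlet-multinomial, and per-word perplexity. This is not the report's implementation. The function names, the single fixed-point step per M-step, the small floor on each alpha, the stdlib digamma approximation, and the perplexity definition (exp of the negative average per-word log-likelihood) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# A document is a word-count vector over a fixed vocabulary (illustrative assumption).
Doc = Sequence[int]


def digamma(x: float) -> float:
    """Digamma for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))); assumes at least one finite value."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dcm_log_prob(doc: Doc, alpha: Sequence[float]) -> float:
    """Log-probability of a word sequence with counts `doc` under a Dirichlet-multinomial.

    The multinomial coefficient is omitted, so this is the probability of one
    particular ordering of the words, as used for per-word perplexity.
    """
    n, a0 = sum(doc), sum(alpha)
    lp = math.lgamma(a0) - math.lgamma(n + a0)
    for c, a in zip(doc, alpha):
        lp += math.lgamma(c + a) - math.lgamma(a)
    return lp


def component_log_probs(doc: Doc, weights, alphas) -> List[float]:
    """log(w_m) + log DCM(doc | alpha_m) for every mixture component m."""
    return [(math.log(w) if w > 0 else -math.inf) + dcm_log_prob(doc, a)
            for w, a in zip(weights, alphas)]


def dm_log_prob(doc: Doc, weights, alphas) -> float:
    """Log-probability under a finite mixture of Dirichlet-multinomials (the DM compound)."""
    return log_sum_exp(component_log_probs(doc, weights, alphas))


def em_step(docs: Sequence[Doc], weights, alphas, floor: float = 1e-10) -> Tuple[list, list]:
    """One EM iteration; the alpha update is a single Minka-style fixed-point step."""
    # E-step: responsibility resp[i][m] of component m for document i.
    resp = []
    for doc in docs:
        logs = component_log_probs(doc, weights, alphas)
        z = log_sum_exp(logs)
        resp.append([math.exp(lg - z) for lg in logs])
    # M-step for the mixing weights (closed form).
    new_weights = [sum(r[m] for r in resp) / len(docs) for m in range(len(weights))]
    # M-step for each alpha_m: responsibility-weighted fixed-point update.
    new_alphas = []
    for m, alpha in enumerate(alphas):
        a0 = sum(alpha)
        denom = sum(r[m] * (digamma(sum(doc) + a0) - digamma(a0))
                    for doc, r in zip(docs, resp))
        if denom <= 0:  # component owns no non-empty documents: keep it unchanged
            new_alphas.append(list(alpha))
            continue
        updated = []
        for k, a in enumerate(alpha):
            numer = sum(r[m] * (digamma(doc[k] + a) - digamma(a))
                        for doc, r in zip(docs, resp))
            updated.append(max(a * numer / denom, floor))  # floor keeps alpha_k > 0
        new_alphas.append(updated)
    return new_weights, new_alphas


def perplexity(docs: Sequence[Doc], weights, alphas) -> float:
    """exp(-total log-likelihood / total word count)."""
    total_lp = sum(dm_log_prob(doc, weights, alphas) for doc in docs)
    return math.exp(-total_lp / sum(sum(doc) for doc in docs))


if __name__ == "__main__":
    toy_docs = [[5, 0, 1], [4, 1, 0], [0, 6, 1], [1, 5, 0]]  # made-up counts
    w, a = [0.5, 0.5], [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
    for _ in range(20):
        w, a = em_step(toy_docs, w, a)
    print(perplexity(toy_docs, w, a))
```

Because the weight update is exact and the fixed-point step only increases a lower bound on each component's weighted log-likelihood, each call to `em_step` should not decrease the training log-likelihood. With real data one would add a convergence test, try several initializations, and measure perplexity on held-out documents; the report's own evaluation protocol is not reproduced here.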

