Dirichlet Mixture Allocation for Multiclass Document Collections Modeling
2009, pp. 711–715
Abstract
Latent Dirichlet allocation (LDA), a topic model, is an effective tool for the statistical analysis of large document collections. In LDA, each document is modeled as a mixture of topics, and the topic proportions are drawn from a unimodal Dirichlet prior. When a collection of documents is drawn from multiple classes, this unimodal prior is insufficient to fit the data. To address this problem, we exploit a multimodal Dirichlet mixture prior and propose Dirichlet mixture allocation (DMA). We report experiments on the popular TDT2 corpus demonstrating that DMA models a collection of documents more precisely than LDA when the documents come from multiple classes.
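The contrast between the two priors can be illustrated with a minimal sketch. It only samples per-document topic proportions from a single Dirichlet and from a mixture of Dirichlets; it is not the paper's inference procedure. The function names, the number of topics, the component weights, and the concentration parameters are all hypothetical values chosen for illustration.

```python
import numpy as np


def sample_dirichlet_prior(alpha, n_docs, rng):
    """Single-Dirichlet prior: all documents share one concentration vector."""
    return rng.dirichlet(np.asarray(alpha, dtype=float), size=n_docs)


def sample_dirichlet_mixture_prior(weights, alphas, n_docs, rng):
    """Mixture prior: pick one component per document, then draw the
    topic proportions from that component's Dirichlet."""
    weights = np.asarray(weights, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    components = rng.choice(len(weights), size=n_docs, p=weights)
    theta = np.vstack([rng.dirichlet(alphas[c]) for c in components])
    return theta, components


rng = np.random.default_rng(0)

# Hypothetical setup: 3 topics, 2 document classes (mixture components).
alphas = [[8.0, 1.0, 1.0],   # class 0 concentrates on topic 0
          [1.0, 1.0, 8.0]]   # class 1 concentrates on topic 2
weights = [0.5, 0.5]

theta_single = sample_dirichlet_prior([1.0, 1.0, 1.0], 1000, rng)
theta_mix, comp = sample_dirichlet_mixture_prior(weights, alphas, 1000, rng)

# Average topic proportions per component show the two separate modes.
for c in range(len(weights)):
    print(c, theta_mix[comp == c].mean(axis=0).round(2))
```

With these assumed parameters, documents drawn from different components cluster around different regions of the topic simplex, which is the multiclass structure a single Dirichlet prior cannot represent.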