Sparse Additive Generative Models of Text
Abstract
Generative models of text typically associate a multinomial with every class label or topic. Even in simple models this requires the estimation of thousands of parameters; in multifaceted latent variable models, standard approaches require additional latent "switching" variables for every token, complicating inference. In this paper, we propose an alternative generative model for text. The central idea is that each class label or latent topic is endowed with a model of the deviation in log-frequency from a constant background distribution. This approach has two key advantages: we can enforce sparsity to prevent overfitting, and we can combine generative facets through simple addition in log space, avoiding the need for latent switching variables. We demonstrate the applicability of this idea to a range of scenarios: classification, topic modeling, and more complex multifaceted generative models.
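The central idea can be illustrated with a small sketch. It assumes that the word distribution is obtained by adding sparse deviation vectors to a background log-frequency vector and then normalizing with a softmax; the abstract does not spell out this normalization, so treat it as an assumption. The function name, the toy vocabulary, and all numbers are hypothetical and chosen only for illustration.

```python
import math

def additive_word_distribution(background_log_freq, *deviations):
    """Combine a background log-frequency vector with sparse deviations.

    Each deviation is a dict {word_index: log-frequency offset}. Words that
    are absent from the dict have an offset of zero (the sparse case).
    Normalizing with a softmax is an assumption of this sketch.
    """
    scores = list(background_log_freq)
    for deviation in deviations:
        for index, offset in deviation.items():
            scores[index] += offset  # facets combine by addition in log space
    # Softmax, shifted by the maximum score for numerical stability.
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical five-word vocabulary and background frequencies.
vocab = ["the", "of", "game", "election", "goal"]
background = [math.log(p) for p in (0.5, 0.3, 0.1, 0.05, 0.05)]

# Hypothetical sparse deviations: a topic and a second facet.
sports_topic = {2: 1.0, 4: 1.5}
regional_facet = {4: 0.5}

p_background = additive_word_distribution(background)
p_sports = additive_word_distribution(background, sports_topic)
p_combined = additive_word_distribution(background, sports_topic, regional_facet)
```

Because each deviation stores only its nonzero offsets, a topic that differs from the background on a handful of words needs only a handful of parameters, and adding a second facet requires no per-token switching variable in this sketch.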