A Joint Topical N-Gram Language Model Based on LDA
Abstract
In this paper, we propose a novel joint topical n-gram language model that combines semantic topic information with local n-gram constraints during training. Instead of training the n-gram language model and the topic model independently, we directly estimate the joint probability of the latent semantic topic and the n-gram. In this procedure, Latent Dirichlet Allocation (LDA) is employed to compute latent topic distributions for sentence instances. Our model not only captures long-range dependencies but also distinguishes the probability distribution of each n-gram across topics without introducing data sparsity. Experiments show that our model lowers perplexity significantly and is robust with respect to the number of topics and the scale of the training data.
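The mixing idea described in the abstract can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: each topic keeps its own bigram counts, and a sentence-level topic distribution p(t | s), which the paper obtains from LDA, mixes the per-topic bigram probabilities. The class and method names are hypothetical, and simple additive smoothing stands in for whatever estimator the paper actually uses.

```python
from collections import defaultdict

class TopicalBigramModel:
    """Toy topic-conditioned bigram model (illustrative sketch only)."""

    def __init__(self, num_topics, vocab_size, alpha=0.1):
        self.num_topics = num_topics
        self.vocab_size = vocab_size
        self.alpha = alpha  # additive smoothing to avoid zero probabilities
        # Per-topic bigram counts: counts[t][prev][word]
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(num_topics)]
        # Per-topic totals for each bigram context
        self.context_totals = [defaultdict(int) for _ in range(num_topics)]

    def observe(self, topic, prev_word, word):
        """Record one bigram occurrence under a given topic."""
        self.counts[topic][prev_word][word] += 1
        self.context_totals[topic][prev_word] += 1

    def prob(self, prev_word, word, topic_dist):
        """p(word | prev_word, s) = sum_t p(t | s) * p(word | prev_word, t)."""
        p = 0.0
        for t, p_t in enumerate(topic_dist):
            num = self.counts[t][prev_word][word] + self.alpha
            den = (self.context_totals[t][prev_word]
                   + self.alpha * self.vocab_size)
            p += p_t * num / den
        return p

model = TopicalBigramModel(num_topics=2, vocab_size=4)
# Topic 0 behaves like sports text, topic 1 like finance text.
model.observe(0, "the", "match")
model.observe(1, "the", "market")
# A sentence whose (LDA-inferred) topic distribution leans toward
# topic 1 assigns higher probability to "market" after "the".
p_market = model.prob("the", "market", topic_dist=[0.2, 0.8])
p_match = model.prob("the", "match", topic_dist=[0.2, 0.8])
```

Because the per-topic counts are mixed with sentence-level topic weights, the same bigram can receive different probabilities in different sentences, which is the effect the abstract attributes to the joint model.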
Related Papers
- Latent Dirichlet Allocation: An approach for topic discovery (2022), 18 citations
- A Study of Topic Modeling Methods (2018), 24 citations
- Topic Modelling Analysis of Twitter Social Media Use by State Officials [Analisis Topik Modelling Terhadap Penggunaan Sosial Media Twitter oleh Pejabat Negara] (2021), 14 citations
- Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions (2013)
- Topic Modeling on WhatsApp User Reviews Using Latent Dirichlet Allocation (2022), 2 citations