
A Method of Accounting Bigrams in Topic Models

2015, pp. 1–9


Abstract

This paper presents an empirical study of integrating bigram collocations, and the similarities between them and unigrams, into topic models. First, we propose PLSA-SIM, a novel modification of the original PLSA algorithm. It incorporates bigrams and maintains relationships between unigrams and bigrams based on their component structure. We then analyze a variety of word association measures in order to integrate top-ranked bigrams into topic models. All experiments were conducted on four text collections of different domains and languages. The experiments identify a subgroup of the tested measures whose top-ranked bigrams, when integrated into the PLSA-SIM algorithm, significantly improve topic model quality on all collections.
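As a rough illustration of the two ingredients named in the abstract, the sketch below ranks adjacent word pairs by an association measure and then links each bigram to its component unigrams. It is not the PLSA-SIM algorithm itself. The abstract does not say which measures were tested, so pointwise mutual information (PMI) is used here purely as an assumed example, and the function names `rank_bigrams_by_pmi` and `component_links` are hypothetical.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by PMI (an assumed example measure)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def component_links(bigrams):
    """Link each bigram to its component unigrams, and each unigram back."""
    links = {}
    for w1, w2 in bigrams:
        phrase = f"{w1} {w2}"
        for word in (w1, w2):
            links.setdefault(word, set()).add(phrase)
            links.setdefault(phrase, set()).add(word)
    return links


tokens = ["topic", "model", "is", "a", "topic", "model"]
ranked = rank_bigrams_by_pmi(tokens)
links = component_links([pair for pair, _ in ranked])
```

With this toy input only the pair ("topic", "model") passes the frequency threshold, so `links` connects the phrase "topic model" to the unigrams "topic" and "model". A topic model that uses such links could treat a bigram and its components as related terms rather than as independent vocabulary entries.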
