Improved topic-dependent language modeling using information retrieval techniques
Top 10% of 1999 papers by citations over time
Abstract
N-gram language models are frequently used by speech recognition systems to constrain and guide the search. An N-gram model uses only the last N-1 words to predict the next word, with typical values of N ranging from 2 to 4; N-gram language models therefore lack long-term context information. We show that the predictive power of N-gram language models can be improved by using long-term context information about the topic of discussion. We use information retrieval techniques to generalize the available context information for topic-dependent language modeling. We demonstrate the effectiveness of this technique through experiments on the Wall Street Journal text corpus, a relatively difficult task for topic-dependent language modeling since the text is relatively homogeneous. The proposed method reduces the perplexity of the baseline language model by 37%, indicating the predictive power of the topic-dependent language model.
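The core idea in the abstract can be sketched in code: treat the recent word history as an information-retrieval query, retrieve the most similar document from a corpus via TF-IDF cosine similarity, and interpolate a topic-specific model estimated from that document with the general model. The following is a minimal, self-contained sketch of that idea, not the paper's actual method: it uses toy data, unigram models instead of N-grams, and hypothetical function names, purely to make the retrieve-then-interpolate mechanism concrete.

```python
import math
from collections import Counter

# Toy topic-coherent "documents" (hypothetical data for illustration).
docs = [
    "stocks rose sharply as investors bought shares in technology stocks".split(),
    "the team won the game after scoring late in the match".split(),
    "bond yields fell while stocks gained on strong earnings reports".split(),
]

def tf_idf(words, df, n_docs):
    """Bag-of-words TF-IDF vector as a {word: weight} dict."""
    tf = Counter(words)
    return {w: tf[w] * math.log((1 + n_docs) / (1 + df[w])) for w in tf}

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

df = Counter(w for d in docs for w in set(d))     # document frequencies
vecs = [tf_idf(d, df, len(docs)) for d in docs]   # per-document TF-IDF vectors

vocab = {w for d in docs for w in d}
general = Counter(w for d in docs for w in d)     # corpus-wide unigram counts
g_total = sum(general.values())

def topic_adapted_model(context, lam=0.5):
    """Retrieve the document most similar to the recent context (the IR step)
    and interpolate its unigram model with the general unigram model."""
    q = tf_idf(context, df, len(docs))
    best = max(range(len(docs)), key=lambda i: cosine(q, vecs[i]))
    topic = Counter(docs[best])
    t_total = sum(topic.values())
    def p(w):  # add-one smoothed unigram probabilities
        pg = (general[w] + 1) / (g_total + len(vocab))
        pt = (topic[w] + 1) / (t_total + len(vocab))
        return lam * pt + (1 - lam) * pg
    return p

def perplexity(p, words):
    return math.exp(-sum(math.log(p(w)) for w in words) / len(words))

context = "bond yields and stocks".split()        # recent history (finance topic)
test = "stocks gained on strong earnings".split()

ppl_topic = perplexity(topic_adapted_model(context), test)
ppl_general = perplexity(topic_adapted_model(context, lam=0.0), test)
# On topic-matched test text, the adapted model's perplexity is lower
# than the general baseline's, mirroring the reduction the paper reports.
print(f"topic-adapted: {ppl_topic:.1f}  general baseline: {ppl_general:.1f}")
```

The interpolation weight `lam` and the add-one smoothing are illustrative choices; the actual paper's models, smoothing, and retrieval setup differ.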
Related Papers
- Improving N-gram language modeling for code-switching speech recognition (2017)
- An Anatomization of Language Detection and Translation using NLP Techniques (2020)
- Improving language models by using distant information (2007)