Discarding impossible events from statistical language models
Abstract
This paper describes a method for detecting impossible bigrams in a space of V² bigrams, where V is the size of the vocabulary. The idea is to discard all the ungrammatical events that are impossible in a well-written text and consequently to expect an improvement of the language model. In speech recognition, we also expect to reduce the complexity of the search algorithm by making fewer comparisons. To achieve this, we extract the impossible bigrams using automatic rules based on grammatical classes. The biclass associations that are ungrammatical are detected, and all the corresponding bigrams are analyzed and marked as possible or impossible events. Because grammatical rules in natural language can have exceptions, we maintain an exception list for each of the retrieved rules.
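The filtering described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the word-to-class mapping, the set of ungrammatical biclass associations, and the exception list are all hypothetical examples.

```python
# Hypothetical word-to-class mapping (the paper uses grammatical classes).
word_class = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN",
    "runs": "VERB", "sleeps": "VERB",
}

# Biclass associations judged ungrammatical (hypothetical rules):
# e.g. a determiner cannot be followed by another determiner or a verb.
impossible_biclasses = {("DET", "DET"), ("DET", "VERB")}

# Per-rule exception lists: word bigrams kept as possible even though
# their class pair matches an ungrammatical rule (hypothetical entry).
exceptions = {("DET", "VERB"): {("the", "runs")}}

def is_possible(w1: str, w2: str) -> bool:
    """Return True if the bigram (w1, w2) is kept in the language model."""
    pair = (word_class[w1], word_class[w2])
    if pair in impossible_biclasses:
        # The rule fires: the bigram is impossible unless it is an exception.
        return (w1, w2) in exceptions.get(pair, set())
    return True
```

For instance, `is_possible("the", "cat")` is kept, `is_possible("the", "a")` is discarded by the DET-DET rule, and `is_possible("the", "runs")` survives via its exception entry.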