Bringing Contextual Information to Google Speech Recognition
Top 10% of 2015 papers
Abstract
In automatic speech recognition on mobile devices, what a user says often depends strongly on the particular context they are in, and the n-grams relevant to that context are often not known in advance. The context can depend on, for example, the dialog state, options presented to the user, conversation topic, or location. Recognizing sentences that include these n-grams can be challenging, as they are often poorly represented in the language model (LM) or even contain out-of-vocabulary (OOV) words. In this paper, we propose a solution for using contextual information to improve speech recognition accuracy. We utilize an on-the-fly rescoring mechanism to adjust the LM weights of a small set of n-grams relevant to the particular context during speech decoding. Our solution handles OOV words, addresses the efficient combination of multiple context sources, and even allows biasing of class-based language models. We show significant speech recognition accuracy improvements on several datasets, using various types of context, without negatively impacting the overall system. The improvements are obtained in both offline and live experiments.
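The core idea of on-the-fly rescoring can be illustrated with a minimal sketch. The function below is a hypothetical simplification, not the paper's exact formulation: n-grams found in the contextual set receive an additive boost in log-probability space, while all other n-grams keep their base LM score. The function name `rescore` and the `boost` parameter are assumptions for illustration only.

```python
def rescore(ngram, base_logp, context_ngrams, boost=2.0):
    """Sketch of contextual biasing during decoding.

    ngram          -- tuple of words, e.g. ("call", "alice")
    base_logp      -- log p(word | history) from the main LM
    context_ngrams -- set of n-grams relevant to the current context
    boost          -- assumed bias strength in natural-log space
    """
    if ngram in context_ngrams:
        # Boost the score, but cap at 0.0 so probability never exceeds 1.
        return min(0.0, base_logp + boost)
    return base_logp

context = {("call", "alice"), ("navigate", "home")}
print(rescore(("call", "alice"), -8.0, context))  # boosted: -6.0
print(rescore(("call", "bob"), -8.0, context))    # unchanged: -8.0
```

In a real decoder this adjustment would be applied inside the beam search as hypotheses are expanded, so only the small contextual n-gram set is touched and the rest of the LM is left intact.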