Bringing Contextual Information to Google Speech Recognition
Top 10% of 2015 papers
Abstract
In automatic speech recognition on mobile devices, what a user says often depends strongly on the particular context they are in, and the n-grams relevant to that context are often not known in advance. The context can depend on, for example, the dialog state, options presented to the user, conversation topic, or location. Recognizing sentences that include these n-grams can be challenging, as they are often poorly represented in the language model (LM) or even contain out-of-vocabulary (OOV) words. In this paper, we propose a solution for using contextual information to improve speech recognition accuracy. We utilize an on-the-fly rescoring mechanism to adjust the LM weights of a small set of n-grams relevant to the particular context during speech decoding. Our solution handles OOV words, addresses the efficient combination of multiple context sources, and even allows biasing of class-based language models. We show significant speech recognition accuracy improvements on several datasets, using various types of context, without negatively impacting the overall system. The improvements are obtained in both offline and live experiments.
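The core idea of on-the-fly rescoring can be illustrated with a minimal sketch. The function below is a hypothetical simplification, not the paper's exact formulation: n-grams found in the contextual set receive an additive boost in log-probability space, while all other n-grams keep their base LM score. The function name `rescore` and the `boost` parameter are assumptions for illustration only.

```python
def rescore(ngram, base_logp, context_ngrams, boost=2.0):
    """Sketch of contextual biasing during decoding.

    ngram          -- tuple of words, e.g. ("call", "alice")
    base_logp      -- log p(word | history) from the main LM
    context_ngrams -- set of n-grams relevant to the current context
    boost          -- assumed bias strength in natural-log space
    """
    if ngram in context_ngrams:
        # Boost the score, but cap at 0.0 so probability never exceeds 1.
        return min(0.0, base_logp + boost)
    return base_logp

context = {("call", "alice"), ("navigate", "home")}
print(rescore(("call", "alice"), -8.0, context))  # boosted: -6.0
print(rescore(("call", "bob"), -8.0, context))    # unchanged: -8.0
```

In a real decoder this adjustment would be applied inside the beam search as hypotheses are expanded, so only the small contextual n-gram set is touched and the rest of the LM is left intact.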