Deep Context: End-to-end Contextual Speech Recognition
Top 10% of 2018 papers by citations
Abstract
In automatic speech recognition (ASR), what a user says depends on the particular context she is in. Typically, this context is represented as a set of word n-grams. In this work, we present a novel, all-neural, end-to-end (E2E) ASR system that utilizes such context. Our approach, which we refer to as Contextual Listen, Attend and Spell (CLAS), jointly optimizes the ASR components along with embeddings of the context n-grams. During inference, the CLAS system can be presented with context phrases which might contain out-of-vocabulary (OOV) terms not seen during training. We compare our proposed system to a more traditional contextualization approach, which performs shallow fusion between independently trained LAS and contextual n-gram models during beam search. Across a number of tasks, we find that the proposed CLAS system outperforms the baseline method by as much as 68% relative WER, indicating the advantage of joint optimization over individually trained components.
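As a rough illustration of the shallow-fusion baseline the abstract contrasts with CLAS, beam-search hypotheses can be rescored by interpolating the E2E model's log-probability with a score from an external contextual model. The interpolation weight `lam`, the helper names, and the toy scores below are illustrative assumptions, not values or code from the paper:

```python
import math

def shallow_fusion_score(las_log_prob: float,
                         context_log_prob: float,
                         lam: float) -> float:
    """Interpolate the LAS score with an external contextual biasing score.

    This is the generic shallow-fusion form: the two models are trained
    independently and only combined at decode time via the weight `lam`.
    """
    return las_log_prob + lam * context_log_prob

def rescore_beam(hypotheses, lam):
    """hypotheses: list of (text, las_log_prob, context_log_prob) tuples.

    Returns the hypothesis text with the highest fused score.
    """
    return max(hypotheses,
               key=lambda h: shallow_fusion_score(h[1], h[2], lam))[0]

# Toy beam: the contextual model strongly favors "call joan" because
# "joan" appears among the user's context phrases (e.g., a contact name).
beam = [
    ("call joe", math.log(0.60), math.log(0.01)),   # no context match
    ("call joan", math.log(0.35), math.log(0.90)),  # matches a context phrase
]

print(rescore_beam(beam, lam=0.0))  # biasing off: acoustically likelier "call joe"
print(rescore_beam(beam, lam=0.5))  # biasing on: context flips the choice to "call joan"
```

The key limitation this sketch makes visible, and which motivates CLAS, is that `lam` must be tuned by hand and the two models never see each other during training; CLAS instead learns to attend to embedded context phrases jointly with the rest of the network.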