Multistate Encoding with End-To-End Speech RNN Transducer Network
Abstract
Recurrent Neural Network Transducer (RNN-T) models [1] for automatic speech recognition (ASR) provide high recognition accuracy. Such end-to-end (E2E) models combine the acoustic, pronunciation, and language models (AM, PM, LM) of a conventional ASR system into a single neural network, dramatically reducing complexity and model size. In this paper, we propose a technique for incorporating contextual signals, such as intelligent assistant device state or dialog state, directly into RNN-T models. We explore different encoding methods and demonstrate that RNN-T models can effectively utilize such context. Our technique yields a relative Word Error Rate (WER) reduction of up to 10.4% on a variety of contextual recognition tasks. We also demonstrate that proper regularization can be used to model context independently for improved overall quality.
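The abstract mentions encoding contextual signals directly into the model but does not detail the mechanism here. A minimal sketch of one plausible encoding method is a one-hot context vector concatenated to every acoustic frame fed to the RNN-T encoder; the state names, dimensions, and function names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical set of device/dialog states used as contextual signals.
CONTEXT_STATES = ["none", "media_playing", "timer_set", "dialog_followup"]

def encode_context(state: str) -> np.ndarray:
    """One-hot encode a context state (all zeros for unknown states)."""
    vec = np.zeros(len(CONTEXT_STATES), dtype=np.float32)
    if state in CONTEXT_STATES:
        vec[CONTEXT_STATES.index(state)] = 1.0
    return vec

def augment_features(frames: np.ndarray, state: str) -> np.ndarray:
    """Concatenate the context one-hot vector to every acoustic frame,
    so the encoder sees the context signal at each time step."""
    ctx = encode_context(state)
    tiled = np.tile(ctx, (frames.shape[0], 1))  # shape (T, num_states)
    return np.concatenate([frames, tiled], axis=1)

# Example: 100 frames of 80-dim log-mel features plus a 4-dim context vector.
frames = np.random.randn(100, 80).astype(np.float32)
augmented = augment_features(frames, "timer_set")
```

An alternative design, also consistent with the abstract's "different encoding methods", would be to learn an embedding per context state instead of a fixed one-hot vector, letting the network place related states near each other.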