Towards Better Decoding and Language Model Integration in Sequence to Sequence Models
Abstract
The recently proposed Sequence-to-Sequence (seq2seq) framework advocates replacing complex data processing pipelines, such as an entire automatic speech recognition system, with a single neural network trained in an end-to-end fashion. In this contribution, we analyse an attention-based seq2seq speech recognition system that directly transcribes recordings into characters. We observe two shortcomings: overconfidence in its predictions and a tendency to produce incomplete transcriptions when language models are used. We propose practical solutions to both problems, achieving competitive speaker-independent word error rates on the Wall Street Journal dataset: without a separate language model we reach 10.6% WER, while together with a trigram language model we reach 6.7% WER.
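Decoding-time integration of an external language model with a seq2seq model is commonly done by log-linearly combining the two models' scores during beam search (often called shallow fusion). A minimal sketch, assuming that formulation; the weight `lam` and the helper names are illustrative, not taken from the paper:

```python
import math

def fused_scores(seq2seq_logp, lm_logp, lam=0.5):
    """Fuse per-candidate log-probabilities:
    score(c) = log p_seq2seq(c | y, x) + lam * log p_lm(c | y).
    Both arguments map candidate characters to log-probabilities."""
    return {c: seq2seq_logp[c] + lam * lm_logp[c] for c in seq2seq_logp}

# Toy example with two candidate characters: the acoustic model prefers
# "a", but the language model strongly prefers "b".
s2s = {"a": math.log(0.7), "b": math.log(0.3)}
lm = {"a": math.log(0.1), "b": math.log(0.9)}

scores = fused_scores(s2s, lm, lam=1.0)
best = max(scores, key=scores.get)
# With lam=1.0 the LM term dominates and "b" wins (0.3 * 0.9 > 0.7 * 0.1).
```

With `lam=0`, decoding falls back to the seq2seq model alone; tuning `lam` trades off acoustic evidence against LM prior.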