End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
arXiv (Cornell University), 2014
Abstract
We replace the Hidden Markov Model (HMM), which is traditionally used in continuous speech recognition, with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context created from a subset of input symbols selected by the attention mechanism. We report initial results demonstrating that this new approach achieves phoneme error rates comparable to those of state-of-the-art HMM-based decoders on the TIMIT dataset.
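To make the attention step in the abstract concrete, here is a minimal sketch of how a decoder might form its context from encoder outputs. It is not the authors' implementation: the function names `softmax` and `attention_context` are hypothetical, and plain dot-product scoring is assumed purely for illustration, since the abstract does not specify the scoring function.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    decoder_state:  list of floats, the current decoder hidden state
    encoder_states: list of lists, one encoder output vector per input frame

    ASSUMPTION: dot-product scoring stands in for whatever scoring
    function the paper uses; the abstract does not describe it.
    """
    # One score per input frame: similarity to the decoder state.
    scores = [
        sum(d * h for d, h in zip(decoder_state, h_t))
        for h_t in encoder_states
    ]
    weights = softmax(scores)

    # Context = weighted sum of encoder outputs; frames with higher
    # weights contribute more, giving a soft selection of inputs.
    dim = len(encoder_states[0])
    context = [
        sum(w * h_t[i] for w, h_t in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three input frames, two-dimensional states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attention_context(decoder_state, encoder_states)
```

In this toy example the first and third frames receive equal, larger weights because they align with the decoder state, while the second frame is mostly ignored. A decoder would then use `context` together with its own state to emit the next phoneme.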