LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech Recognition
Abstract
Popularized by the long short-term memory (LSTM), multiplicative gates have become a standard means to design artificial neural networks with intentionally organized information flow. Notable examples of such architectures include gated recurrent units (GRU) and highway networks. In this work, we first evaluate each of the classical gated architectures for language modeling in large-vocabulary speech recognition: the highway network, the lateral network, the LSTM, and the GRU. Furthermore, the motivation underlying the highway network also applies to the LSTM and the GRU. An extension specific to the LSTM has recently been proposed that adds a highway connection between the memory cells of adjacent LSTM layers. In contrast, we investigate an approach that can be used with both the LSTM and the GRU: a highway network in which the LSTM or GRU serves as the transformation function. We find that highway connections enable both standalone feedforward and recurrent neural language models to benefit more from deep structure, and that they provide a slight improvement in recognition accuracy after interpolation with count models. To complete the overview, we include our initial investigations on the use of the attention mechanism for learning word triggers.
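The core mechanism the abstract builds on is the highway layer, which mixes a learned transformation H(x) with the untransformed input via a sigmoid transform gate: y = t ⊙ H(x) + (1 − t) ⊙ x. A minimal NumPy sketch follows; all names (`highway_layer`, the weight matrices) are illustrative, and the tanh feedforward stand-in for H is where the paper's variant would instead plug in an LSTM or GRU layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, b_h, W_t, b_t, transform=np.tanh):
    """One highway layer: y = t * H(x) + (1 - t) * x.

    t is the transform gate; H is the transformation function
    (a tanh feedforward layer here; the paper's extension would
    use an LSTM or GRU in its place).
    """
    h = transform(x @ W_h + b_h)   # candidate transformation H(x)
    t = sigmoid(x @ W_t + b_t)     # transform gate in (0, 1)
    return t * h + (1.0 - t) * x   # gated mix of transform and carry paths

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal((2, d))
W_h, b_h = rng.standard_normal((d, d)), np.zeros(d)
W_t, b_t = rng.standard_normal((d, d)), np.full(d, -2.0)  # gate bias toward carry
y = highway_layer(x, W_h, b_h, W_t, b_t)
```

Biasing `b_t` negative at initialization is a common choice for highway networks, since it starts the layer close to an identity map, which is what lets very deep stacks train.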