Learning long-term dependencies in NARX recurrent neural networks
Top 10% of 1996 papers by citations
Abstract
It has previously been shown that gradient-descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies, i.e. problems for which the desired output depends on inputs presented at times far in the past. We show that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous inputs (NARX) recurrent neural networks, which have powerful representational capabilities. We have previously reported that, on problems including grammatical inference and nonlinear system identification, gradient-descent learning can be more effective in NARX networks than in recurrent neural network architectures that have "hidden states". Typically, the NARX network converges much faster and generalizes better than the other networks. The results in this paper are consistent with this phenomenon. We present experimental results showing that NARX networks can often retain information for two to three times as long as conventional recurrent neural networks. We show that although NARX networks do not circumvent the problem of long-term dependencies, they can greatly improve performance on long-term dependency problems. We also describe in detail some of the assumptions regarding what it means to latch information robustly and suggest possible ways to loosen these assumptions.
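To make the architecture concrete, here is a minimal pure-Python sketch of a NARX forward pass, assuming the common formulation y(t) = f(u(t - D_u), ..., u(t), y(t - D_y), ..., y(t - 1)) with one tanh hidden layer and a scalar output. The class `NARXNetwork`, its parameters, the zero-initialised delay lines, and the helper functions are hypothetical illustrations, not the authors' code. The second part is a toy model of why output delays can help, assuming every feedback hop scales a back-propagated gradient by the same factor `lam`: because y(t) receives D_y past outputs directly, the shortest path back to y(t - lag) needs only ceil(lag / D_y) hops instead of lag.

```python
import math
import random
from collections import deque


class NARXNetwork:
    """Hypothetical minimal NARX network (illustrative sketch, not the paper's code).

    Computes y(t) = f(u(t - du), ..., u(t), y(t - dy), ..., y(t - 1)),
    where f is one tanh hidden layer followed by a tanh output unit.
    """

    def __init__(self, input_delays, output_delays, hidden=4, seed=0):
        self.du = input_delays
        self.dy = output_delays
        rng = random.Random(seed)
        n_in = (input_delays + 1) + output_delays  # tapped inputs plus fed-back outputs
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                         for _ in range(hidden)]
        self.b_hidden = [0.0] * hidden
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b_out = 0.0

    def run(self, inputs):
        # Assumption: both tapped delay lines start filled with zeros.
        # After each update, u_taps[k] holds u(t - k) and y_taps[k] holds y(t - 1 - k).
        u_taps = deque([0.0] * (self.du + 1), maxlen=self.du + 1)
        y_taps = deque([0.0] * self.dy, maxlen=self.dy)
        outputs = []
        for u in inputs:
            u_taps.appendleft(u)  # shift the input delay line; the oldest tap drops off
            x = list(u_taps) + list(y_taps)
            h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                 for row, b in zip(self.w_hidden, self.b_hidden)]
            y = math.tanh(sum(w * hi for w, hi in zip(self.w_out, h)) + self.b_out)
            y_taps.appendleft(y)  # feed the new output back into the output delay line
            outputs.append(y)
        return outputs


def min_feedback_hops(lag, output_delays):
    """Fewest output-feedback hops linking y(t) to y(t - lag).

    With a single feedback delay (output_delays = 1), each hop moves back one
    time step; with output_delays taps, a hop can jump back up to that many steps.
    """
    return math.ceil(lag / output_delays)


def shortest_path_gain(lag, output_delays, lam):
    # Toy assumption: each hop scales the gradient magnitude by the same lam < 1.
    return lam ** min_feedback_hops(lag, output_delays)


if __name__ == "__main__":
    net = NARXNetwork(input_delays=2, output_delays=3)
    print(net.run([1.0, 0.0, 0.0, 0.0, 0.0]))
    for dy in (1, 3):
        print(dy, min_feedback_hops(30, dy), shortest_path_gain(30, dy, 0.9))
```

With `lam = 0.9` and a lag of 30 steps, the single-delay path retains roughly 0.042 of the gradient and the three-delay path roughly 0.35. Both still shrink exponentially with the lag, which fits the abstract's point that NARX networks lessen, rather than eliminate, the long-term dependencies problem.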
Related Papers
- Comparison of NARX Neural Network and Classical Modelling Approaches (2014), 16 citations
- A Novel Modeling Method for Aircraft Engine Using Nonlinear Autoregressive Exogenous (NARX) Models Based on Wavelet Neural Networks (2017), 12 citations
- Forecasting System Marginal Price Using Multilayer Perceptron and Nonlinear Autoregressive exogenous model (2020), 3 citations
- Comparison Between Non-Linear Autoregressive and Non-Linear Autoregressive with Exogeneous Inputs Models for Predicting Cardiac Ischemic Beats (2020)
- Short-term forecast the dynamics of changes in the surface concentration of methane using a non-linear autoregressive neural network with external input and vector autoregression model (2022)