Improving deep neural networks for LVCSR using rectified linear units and dropout
Top 1% of 2013 papers
Abstract
Recently, pre-trained deep neural networks (DNNs) have outperformed traditional acoustic models based on Gaussian mixture models (GMMs) on a variety of large vocabulary speech recognition benchmarks. Deep neural nets have also achieved excellent results on various computer vision tasks using a random "dropout" procedure that drastically improves generalization error by randomly omitting a fraction of the hidden units in all layers. Since dropout helps avoid over-fitting, it has also been successful on a small-scale phone recognition task using larger neural nets. However, training deep neural net acoustic models for large vocabulary speech recognition takes a very long time, and dropout is likely to only increase training time. Neural networks with rectified linear unit (ReLU) non-linearities have been highly successful for computer vision tasks and proved faster to train than standard sigmoid units, sometimes also improving discriminative performance. In this work, we show on a 50-hour English Broadcast News task that modified deep neural networks using ReLUs trained with dropout during frame-level training provide a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system. We were able to obtain our results with minimal human hyper-parameter tuning using publicly available Bayesian optimization code.
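The two ingredients the abstract combines can be sketched in a few lines. This is a minimal illustration of a ReLU activation and dropout applied to a hidden layer, not the paper's implementation; the function names and the "inverted" dropout scaling (dividing survivors by the keep probability at training time so no rescaling is needed at test time) are illustrative assumptions.

```python
import random

def relu(x):
    # Rectified linear unit: max(0, v) element-wise.
    return [max(0.0, v) for v in x]

def dropout(x, p, training=True, rng=random):
    # During training, zero each unit independently with probability p
    # and scale survivors by 1/(1-p) ("inverted dropout") so the expected
    # activation matches test time. At test time, pass activations through.
    # NOTE: this scaling convention is an assumption for illustration.
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]

# Tiny hidden-layer sketch: pre-activations -> ReLU -> dropout.
pre_activations = [-1.2, 0.5, 3.0, -0.1]
hidden = dropout(relu(pre_activations), p=0.5)
```

Note that dropout is applied only during frame-level training; at decoding time (`training=False`) the full network is used.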