Convolutional neural networks for acoustic modeling of raw time signal in LVCSR
Citations over time: top 1% of 2015 papers
Abstract
In this paper we continue to investigate how deep neural network (DNN) based acoustic models for automatic speech recognition can be trained without hand-crafted feature extraction. In previous work, we showed that a simple fully connected feedforward DNN performs surprisingly well when trained directly on the raw time signal. Analysis of its weights revealed that the DNN had learned a kind of short-time time-frequency decomposition of the speech signal. In conventional feature extraction pipelines, this decomposition is designed by hand as a filter bank that is shared between neighboring analysis windows. Following this idea, we show that the performance gap between DNNs trained on spliced hand-crafted features and DNNs trained on the raw time signal can be substantially reduced by introducing 1D convolutional layers, which force the DNN to learn a short-time filter bank shared over a longer time span. This also allows us to interpret the weights of the second convolutional layer in the same way as the 2D patches that typical convolutional neural networks learn on critical band energies. The evaluation is performed on an English LVCSR task. When trained on the raw time signal, the convolutional layers reduce the word error rate (WER) on the test set from 25.5% to 23.4%, compared with an MFCC-based result of 22.1% obtained with fully connected layers.

Index Terms: acoustic modeling, raw time signal, convolutional neural networks
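The abstract does not give layer sizes, so the following is only a minimal NumPy sketch of the two-stage idea it describes, with random rather than trained weights. A first 1D convolutional layer applies one bank of FIR filters to every analysis window of the raw waveform. Its pooled, log-compressed output is then treated as a time-frequency map on which a second layer slides a 2D patch. The sampling rate (16 kHz), filter count (40), filter length (256 taps), stride (10 samples), ReLU, max pooling over 5 steps, `log1p` compression, and the 5×3 second-layer kernel are all hypothetical choices for illustration, not the paper's configuration. The function names are likewise made up for this sketch.

```python
import numpy as np


def conv1d_filterbank(signal, filters, hop):
    """Layer 1 (sketch): a bank of FIR filters slid along the raw waveform.

    signal:  (T,) raw samples
    filters: (F, K) filter taps, shared by every time position
    hop:     stride in samples (hypothetical value in the example below)
    Returns an (F, N) array of rectified filter responses.
    """
    num_taps = filters.shape[1]
    n_steps = (len(signal) - num_taps) // hop + 1
    # Each row is one analysis window; the same filters are applied to all of them.
    windows = np.stack([signal[i * hop : i * hop + num_taps] for i in range(n_steps)])
    return np.maximum(filters @ windows.T, 0.0)


def max_pool_time(x, width):
    """Non-overlapping max pooling along the time axis (hypothetical choice)."""
    n = x.shape[1] // width
    return x[:, : n * width].reshape(x.shape[0], n, width).max(axis=2)


def conv2d_valid(x, kernel):
    """Layer 2 (sketch): one 2D patch over (filter index, time), 'valid' mode.

    Treating the pooled layer-1 output as a time-frequency map lets this
    kernel be read like a patch learned on critical band energies.
    """
    kf, kt = kernel.shape
    out = np.empty((x.shape[0] - kf + 1, x.shape[1] - kt + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i : i + kf, j : j + kt] * kernel)
    return out


rng = np.random.default_rng(0)
signal = rng.standard_normal(1600)               # 0.1 s at an assumed 16 kHz
filters = 0.01 * rng.standard_normal((40, 256))  # 40 filters of 256 taps (assumed)
hop = 10                                         # assumed stride in samples

responses = conv1d_filterbank(signal, filters, hop)             # shape (40, 135)
pooled = np.log1p(max_pool_time(responses, 5))                  # shape (40, 27)
patch_map = conv2d_valid(pooled, rng.standard_normal((5, 3)))   # shape (36, 25)
```

Because the same `filters` matrix is applied to every window, shifting the input by one hop shifts the layer-1 output by exactly one column. This weight sharing is what distinguishes the convolutional layer from a fully connected layer over the whole spliced window, which would learn separate weights for each position.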