Low latency sound source separation using convolutional recurrent neural networks
Abstract
Deep neural networks (DNNs) have been successfully employed for monaural sound source separation, achieving state-of-the-art results. In this paper, we propose using a convolutional recurrent neural network (CRNN) architecture to tackle this problem. We focus on a scenario where low algorithmic delay (< 10 ms) is paramount and relatively little training data is available. We show that the proposed architecture can achieve slightly better performance than feedforward DNNs and long short-term memory (LSTM) networks. In addition to reporting separation performance metrics (i.e., source-to-distortion ratios), we also report extended short-term objective intelligibility (ESTOI) scores, which better predict intelligibility in the presence of non-stationary interferers.
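To make the delay constraint concrete, the following is a minimal sketch of how a delay budget translates into a maximum frame length. It assumes frame-based processing in which the algorithmic delay equals the time needed to buffer one full frame. The function names and the 16 kHz sample rate are illustrative assumptions and are not taken from the abstract; only the 10 ms bound is.

```python
import math


def algorithmic_delay_ms(frame_length: int, sample_rate: int) -> float:
    """Delay (in ms) from buffering one full frame of `frame_length` samples.

    Assumption: the system must wait for a whole frame before producing
    output, so the algorithmic delay equals the frame duration.
    """
    return 1000.0 * frame_length / sample_rate


def max_frame_length(delay_budget_ms: float, sample_rate: int) -> int:
    """Largest frame length (in samples) whose delay is strictly below the budget."""
    samples_at_budget = delay_budget_ms * sample_rate / 1000.0
    # Strictly below the budget: step down by one sample when the budget
    # falls exactly on a whole number of samples.
    return math.ceil(samples_at_budget) - 1


if __name__ == "__main__":
    fs = 16000  # hypothetical sample rate in Hz, not stated in the abstract
    n = max_frame_length(10.0, fs)
    print(n, algorithmic_delay_ms(n, fs))
```

Under these assumptions, a 128-sample frame at 16 kHz corresponds to an 8 ms delay, which fits the stated bound.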