Deep neural networks for small footprint text-dependent speaker verification
Citations over time: top 1% of 2014 papers
Abstract
In this paper we investigate the use of deep neural networks (DNNs) for a small footprint text-dependent speaker verification task. At the development stage, a DNN is trained to classify speakers at the frame level. During speaker enrollment, the trained DNN is used to extract speaker-specific features from the last hidden layer. The average of these speaker features, called the d-vector, is taken as the speaker model. At the evaluation stage, a d-vector is extracted for each utterance and compared to the enrolled speaker model to make a verification decision. Experimental results show that the DNN-based speaker verification system achieves good performance compared to a popular i-vector system on a small footprint text-dependent speaker verification task. In addition, the DNN-based system is more robust to additive noise and outperforms the i-vector system at low false rejection operating points. Finally, the combined DNN and i-vector system outperforms the i-vector system alone by 14% and 25% relative in equal error rate (EER) for clean and noisy conditions, respectively.
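The enrollment and evaluation steps can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the trained DNN is replaced by a hypothetical `make_hidden_layer` (a fixed random projection followed by a ReLU) that stands in for the last hidden layer; each utterance's d-vector is assumed to be the average of its frame-level activations; the comparison with the speaker model is assumed to be cosine similarity; and the acceptance threshold of 0.8 is an arbitrary placeholder. All function names are illustrative.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_hidden_layer(input_dim: int, hidden_dim: int, seed: int = 0) -> Layer:
    """Hypothetical stand-in for the trained DNN's last hidden layer.

    A real system would run the speaker-classification DNN on each frame and
    read out its last hidden layer; here a fixed random projection followed
    by a ReLU plays that role so the sketch runs on its own.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)]
               for _ in range(hidden_dim)]

    def forward(frame: Vector) -> Vector:
        return [max(0.0, sum(w * x for w, x in zip(row, frame)))
                for row in weights]

    return forward


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise average of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def d_vector(frames: List[Vector], hidden_layer: Layer) -> Vector:
    """Utterance-level d-vector: average of the frame-level activations."""
    return mean([hidden_layer(frame) for frame in frames])


def enroll(utterances: List[List[Vector]], hidden_layer: Layer) -> Vector:
    """Speaker model: average of the enrollment utterances' d-vectors."""
    return mean([d_vector(u, hidden_layer) for u in utterances])


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def verify(test_frames: List[Vector], speaker_model: Vector,
           hidden_layer: Layer, threshold: float = 0.8) -> bool:
    """Accept the claimed identity if the cosine score reaches the threshold."""
    score = cosine(d_vector(test_frames, hidden_layer), speaker_model)
    return score >= threshold


if __name__ == "__main__":
    layer = make_hidden_layer(input_dim=4, hidden_dim=16)
    rng = random.Random(1)

    def toy_utterance(num_frames: int = 20) -> List[Vector]:
        # Random 4-dimensional toy frames; real input would be acoustic
        # features such as filterbank energies.
        return [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(num_frames)]

    model = enroll([toy_utterance() for _ in range(3)], layer)
    print(verify(toy_utterance(), model, layer))
```

Averaging over frames makes the d-vector length equal to the hidden-layer width regardless of utterance duration, which is what lets a single fixed-size vector per utterance be compared directly against the speaker model.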
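The equal error rate (EER) is the operating point at which the false rejection rate (FRR) equals the false acceptance rate (FAR), and a relative improvement is (EER_baseline - EER_new) / EER_baseline. The sketch below shows one simple way to compute both; it assumes higher scores mean "same speaker", and the threshold sweep over observed scores is a basic approximation, not necessarily the scoring procedure used in the paper. The scores and EER values are toy numbers, not results from the paper.

```python
from typing import List, Tuple


def error_rates(target: List[float], impostor: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FRR, FAR), accepting a trial when score >= threshold."""
    frr = sum(s < threshold for s in target) / len(target)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far


def equal_error_rate(target: List[float], impostor: List[float]) -> float:
    """Approximate EER by trying every observed score as a threshold and
    averaging FRR and FAR where the two are closest."""
    best_gap, best_eer = float("inf"), 1.0
    for threshold in sorted(set(target) | set(impostor)):
        frr, far = error_rates(target, impostor, threshold)
        gap = abs(frr - far)
        if gap < best_gap:
            best_gap, best_eer = gap, (frr + far) / 2
    return best_eer


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction, e.g. 0.25 means 25% relative."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Toy scores, not data from the paper.
    target_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.1, 0.2, 0.3, 0.75]
    print(equal_error_rate(target_scores, impostor_scores))  # 0.25
    print(relative_improvement(4.0, 3.0))  # 0.25
```

With few trials FRR and FAR rarely coincide exactly, so more careful implementations interpolate between the two closest points. Calling `error_rates` at a threshold where FRR is small is how two systems can be compared at low false rejection operating points.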