Unit Selection Synthesis Based Data Augmentation for Fixed Phrase Speaker Verification
Citations Over TimeTop 22% of 2021 papers
Abstract
Data augmentation is commonly used to help build a robust speaker verification system, especially in limited-resource case. However, conventional data augmentation methods usually focus on the diversity of acoustic environment, leaving the lexicon variation neglected. For text dependent speaker verification tasks, it’s well-known that preparing training data with the target transcript is the most effectual approach to build a well-performing system, however collecting such data is time-consuming and expensive. In this work, we propose a unit selection synthesis based data augmentation method to leverage the abundant text-independent data resources. In this approach text-independent speeches of each speaker are firstly broke up to speech segments each contains one phone unit. Then segments that contain phonetics in the target transcript are selected to produce a speech with the target transcript by concatenating them in turn. Experiments are carried out on the AISHELL Speaker Verification Challenge 2019 database, the results and analysis shows that our proposed method can boost the system performance significantly.
Related Papers
- → Leveraging speaker diarization for meeting recognition from distant microphones(2010)19 cited
- → Imposture classification for text-dependent speaker verification(2014)10 cited
- → Text-independent speaker verification using speaker clustering and support vector machines(2003)6 cited
- → Speaker Recognition and Diarization(2010)3 cited
- → Using nonstandard SVM for combination of speaker verification and verbal information verification in speaker authentication system(2002)