Learning the Front-End Speech Feature with Raw Waveform for End-to-End Speaker Recognition
Abstract
State-of-the-art deep neural network-based speaker recognition systems tend to follow a "divide and conquer" paradigm: speech features are extracted first, and the speaker classifier is trained afterwards. These methods usually rely on fixed, handcrafted features such as Mel-frequency cepstral coefficients (MFCCs) to preprocess the waveform before the classification pipeline. In this paper, inspired by promising work that models systems directly from the raw speech signal for applications such as audio speech recognition, anti-spoofing, and emotion recognition, we present an end-to-end speaker recognition system that combines a front-end raw waveform feature extractor, a back-end speaker embedding classifier, and an angle-based loss optimizer. Specifically, the proposed front-end raw waveform feature extractor serves as a trainable alternative to MFCCs and requires no modification of the acoustic model. We detail the advantages of the raw waveform feature extractor, which uses a time convolution layer to reduce temporal variations and adaptively learns a front-end speech feature representation through supervised training jointly with the rest of the classification model. Our experiments on the CSTR VCTK Corpus demonstrate that the proposed end-to-end speaker recognition system can achieve state-of-the-art performance compared to baseline models.
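To make the idea of a trainable front-end more concrete, the following is a minimal sketch of a time convolution applied directly to a raw waveform. It is not the architecture from the paper, which the abstract does not specify: the rectification, max-pooling over time, log compression, the filter count and width, and the function names `time_convolution` and `front_end` are all assumptions chosen for illustration. The filters are random here; in an end-to-end system they would be learned jointly with the classifier.

```python
import math
import random


def time_convolution(waveform, filters, stride=1):
    """Slide each filter over the waveform (valid mode): output[f][t]."""
    width = len(filters[0])
    n_frames = (len(waveform) - width) // stride + 1
    return [
        [sum(w * waveform[t * stride + k] for k, w in enumerate(filt))
         for t in range(n_frames)]
        for filt in filters
    ]


def front_end(waveform, filters, pool=4):
    """Convolve, rectify, max-pool over time, then log-compress.

    The pooling step is one assumed way to make the features less
    sensitive to small shifts in time.
    """
    features = []
    for channel in time_convolution(waveform, filters):
        rectified = [max(0.0, v) for v in channel]
        pooled = [max(rectified[i:i + pool])
                  for i in range(0, len(rectified) - pool + 1, pool)]
        features.append([math.log(1.0 + v) for v in pooled])
    return features


random.seed(0)
# Toy input: 64 samples of a sine wave, standing in for raw speech.
waveform = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
# Three hypothetical filters of width 9 with random (untrained) weights.
filters = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(3)]

features = front_end(waveform, filters)
print(len(features), len(features[0]))  # channels, pooled time steps
```

Each output row plays the role that one MFCC coefficient track would play in a handcrafted pipeline, except that the filter weights are free parameters that gradient-based training can adjust.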