SpEx+: A Complete Time Domain Speaker Extraction Network
Abstract
Speaker extraction aims to extract the target speech signal from a multi-talker environment, given a reference speech sample of the target speaker. We recently proposed a time-domain solution, SpEx, that avoids the phase estimation required by frequency-domain approaches. Unfortunately, SpEx is not a fully time-domain solution: it performs time-domain speech encoding for speaker extraction while taking a frequency-domain speaker embedding as the reference. The analysis window sizes for the time-domain and frequency-domain inputs also differ. Such a mismatch has an adverse effect on system performance. To eliminate it, we propose a complete time-domain speaker extraction solution called SpEx+. Specifically, we tie the weights of two identical speech encoder networks, one in the encoder-extractor-decoder pipeline and the other as part of the speaker encoder. Experiments show that SpEx+ achieves 0.8 dB and 2.1 dB SDR improvements over the state-of-the-art SpEx baseline under the different-gender and same-gender conditions, respectively, on the WSJ0-2mix-extr database.
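The weight-tying idea can be illustrated with a minimal sketch, assuming a single-scale 1-D convolutional encoder (window `window`, hop `window // 2`, ReLU) written in plain Python. The class `SharedSpeechEncoder`, the helper `mean_pool`, and all sizes are hypothetical; the paper's actual encoder, filter counts, and speaker-encoder layers are not reproduced here. The point shown is only that one encoder instance, with one set of weights and one analysis window, processes both the mixture and the reference speech, so the two branches share the same time-domain representation.

```python
import math
import random


class SharedSpeechEncoder:
    """Hypothetical time-domain encoder: frames the waveform with a fixed
    window and hop, projects each frame onto learned filters, applies ReLU."""

    def __init__(self, num_filters=4, window=8, seed=0):
        rng = random.Random(seed)
        self.window = window
        self.hop = window // 2  # 50% overlap between frames
        # One weight matrix shared by both branches (the "tied" weights).
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(window)]
            for _ in range(num_filters)
        ]

    def __call__(self, waveform):
        frames = []
        for start in range(0, len(waveform) - self.window + 1, self.hop):
            segment = waveform[start:start + self.window]
            # Inner product with each filter, followed by ReLU.
            frames.append(
                [max(0.0, sum(w * x for w, x in zip(f, segment))) for f in self.weights]
            )
        return frames


def mean_pool(frames):
    """Hypothetical stand-in for the speaker encoder's later layers:
    average the encoded frames over time into a fixed-size embedding."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


encoder = SharedSpeechEncoder()
mixture = [math.sin(0.1 * t) for t in range(64)]     # toy multi-talker input
reference = [math.cos(0.05 * t) for t in range(40)]  # toy reference speech

mix_repr = encoder(mixture)                      # fed to the extractor
speaker_embedding = mean_pool(encoder(reference))  # same weights, same window
```

Because both branches call the same `encoder` object, the mixture frames and the reference embedding live in the same feature space and use the same window size, which is the mismatch the abstract says SpEx+ removes.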