SphereDiar: An Effective Speaker Diarization System for Meeting Data
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019, Vol. 12, pp. 373–380
Abstract
In this paper, we present SphereDiar, a speaker diarization system composed of three novel subsystems: the Sphere-Speaker (SS) neural network, designed for speaker embedding extraction; a segmentation method called Homogeneity Based Segmentation (HBS); and a clustering algorithm called Top Two Silhouettes (Top2S). The system is evaluated on a set of over 200 manually transcribed multiparty meetings. The evaluation reveals that the system can be further simplified by omitting HBS. Furthermore, we illustrate that SphereDiar achieves state-of-the-art results on two different meeting data sets.