Speaker Segmentation and Adaptation for Speech Recognition on Multiple-Speaker Audio Conference Data
Vol. 4299, 2006, pp. 192–195
Abstract
In this paper, we address the problem of improving automatic speech recognition (ASR) performance on audio conference data through speaker segmentation and speaker adaptation. We propose a new speaker segmentation method that automatically determines speaker turns and speaker labels. For speaker adaptation, we use Vocal Tract Length Normalization (VTLN) and Maximum Likelihood Linear Regression (MLLR). On a corpus of multi-speaker teleconferences, the word error rate of the ASR system is reduced by more than 4% absolute.
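VTLN compensates for inter-speaker differences in vocal tract length by warping the frequency axis of the spectrum with a per-speaker warp factor. As an illustration only (the paper does not specify its warping function), a commonly used piecewise-linear warp can be sketched as follows; the 8 kHz bandwidth and 0.875 breakpoint fraction are assumptions, not values from the paper:

```python
def vtln_warp(f, alpha, f_max=8000.0, break_frac=0.875):
    """Piecewise-linear VTLN frequency warp (illustrative sketch).

    f          -- input frequency in Hz (0 <= f <= f_max)
    alpha      -- per-speaker warp factor (~0.8 to 1.2; 1.0 = no warping)
    f_max      -- Nyquist / bandwidth limit (assumed 8 kHz here)
    break_frac -- fraction of the band warped linearly before the
                  compensating segment that keeps f_max fixed
    """
    # Breakpoint: below it the axis is scaled by alpha; shrink the
    # breakpoint for alpha > 1 so the warped value stays below f_max.
    f0 = break_frac * f_max * min(1.0, 1.0 / alpha)
    if f <= f0:
        return alpha * f
    # Above the breakpoint, interpolate linearly so that f_max maps
    # to itself and the warp stays continuous and monotonic.
    slope = (f_max - alpha * f0) / (f_max - f0)
    return alpha * f0 + slope * (f - f0)


# Example: a warp factor of 1.1 stretches low frequencies by 10%
# while leaving the band edge unchanged.
print(vtln_warp(1000.0, 1.1))   # low band: scaled by alpha
print(vtln_warp(8000.0, 1.1))   # band edge: mapped to itself
```

In a typical VTLN setup, alpha is chosen per speaker by maximizing the acoustic-model likelihood over a small grid of candidate factors; the warp is then applied to the filterbank during feature extraction.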