Automatic selection of transcribed training material
2005, pp. 417–420
Abstract
Conventional wisdom says that incorporating more training data is the surest way to reduce the error rate of a speech recognition system. This, in turn, makes speech recognition systems expensive to train, because annotating training data is costly. We propose an iterative training algorithm that seeks to reduce the error rate of a speech recognizer without incurring additional transcription cost, by selecting a subset of the already available transcribed training data. We apply the proposed algorithm to an alpha-digit recognition problem and reduce the error rate from 10.3% to 9.4% on a particular test set.
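The abstract does not state the selection criterion, so the following is only a sketch of one generic way to select a subset of already transcribed data iteratively, not the paper's algorithm. It assumes a greedy scheme that repeatedly drops the single training item whose removal most lowers the error on a held-out set. The names `select_training_subset`, `train_centroids`, and `dev_error`, the one-dimensional nearest-centroid "recognizer", and all data values are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Item = Tuple[float, int]      # (feature value, class label); toy stand-in for an utterance
Model = Dict[int, float]      # class label -> centroid; toy stand-in for a recognizer


def select_training_subset(
    data: Sequence[Item],
    train: Callable[[List[Item]], Model],
    error: Callable[[Model], float],
    max_rounds: int = 10,
) -> Tuple[List[Item], float]:
    """Greedy backward selection (an assumed scheme, not the paper's).

    Each round retrains with one item left out, for every item in turn,
    and keeps the removal that gives the lowest held-out error. It stops
    when no single removal improves on the current error.
    """
    selected = list(data)
    best_error = error(train(selected))
    for _ in range(max_rounds):
        if len(selected) <= 1:
            break
        best_candidate = None
        for i in range(len(selected)):
            candidate = selected[:i] + selected[i + 1:]
            candidate_error = error(train(candidate))
            if candidate_error < best_error:
                best_error, best_candidate = candidate_error, candidate
        if best_candidate is None:
            break  # no removal helps; keep the current subset
        selected = best_candidate
    return selected, best_error


def train_centroids(items: List[Item]) -> Model:
    """Toy training step: one mean feature value per class present in the data."""
    sums: Dict[int, List[float]] = {}
    for value, label in items:
        sums.setdefault(label, []).append(value)
    return {label: sum(vals) / len(vals) for label, vals in sums.items()}


# Hypothetical held-out set used only to score candidate subsets.
DEV_SET: List[Item] = [(0.5, 0), (1.5, 0), (5.5, 1), (9.5, 1)]


def dev_error(model: Model) -> float:
    """Fraction of held-out items assigned to the wrong nearest centroid."""
    wrong = 0
    for value, label in DEV_SET:
        predicted = min(model, key=lambda c: abs(value - model[c]))
        wrong += predicted != label
    return wrong / len(DEV_SET)


# Hypothetical training set; (9.0, 0) plays the role of a badly transcribed item.
TRAIN_SET: List[Item] = [(0.0, 0), (1.0, 0), (9.0, 0), (8.0, 1), (10.0, 1)]

baseline_error = dev_error(train_centroids(TRAIN_SET))
subset, subset_error = select_training_subset(TRAIN_SET, train_centroids, dev_error)
print(f"baseline error {baseline_error:.2f}, error after selection {subset_error:.2f}")
```

In this toy setup the mislabeled item pulls the class-0 centroid toward class 1, and dropping it lets the remaining data separate the held-out points. A real recognizer would make retraining once per candidate far too expensive, so a practical variant would presumably score items in batches or with a cheaper proxy; that choice is outside what the abstract describes.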