A method for the extraction of phonetically-rich triphone sentences
Abstract
A method is proposed for compiling a corpus of phonetically-rich triphone sentences, i.e., sentences with a high variety of triphones, distributed in a uniform fashion. Such a corpus is of interest in a wide range of contexts, from automatic speech recognition to speech therapy. We evaluated this method by building phonetically-rich corpora for Brazilian Portuguese. The data come from Wikipedia dumps, which were converted into plain text, segmented and phonetically transcribed. The method consists of measuring the distance between the triphone distribution of the available sentences and an ideal uniform distribution, in which all triphones are equiprobable. A greedy algorithm was implemented to recognize and evaluate the distance among sentences. A heuristic metric is proposed for pre-selecting sentences for the algorithm, in order to speed up its execution. The results show that, by applying the proposed metric, one can build corpora with more uniform triphone distributions.
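The greedy selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which distance is used, so Euclidean distance to the uniform distribution is assumed here, and the names `triphones`, `distance_to_uniform` and `greedy_select` are hypothetical. The heuristic pre-selection metric is not specified in the abstract and is left out.

```python
from collections import Counter
import math


def triphones(phones):
    """Return every run of three consecutive phones in a transcribed sentence."""
    return [tuple(phones[i:i + 3]) for i in range(len(phones) - 2)]


def distance_to_uniform(counts, inventory):
    """Euclidean distance (an assumed metric) between the observed triphone
    distribution and a uniform distribution over `inventory`."""
    total = sum(counts.values())
    if total == 0:
        return math.inf  # an empty corpus is treated as maximally distant
    target = 1.0 / len(inventory)
    return math.sqrt(
        sum((counts.get(t, 0) / total - target) ** 2 for t in inventory)
    )


def greedy_select(sentences, k):
    """Pick up to k sentences, one at a time, each time adding the sentence
    that brings the corpus-level triphone distribution closest to uniform.

    `sentences` is a list of phone sequences; the result is a list of indices.
    """
    per_sentence = [Counter(triphones(s)) for s in sentences]
    # Assumption: the triphone inventory is whatever occurs in the candidates.
    inventory = set().union(*per_sentence) if per_sentence else set()
    corpus = Counter()
    chosen = []
    remaining = set(range(len(sentences)))
    for _ in range(min(k, len(sentences))):
        # Ties are broken in favour of the lowest sentence index.
        best = min(
            sorted(remaining),
            key=lambda i: distance_to_uniform(corpus + per_sentence[i], inventory),
        )
        chosen.append(best)
        corpus += per_sentence[best]
        remaining.remove(best)
    return chosen
```

Each step rescans every remaining candidate, so the cost grows with the number of candidates times the number of sentences selected. That cost is the reason a pre-selection step, such as the heuristic metric the abstract mentions, would shorten the run.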