Automatic language identification using support vector machines and phonetic N-gram
Abstract
In this paper, we describe two approaches to language identification (LID) using support vector machines (SVMs) and phonetic n-grams. The first uses the language model scores of phone sequences as input for SVM training. The second uses the n-gram probabilities of those phones to train the SVM models. For the second approach, we propose a new, effective normalization method. In experiments with 30 s test utterances in 5 languages, the new normalization method yields a relative reduction of 15.8% in equal error rate (EER) compared with the traditional one. With it, the system using the second approach reaches an EER of 2.4%, a relative reduction of about 35.5% compared with the first approach. Details of the implementation and experimental results are presented in this paper.
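As a rough illustration of the second approach, the sketch below turns one decoded phone sequence into a fixed-order vector of n-gram probabilities, the kind of feature vector an SVM could be trained on. The abstract does not specify the feature extraction or either normalization method, so everything here is an assumption: the function names, the phone labels, and the use of plain relative frequencies within a single utterance are hypothetical, and no normalization is applied.

```python
from collections import Counter


def ngram_probabilities(phones, n=2):
    """Map each phone n-gram to its relative frequency within one utterance."""
    grams = [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        # Utterance shorter than n phones: no n-grams to count.
        return {}
    return {gram: count / total for gram, count in counts.items()}


def to_vector(probs, vocabulary):
    """Lay the probabilities out in a fixed n-gram order (0.0 if unseen)."""
    return [probs.get(gram, 0.0) for gram in vocabulary]


# Hypothetical phone string, standing in for the output of a phone recognizer.
phones = ["a", "b", "a", "b", "c"]
probs = ngram_probabilities(phones, n=2)

# In practice the vocabulary would be fixed over the whole training set;
# here it is taken from this single utterance to keep the sketch short.
vocabulary = sorted(probs)
vector = to_vector(probs, vocabulary)
```

Each utterance would yield one such vector, and the vectors for all training utterances, labeled by language, would form the SVM training set. A normalization step such as the one the paper proposes would be applied to these vectors before training.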