0 citations0 references

A Multi-Genre Urdu Broadcast Speech Recognition System

2021pp. 25–30

Citations Over Time

Erbaz Khan, Sahar Rauf, Farah Adeeba, Sarmad Hussain

Abstract

This paper reports the development of a multi-genre Urdu Broadcast (BC) corpus and a Large Vocabulary Continuous Speech Recognition (LVCSR) system. BC speech corpus of 98 hours from 453 speakers is collected and annotated. For acoustic modeling, Time-delay Neural Network (TDNN) is developed with prior Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) training and alignments. For the language model, 3-gram, 4-gram and Recurrent Neural Network (RNN) based models are developed on a text corpus of 188 million words. The developed models are tested on 4.3 hours of unseen BC multi-genre speech dataset and the best Word Error Rate (WER) 18.59% is achieved using RNN based Language Model (LM). Moreover, a detailed word error analysis is carried out to compare the errors made by humans and the Automatic Speech Recognition (ASR) System. The results showed a similar behavior of word misrecognitions by both humans and ASR.

Citations Over Time

Abstract

Related Papers