Low‐latency transformer model for streaming automatic speech recognition
Abstract
Transformer models have made great progress in automatic speech recognition. However, it is challenging for streaming transformer models to trade off output latency against recognition accuracy. In this letter, the authors aim to propose a low‐latency transformer model with satisfactory recognition accuracy. First, a streaming transformer is designed and its streaming operation is explained. Second, the authors propose using CTC during training to minimise the latency of transformer models. Finally, the authors propose utilising CTC as a backup during decoding to ensure that the low‐latency characteristic is maintained. The streaming transformer is compared fairly against existing streaming models, particularly the transducer model, a popular low‐latency approach. The experiments show that, at comparable output latency, the transformer model outperforms the transducer model by average relative character (or word) error rate reductions of 22.18%, 26.71% and 19.36% on HKUST, Switchboard and CallHome, respectively.
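As a hedged illustration of the joint CTC training idea mentioned above: a common way to use CTC during training is to interpolate the CTC loss with the attention (transformer) loss. The weighting scheme and the weight value below are assumptions for illustration only; the letter does not specify them in the abstract.

```python
def joint_loss(ctc_loss: float, att_loss: float, lam: float = 0.3) -> float:
    """Interpolate a CTC loss with an attention-based loss.

    lam is a hypothetical interpolation weight (0 = attention only,
    1 = CTC only); the abstract does not give its actual value.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * att_loss

# Example: per-batch CTC loss of 2.0 and attention loss of 4.0
combined = joint_loss(2.0, 4.0, lam=0.3)
print(combined)
```

In practice the two losses would come from a shared encoder with a CTC head and an attention decoder head; the scalar sketch above only shows how the interpolation itself behaves.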