Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models
Abstract
Recently, end-to-end automatic speech recognition models based on connectionist temporal classification (CTC) have achieved impressive results, especially when fine-tuned from wav2vec2.0 models. Due to the conditional independence assumption, CTC-based models are generally weaker than attention-based encoder-decoder models and require the assistance of external language models (LMs). To address this issue, we propose two knowledge transferring methods that leverage pre-trained LMs, such as BERT and GPT2, to improve CTC-based models. The first method is based on representation learning, in which the CTC-based models use the representations produced by BERT as an auxiliary learning target. The second method is based on joint classification learning, which combines GPT2 for text modeling with a hybrid CTC/attention architecture. Experiments on the AISHELL-1 corpus yield a character error rate (CER) of 4.2% on the test set. Compared to vanilla CTC-based models fine-tuned from wav2vec2.0, our knowledge transferring method reduces CER by 16.1% relative without external LMs.
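The first transfer method, representation learning against BERT, can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the dimensions, the mean-squared-error objective, the linear projection, and the weight `lam` are all assumptions. The idea shown is that frame-level hidden states from the CTC acoustic encoder are projected into BERT's embedding space and regressed toward BERT-produced target representations, and this auxiliary loss is added to the usual CTC objective.

```python
import numpy as np

# Toy dimensions (assumptions for illustration only).
T = 50            # number of acoustic frames
D_ACOUSTIC = 256  # acoustic encoder hidden size
D_BERT = 768      # BERT hidden size

rng = np.random.default_rng(0)

# Hidden states from a CTC acoustic encoder (e.g. fine-tuned wav2vec2.0),
# one vector per frame; mocked here with random values.
acoustic_hidden = rng.normal(size=(T, D_ACOUSTIC))

# Target representations produced by BERT for the transcript, aligned to
# frames; also mocked with random values in this sketch.
bert_targets = rng.normal(size=(T, D_BERT))

# Learnable projection mapping acoustic states into BERT's embedding space.
W = rng.normal(scale=0.01, size=(D_ACOUSTIC, D_BERT))


def auxiliary_loss(hidden, targets, proj):
    """MSE between projected acoustic states and BERT target representations."""
    return float(np.mean((hidden @ proj - targets) ** 2))


l_aux = auxiliary_loss(acoustic_hidden, bert_targets, W)

# The total training objective would combine CTC with the auxiliary term,
# weighted by a hyperparameter (value assumed here):
lam = 0.1
# total_loss = ctc_loss + lam * l_aux  # ctc_loss from the standard CTC criterion
```

In practice the projection would be trained jointly with the encoder by backpropagation through the combined objective; the sketch only shows how the auxiliary target enters the loss.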