Contextual Spelling Correction with Large Language Models
Abstract
Contextual Spelling Correction (CSC) models are used to improve automatic speech recognition (ASR) quality given user-specific context. Typically, context is modeled as a large set of text spans to compare against a given ASR hypothesis using some distance measure (text, phonetic, or neural embedding). In this work we propose a CSC system based on a single Large Language Model (LLM) adapted with prompt tuning. Our approach is data efficient and does not require dedicated serving. Our system exhibits advanced contextualization capabilities, such as support for phonetic spellings, cross-lingual scripts, and context specified as topics, with little to no data engineering. On voice assistant datasets, our system achieves a $7.8\%$ absolute word error rate reduction over a reference ASR system with relevant context, improving upon other contextualization solutions. Finally, we test our system in a prompt-injection attack scenario and report vulnerabilities and mitigations.