Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting
About
We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction. Our first focus is on instruction prompting to let LLMs perform these tasks without fine-tuning, for which we evaluate different prompting schemes, both zero- and few-shot in-context learning, and a novel task-activating prompting (TAP) method that combines causal instructions and demonstrations to increase the usable context window. Next, we show that rescoring by in-context learning alone with frozen LLMs achieves results competitive with rescoring by domain-tuned LMs, using a pretrained first-pass recognition system and rescoring its output on two out-of-domain tasks (ATIS and WSJ). By combining prompting techniques with fine-tuning, we achieve error rates below the N-best oracle level, showcasing the generalization power of the LLMs.
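The zero- and few-shot prompting setup described above can be sketched as a prompt built from N-best ASR hypotheses, optionally preceded by in-context demonstrations. The template wording, function name, and example hypotheses below are illustrative assumptions, not the paper's exact prompts:

```python
def build_correction_prompt(nbest_hypotheses, demonstrations=None):
    """Assemble an error-correction prompt from N-best ASR hypotheses.

    demonstrations: optional list of (nbest_list, corrected_text) pairs
    used as in-context (few-shot) examples; with none given, the prompt
    is zero-shot. All wording here is an illustrative assumption.
    """
    parts = ["Perform error correction on the ASR N-best hypotheses below "
             "and output the single most likely transcript.\n"]
    # Few-shot demonstrations: each shows an N-best list and its correction.
    for demo_nbest, corrected in (demonstrations or []):
        for i, hyp in enumerate(demo_nbest, 1):
            parts.append(f"hypothesis {i}: {hyp}")
        parts.append(f"corrected: {corrected}\n")
    # The query: the N-best list the frozen LLM should correct.
    for i, hyp in enumerate(nbest_hypotheses, 1):
        parts.append(f"hypothesis {i}: {hyp}")
    parts.append("corrected:")
    return "\n".join(parts)
```

The returned string would then be sent to a frozen LLM, whose completion after the final `corrected:` is taken as the post-processed transcript.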
Related benchmarks
| Task | Dataset | Result (WER) | Rank |
|---|---|---|---|
| Automatic Speech Recognition | LibriSpeech (test-other) | 4.84 | 966 |
| Automatic Speech Recognition | LibriSpeech clean (test) | 2.92 | 833 |
| Automatic Speech Recognition | GigaSpeech (test) | 12.1 | 40 |
| Speech Recognition | VoxPopuli (test) | 7.49 | 37 |
| ASR rescoring | WSJ (test) | 8.72 | 35 |
| Automatic Speech Recognition | AMI (test) | 22.91 | 24 |
| Automatic Speech Recognition | SPGISpeech (test) | 3.94 | 19 |
| Automatic Speech Recognition | TED-LIUM (test) | 6.09 | 19 |
| ASR rescoring | ATIS (test) | 6.39 | 11 |
| ASR Error Correction | ASR Error Correction Evaluation Set (test) | 16.62 | 6 |
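All results above are word error rates (WER): the word-level edit distance (substitutions, insertions, deletions) between hypothesis and reference, divided by the reference length. A minimal sketch of the standard computation (a simple Levenshtein dynamic program, not the scoring tool used for these benchmarks):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[-1][-1] / len(ref)
```

For example, one substitution in a three-word reference gives a WER of 1/3 (reported above as a percentage, e.g. 33.3).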