
Text-only adaptation in LLM-based ASR through text denoising

About

Adapting large language model (LLM)-based automatic speech recognition (ASR) systems to new domains using text-only data is a significant yet underexplored challenge. Standard fine-tuning of the LLM on target-domain text often disrupts the critical alignment between the speech and text modalities learned by the projector, degrading performance. We introduce a novel text-only adaptation method that frames this process as a text denoising task. Our approach trains the LLM to recover clean transcripts from noisy inputs. This process effectively adapts the model to a target domain while preserving cross-modal alignment. Our solution is lightweight, requiring no architectural changes or additional parameters. Extensive evaluation on two datasets demonstrates up to 22.1% relative improvement, outperforming recent state-of-the-art text-only adaptation methods.
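To make the denoising framing concrete, the sketch below builds (noisy input, clean target) text pairs from target-domain transcripts. It is only an illustration under stated assumptions: the abstract does not specify how the noisy inputs are produced, so the word-level deletion, substitution, and duplication used here, the function names `add_noise` and `make_denoising_pairs`, the probabilities, and the sample sentences are all hypothetical and should not be read as the authors' procedure.

```python
import random


def add_noise(words, vocab, rng, p_delete=0.1, p_substitute=0.1, p_duplicate=0.05):
    """Corrupt a list of words with random deletions, substitutions, and duplications.

    Assumption: this noise model is illustrative only; the paper's actual
    noising procedure is not described in the abstract.
    """
    noisy = []
    for word in words:
        r = rng.random()
        if r < p_delete:
            continue  # deletion: drop the word
        if r < p_delete + p_substitute:
            noisy.append(rng.choice(vocab))  # substitution: random in-domain word
        elif r < p_delete + p_substitute + p_duplicate:
            noisy.extend([word, word])  # duplication: repeat the word
        else:
            noisy.append(word)  # keep the word unchanged
    return noisy


def make_denoising_pairs(transcripts, seed=0, **noise_kwargs):
    """Return one {"input": noisy, "target": clean} pair per transcript."""
    rng = random.Random(seed)
    # Substitutions are drawn from the words seen in the target-domain text.
    vocab = sorted({w for t in transcripts for w in t.split()})
    pairs = []
    for clean in transcripts:
        noisy = " ".join(add_noise(clean.split(), vocab, rng, **noise_kwargs))
        pairs.append({"input": noisy, "target": clean})
    return pairs


if __name__ == "__main__":
    # Hypothetical target-domain transcripts, for illustration only.
    domain_text = [
        "please transfer funds to my savings account",
        "what is the deductible on my policy",
    ]
    for pair in make_denoising_pairs(domain_text, seed=1):
        print(pair["input"], "->", pair["target"])
```

Pairs of this shape could then serve as text-only fine-tuning data, with the noisy string as the prompt and the clean transcript as the target, so that the LLM sees target-domain text while still being trained to map an imperfect input to a clean transcript.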

Andrés Carofilis, Sergio Burdisso, Esaú Villatoro-Tello, Shashi Kumar, Kadri Hacioglu, Srikanth Madikeri, Pradeep Rangappa, Manjunath K E, Petr Motlicek, Shankar Venkatesan, Andreas Stolcke • 2026

Related benchmarks

Task                          Dataset                                    Result      Rank
Automated Speech Recognition  SlideSpeech Ag                             WER 14.21   10
Automated Speech Recognition  SlideSpeech MI                             WER 0.1343  10
Automated Speech Recognition  SlideSpeech An                             WER 25.32   5
Automatic Speech Recognition  DefinedAI Banking target domain (test)     WER 10.11   5
Automatic Speech Recognition  DefinedAI Insurance target domain (test)   WER 8.71    5
Automatic Speech Recognition  SlideSpeech target                         WER 14.6    5