
Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs

About

Automatic Speech Recognition (ASR) transcripts often contain disfluencies, such as fillers, repetitions, and false starts, which reduce readability and hinder downstream applications like chatbots and voice assistants. If left unaddressed, such disfluencies can significantly degrade the reliability of downstream systems. Most existing approaches rely on classical models that focus on identifying disfluent tokens for removal. While this strategy is effective to some extent, it often disrupts grammatical structure and semantic coherence, leading to incomplete or unnatural sentences. Recent work has explored the use of large language models (LLMs); however, these efforts have primarily focused on disfluency detection or data augmentation rather than on comprehensive correction. We propose a multilingual correction pipeline in which a sequence tagger first marks disfluent tokens, and these signals guide instruction fine-tuning of an LLM to rewrite transcripts into fluent text. To further improve reliability, we add a contrastive learning objective that penalizes the reproduction of disfluent tokens, encouraging the model to preserve grammar and meaning while removing disfluent artifacts. Our experiments across three Indian languages, namely Hindi, Bengali, and Marathi, show consistent improvements over strong baselines, including multilingual sequence-to-sequence models. These results indicate that detection-only strategies are insufficient. Combining token-level cues with instruction tuning and contrastive learning provides a practical and scalable solution for multilingual disfluency correction in speech-driven NLP systems. The code is publicly available at https://github.com/deepak-kumar-98/Mind-the-Pause.
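The two ideas in the abstract, tagger-guided prompting and a penalty on reproducing disfluent tokens, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its repository: the `<dis>` markers, the prompt wording, the function names, the weight of 0.5, and the English example sentence are all hypothetical. The penalty shown is a simple unlikelihood-style term used as a stand-in; the authors' actual contrastive objective is not specified in this abstract and may differ.

```python
import math
from typing import Dict, List, Sequence, Set


def mark_disfluencies(tokens: Sequence[str], tags: Sequence[int],
                      open_mark: str = "<dis>", close_mark: str = "</dis>") -> str:
    """Wrap each contiguous run of tokens tagged 1 (disfluent) in markers.

    The marker strings are hypothetical; the paper's format is not given here.
    """
    out: List[str] = []
    inside = False
    for tok, tag in zip(tokens, tags):
        if tag and not inside:
            out.append(open_mark)
            inside = True
        elif not tag and inside:
            out.append(close_mark)
            inside = False
        out.append(tok)
    if inside:  # close a run that reaches the end of the transcript
        out.append(close_mark)
    return " ".join(out)


def build_prompt(tokens: Sequence[str], tags: Sequence[int]) -> str:
    """Build an instruction prompt that exposes the tagger's output to the LLM."""
    marked = mark_disfluencies(tokens, tags)
    return (
        "Rewrite the transcript as fluent text. "
        "Spans inside <dis>...</dis> were flagged as disfluent.\n"
        f"Transcript: {marked}\nFluent:"
    )


def disfluency_penalty(step_probs: Sequence[Dict[int, float]],
                       disfluent_ids: Set[int]) -> float:
    """Unlikelihood-style penalty (an assumed stand-in, not the paper's loss).

    For each decoding step, sum the probability the model assigns to
    disfluent token ids and add -log(1 - mass); return the mean over steps.
    """
    total = 0.0
    for probs in step_probs:
        mass = sum(probs.get(i, 0.0) for i in disfluent_ids)
        mass = min(mass, 1.0 - 1e-8)  # keep the logarithm finite
        total += -math.log(1.0 - mass)
    return total / max(len(step_probs), 1)


def combined_loss(nll: float, penalty: float, weight: float = 0.5) -> float:
    """Add the weighted penalty to the usual fine-tuning loss (weight is assumed)."""
    return nll + weight * penalty


if __name__ == "__main__":
    tokens = ["i", "want", "uh", "i", "want", "to", "book", "a", "ticket"]
    tags = [1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical tagger output
    print(build_prompt(tokens, tags))
```

In this sketch the penalty grows as the model places more probability on tokens the tagger flagged, so minimizing the combined loss pushes generation away from copying them while the ordinary likelihood term keeps the rewrite close to the fluent reference.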

Deepak Kumar, Baban Gain, Asif Ekbal • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Disfluency Correction | Hindi Real Data ASR (test) | BLEU 91.1 | 6 |
| Disfluency Correction | Bengali Real Data ASR (test) | BLEU 75.9 | 6 |
| Disfluency Correction | Marathi Real Data ASR (test) | BLEU 84.4 | 6 |
| Disfluency Correction | Hindi Manually Edited (test) | BLEU 96.1 | 5 |
| Disfluency Correction | Bengali Manually Edited (test) | BLEU 96.4 | 5 |
| Disfluency Correction | Marathi Manually Edited (test) | BLEU 95.1 | 5 |
| Disfluency Correction | Hindi Real Data ASR | BLEU 90.4 | 4 |
| Disfluency Correction | Bengali Manually Edited | BLEU 94.8 | 4 |
| Disfluency Correction | Marathi Real Data ASR | BLEU 83.6 | 4 |
| Disfluency Correction | Hindi Manually Edited | Proposed (%) 18.1 | 2 |
Showing 10 of 15 rows
