
Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR

About

Model merging is a scalable alternative to multi-task training that combines the capabilities of multiple specialised models into a single model. This is particularly attractive for large speech foundation models, which are typically adapted through domain-specific fine-tuning, resulting in multiple customised checkpoints; repeating full fine-tuning whenever new data becomes available is computationally prohibitive. In this work, we study model merging for multi-domain ASR and benchmark 11 merging algorithms across 10 European Portuguese domains, evaluating in-domain accuracy, robustness under distribution shift, and English and multilingual performance. We further propose BoostedTSV-M, a new merging algorithm based on TSV-M that mitigates rank collapse via singular-value boosting and improves numerical stability. Overall, our approach outperforms full fine-tuning on European Portuguese while preserving out-of-distribution generalisation in a single model.
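To make the idea concrete, the sketch below illustrates the general shape of SVD-based task-vector merging described above: each fine-tuned checkpoint contributes a low-rank "task vector" (its weight delta from the base model), and small singular values are floored before recombination. This is a hypothetical illustration only; the actual BoostedTSV-M algorithm, its boosting rule, and its hyperparameters (`rank`, `boost_eps` here are invented names) are not specified in this abstract.

```python
import numpy as np

def merge_task_vectors(base, finetuned, rank=4, boost_eps=1e-3):
    """Illustrative sketch of SVD-based task-vector merging.

    base      : base-model weight matrix
    finetuned : list of fine-tuned weight matrices (same shape as base)
    rank      : per-task truncation rank (assumed hyperparameter)
    boost_eps : relative floor on retained singular values -- a
                hypothetical stand-in for the paper's singular-value
                boosting, not its actual rule
    """
    merged_delta = np.zeros_like(base)
    for w in finetuned:
        delta = w - base                      # task vector for this domain
        u, s, vt = np.linalg.svd(delta, full_matrices=False)
        s = s[:rank]                          # low-rank truncation
        # "boost": floor small singular values to avoid rank collapse
        s = np.maximum(s, boost_eps * s.max())
        merged_delta += (u[:, :rank] * s) @ vt[:rank]
    # average the merged deltas back onto the base weights
    return base + merged_delta / len(finetuned)
```

A real implementation would apply this layer by layer across all checkpoints of a speech foundation model; the point here is only the structure of the computation.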

Carlos Carvalho, Francisco Teixeira, Thomas Rolland, Alberto Abad • 2026

Related benchmarks

Task | Dataset | Result | Rank
Automatic Speech Recognition | Fleurs | - | 56
Automatic Speech Recognition | European Portuguese (EP) Full Avg. | WER 11.55 | 16
Automatic Speech Recognition | European Portuguese (EP) ID | WER 9.27 | 16
Automatic Speech Recognition | European Portuguese OOD | WER 16.11 | 16
Automatic Speech Recognition | African-accented Portuguese (AAP) | WER 21.58 | 16
Automatic Speech Recognition | OpenASR-HF | WER 7.6 | 16
Automatic Speech Recognition | Brazilian Portuguese | WER 24.98 | 16
