
Exploring the potential and limitations of Model Merging for Multi-Domain Adaptation in ASR

About

Model merging is a scalable alternative to multi-task training that combines the capabilities of multiple specialised models into a single model. This is particularly attractive for large speech foundation models, which are typically adapted through domain-specific fine-tuning, resulting in multiple customised checkpoints; repeating full fine-tuning whenever new data becomes available is computationally prohibitive. In this work, we study model merging for multi-domain ASR and benchmark 11 merging algorithms across 10 European Portuguese domains, evaluating in-domain accuracy, robustness under distribution shift, and English and multilingual performance. We further propose BoostedTSV-M, a new merging algorithm based on TSV-M that mitigates rank collapse via singular-value boosting and improves numerical stability. Overall, our approach outperforms full fine-tuning on European Portuguese while preserving out-of-distribution generalisation in a single model.
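To make the idea concrete, the sketch below illustrates the general shape of SVD-based task-vector merging described above: each fine-tuned checkpoint contributes a low-rank "task vector" (its weight delta from the base model), and small singular values are floored before recombination. This is a hypothetical illustration only; the actual BoostedTSV-M algorithm, its boosting rule, and its hyperparameters (`rank`, `boost_eps` here are invented names) are not specified in this abstract.

```python
import numpy as np

def merge_task_vectors(base, finetuned, rank=4, boost_eps=1e-3):
    """Illustrative sketch of SVD-based task-vector merging.

    base      : base-model weight matrix
    finetuned : list of fine-tuned weight matrices (same shape as base)
    rank      : per-task truncation rank (assumed hyperparameter)
    boost_eps : relative floor on retained singular values -- a
                hypothetical stand-in for the paper's singular-value
                boosting, not its actual rule
    """
    merged_delta = np.zeros_like(base)
    for w in finetuned:
        delta = w - base                      # task vector for this domain
        u, s, vt = np.linalg.svd(delta, full_matrices=False)
        s = s[:rank]                          # low-rank truncation
        # "boost": floor small singular values to avoid rank collapse
        s = np.maximum(s, boost_eps * s.max())
        merged_delta += (u[:, :rank] * s) @ vt[:rank]
    # average the merged deltas back onto the base weights
    return base + merged_delta / len(finetuned)
```

A real implementation would apply this layer by layer across all checkpoints of a speech foundation model; the point here is only the structure of the computation.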

Carlos Carvalho, Francisco Teixeira, Thomas Rolland, Alberto Abad • 2026

Related benchmarks

Task | Dataset | Result | Rank
Automatic Speech Recognition | Fleurs | - | 56
Automatic Speech Recognition | European Portuguese (EP) Full Avg. | WER 11.55 | 16
Automatic Speech Recognition | European Portuguese (EP) ID | WER 9.27 | 16
Automatic Speech Recognition | European Portuguese OOD | WER 16.11 | 16
Automatic Speech Recognition | African-accented Portuguese (AAP) | WER 21.58 | 16
Automatic Speech Recognition | OpenASR-HF | WER 7.6 | 16
Automatic Speech Recognition | Brazilian Portuguese | WER 24.98 | 16
