Group-Aware Partial Model Merging for Children's Automatic Speech Recognition

About

Automatic Speech Recognition (ASR) for children remains challenging, primarily due to large acoustic variability and the limited availability of training data. While supervised fine-tuning of adult pre-trained models has shown promise, it often fails to capture group-specific variations among children. To address this, we introduce GRoup-Aware PARtial model Merging (GRAPAM), a parameter-efficient approach that combines unsupervised clustering, partial fine-tuning, and model merging. Our approach adapts adult pre-trained models to children by first grouping the children's data based on acoustic similarity. Each group is then used to partially fine-tune an adult pre-trained model, and the resulting models are merged at the parameter level. Experiments conducted on the MyST children's speech corpus indicate that GRAPAM achieves a 6% relative improvement in Word Error Rate (WER) over full fine-tuning, using the same amount of data while training fewer parameters. These results highlight the promise of model merging as a scalable and effective strategy for children's ASR.
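
The parameter-level merging step described above can be illustrated with a minimal sketch. The abstract does not state which merging rule GRAPAM uses, so uniform averaging is an assumption here, and the clustering and fine-tuning steps are omitted. The function name merge_partial, the parameter names, and the toy values are all hypothetical; models are represented as plain dictionaries of float lists rather than real ASR checkpoints.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]


def merge_partial(base: Params, group_models: List[Params], tuned: Set[str]) -> Params:
    """Merge group-specific models at the parameter level.

    Parameters named in `tuned` (the partially fine-tuned subset) are
    uniformly averaged across the group models. Uniform averaging is an
    assumption, not necessarily the rule used in the paper. All other
    parameters were frozen during fine-tuning, so they are copied from
    the adult pre-trained base model.
    """
    if not group_models:
        raise ValueError("at least one group model is required")
    n = len(group_models)
    merged: Params = {}
    for name, base_values in base.items():
        if name in tuned:
            merged[name] = [
                sum(model[name][i] for model in group_models) / n
                for i in range(len(base_values))
            ]
        else:
            merged[name] = list(base_values)
    return merged


# Toy example: two acoustic groups, only "adapter.w" was fine-tuned.
base = {"encoder.w": [1.0, 2.0], "adapter.w": [0.0, 0.0]}
group_a = {"encoder.w": [1.0, 2.0], "adapter.w": [0.5, 0.25]}
group_b = {"encoder.w": [1.0, 2.0], "adapter.w": [1.5, 0.75]}

merged = merge_partial(base, [group_a, group_b], tuned={"adapter.w"})
print(merged)
```

Because only the fine-tuned subset differs between group models, the merge touches a small fraction of the parameters, which is what makes the approach parameter-efficient in this sketch.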

Thomas Rolland, Alberto Abad • 2025

Related benchmarks

Task | Dataset | Result | Rank
Automatic Speech Recognition | MyST children's speech corpus (test) | WER 9.36 | 54
