
SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation

About

Human infants, with only a few hundred hours of speech exposure, acquire the basic units of new languages, highlighting a striking efficiency gap compared to data-hungry self-supervised speech models. To address this gap, this paper introduces SpidR-Adapt, a universal speech representation model for rapid adaptation to new languages using minimal unlabeled data. We cast such low-resource speech representation learning as a meta-learning problem and construct a multi-task adaptive pre-training (MAdaPT) protocol that formulates the adaptation process as a bi-level optimization framework. To enable scalable meta-training under this framework, we propose a novel heuristic solution, first-order bi-level optimization (FOBLO), which avoids the heavy computational cost of exact bi-level optimization. Finally, we stabilize meta-training with a robust initialization obtained through interleaved supervision, which alternates self-supervised and supervised objectives. Empirically, SpidR-Adapt achieves rapid gains in phonemic discriminability (ABX) and spoken language modeling (sWUGGY, sBLIMP, tSC), improving over in-domain language models after training on less than 1h of target-language audio — over $100\times$ more data-efficient than standard training. These findings highlight a practical, architecture-agnostic path toward biologically inspired, data-efficient representations. We open-source the training code and model checkpoints at https://github.com/facebookresearch/spidr-adapt.
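The abstract does not spell out FOBLO's exact update rule, but the bi-level structure it describes — an inner loop that adapts a shared initialization to one language's data, and an outer loop that updates the initialization using only first-order gradients — can be illustrated with a generic FOMAML-style sketch on toy quadratic tasks. All function names and hyperparameters below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def inner_adapt(theta, grad_fn, steps=3, lr=0.1):
    # Inner loop: adapt the shared initialization theta to one
    # task (e.g. one language's unlabeled data) by gradient descent.
    phi = theta.copy()
    for _ in range(steps):
        phi -= lr * grad_fn(phi)
    return phi

def meta_train(theta, tasks, meta_lr=0.2, epochs=60):
    # Outer loop: update theta using the task gradient evaluated at the
    # adapted parameters phi. Dropping the second-order terms (the
    # derivative of phi w.r.t. theta) is the first-order approximation
    # that keeps meta-training cheap.
    for _ in range(epochs):
        meta_grad = np.zeros_like(theta)
        for grad_fn in tasks:
            phi = inner_adapt(theta, grad_fn)
            meta_grad += grad_fn(phi)  # first-order meta-gradient
        theta -= meta_lr * meta_grad / len(tasks)
    return theta

# Toy tasks: quadratic losses 0.5 * (p - t)^2 with different optima t,
# standing in for different languages. The learned initialization
# settles near a point from which every task is quickly reachable.
tasks = [lambda p, t=np.array([t]): p - t for t in (1.0, -1.0, 3.0)]
theta = meta_train(np.zeros(1), tasks)
```

For these symmetric quadratics the meta-learned initialization converges toward the mean of the task optima; in the real setting, the adapted parameters `phi` for a new language would be obtained from `theta` with only a small amount of target-language audio.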

Mahi Luthra, Jiayi Shen, Maxime Poli, Angelo Ortiz, Yosuke Higuchi, Youssef Benchekroun, Martin Gleize, Charles-Eric Saint-James, Dongyan Lin, Phillip Rust, Angel Villar, Surya Parimi, Vanessa Stark, Rashel Moritz, Juan Pino, Yann LeCun, Emmanuel Dupoux • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Spoken Language Modeling | English Spoken Language Modeling sWUGGY, sBLIMP, tSC (test) | Accuracy | 63.6 | 30 |
| Across-Speaker ABX | 3 languages (test) | Average ABX Error Rate (w/o 0h) | 4.93 | 7 |
| Within-Speaker ABX | 3 languages (test) | Avg ABX (w/o 0h) | 3.76 | 7 |
| Phoneme Discovery | Phoneme Discovery Benchmark | PNMI | 0.71 | 4 |
