
CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

About

We introduce Condition-Aware Self-Supervised Learning Representation (CA-SSLR), a generalist conditioning model broadly applicable to various speech-processing tasks. Compared to standard fine-tuning methods that optimize for downstream models, CA-SSLR integrates language and speaker embeddings from earlier layers, making the SSL model aware of the current language and speaker context. This approach reduces the reliance on input audio features while preserving the integrity of the base SSLR. CA-SSLR improves the model's capabilities and demonstrates its generality on unseen tasks with minimal task-specific tuning. Our method employs linear modulation to dynamically adjust internal representations, enabling fine-grained adaptability without significantly altering the original model behavior. Experiments show that CA-SSLR reduces the number of trainable parameters, mitigates overfitting, and excels in under-resourced and unseen tasks. Specifically, CA-SSLR achieves a 10% relative reduction in LID errors, a 37% improvement in ASR CER on the ML-SUPERB benchmark, and a 27% decrease in SV EER on VoxCeleb-1, demonstrating its effectiveness.
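The linear modulation mentioned in the abstract can be pictured as a feature-wise scale and shift driven by a condition embedding. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy dimensions, the form h' = gamma * h + beta, and the identity initialization are all illustrative choices that the abstract does not specify.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Return w @ x + b for a row-major matrix w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def linear_modulation(hidden: List[Vector], cond: Vector,
                      w_gamma: Matrix, b_gamma: Vector,
                      w_beta: Matrix, b_beta: Vector) -> List[Vector]:
    """Scale and shift every frame of `hidden` using a condition embedding.

    gamma and beta are computed once from `cond` (hypothetically a language
    or speaker embedding), then applied feature-wise to each frame:
    h' = gamma * h + beta.
    """
    gamma = matvec(w_gamma, cond, b_gamma)
    beta = matvec(w_beta, cond, b_beta)
    return [[g * h + b for g, h, b in zip(gamma, frame, beta)] for frame in hidden]


# Toy example: 2 frames, 3 hidden features, 2-dimensional condition embedding.
hidden = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
cond = [0.5, -1.0]

# Zero weights with gamma bias 1 and beta bias 0 give the identity mapping,
# so the base model's output is unchanged before any training. This
# initialization is an assumption, not something the abstract states.
zeros = [[0.0, 0.0] for _ in range(3)]
identity_out = linear_modulation(hidden, cond, zeros, [1.0] * 3, zeros, [0.0] * 3)

# Non-zero weights let the condition change the scale and shift per feature.
w_gamma = [[0.2, 0.0], [0.0, 0.1], [0.0, 0.0]]
w_beta = [[0.0, 0.0], [1.0, 0.0], [0.0, -1.0]]
modulated = linear_modulation(hidden, cond, w_gamma, [1.0] * 3, w_beta, [0.0] * 3)
```

Only the small projection matrices that produce gamma and beta would be trained in a setup like this, which is one plausible reading of how such conditioning keeps the number of trainable parameters low while leaving the base representation largely intact.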

Yen-Ju Lu, Jing Liu, Thomas Thebaud, Laureano Moro-Velazquez, Ariya Rastrow, Najim Dehak, Jesus Villalba • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Automatic Speech Recognition | ML-SUPERB 10-min Normal | CER 18.3 | 26 |
| Language Identification | ML-SUPERB 10-min Normal | LID Accuracy 90.2 | 18 |
| Automatic Speech Recognition | 10-min ML-SUPERB Few-shots | ASR CER 31.6 | 12 |
| Automatic Speech Recognition | ML-SUPERB 1hr Normal | CER 14.4 | 10 |
| Language Identification | ML-SUPERB 1hr Normal | Accuracy 93.5 | 10 |
| Speaker Verification | VoxCeleb 10min context Normal | EER 1.04 | 10 |
| Speaker Verification | VoxCeleb 1hr context Normal | EER 0.0094 | 10 |
| Speaker Verification | VoxCeleb | EER 1.15 | 8 |
| Automatic Speech Recognition | ML-SUPERB 10-min Few-shots 1.0 | ASR CER 33.4 | 4 |
| Language Identification | ML-SUPERB 10-min Few-shots | LID Acc 85.8 | 4 |
Showing 10 of 11 rows
