DUAP: Dual-task Universal Adversarial Perturbations Against Voice Control Systems

About

Modern Voice Control Systems (VCS) rely on the collaboration of Automatic Speech Recognition (ASR) and Speaker Recognition (SR) for secure interaction. However, prior adversarial attacks typically target these tasks in isolation, overlooking the coupled decision pipeline in real-world scenarios. Consequently, single-task attacks often fail to pose a practical threat. To fill this gap, we first utilize gradient analysis to reveal that ASR and SR exhibit no inherent conflicts. Building on this, we propose Dual-task Universal Adversarial Perturbation (DUAP). Specifically, DUAP employs a targeted surrogate objective to effectively disrupt ASR transcription and introduces a Dynamic Normalized Ensemble (DNE) strategy to enhance transferability across diverse SR models. Furthermore, we incorporate psychoacoustic masking to ensure perturbation imperceptibility. Extensive evaluations across five ASR and six SR models demonstrate that DUAP achieves high simultaneous attack success rates and superior imperceptibility, significantly outperforming existing single-task baselines.
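The abstract does not say how the gradient analysis is carried out. One common way to check whether two objectives conflict is to measure the cosine similarity between their gradients with respect to the shared perturbation: values near -1 indicate opposing update directions, while values at or above 0 indicate no direct conflict. The sketch below illustrates that generic check only. The function name and the toy gradient values are hypothetical and are not taken from the paper.

```python
import math

def cosine_similarity(g_a, g_b):
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero gradient has no direction; treat it as neutral.
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy gradients of an ASR loss and an SR loss with respect
# to the same perturbation (illustrative numbers, not results from the paper).
grad_asr = [0.2, -0.1, 0.4]
grad_sr = [0.1, 0.3, 0.2]

sim = cosine_similarity(grad_asr, grad_sr)
print("conflict" if sim < 0 else "no direct conflict", round(sim, 3))
```

In practice the two gradients would come from backpropagating each task loss through its model to the perturbation, and the similarity would be averaged over many inputs and optimization steps.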

Suyang Sun, Weifei Jin, Yuxin Cao, Wei Song, Jie Hao • 2026

Related benchmarks

Task                                Dataset                       Result                Rank
Speaker Recognition                 Speaker Recognition Dataset   ECAPA-TDNN Score: 1   5
ASR Attack                          Whisper                       SRoA-ASR: 100         5
ASR Attack                          Tencent ASR API               SRoA-ASR: 84          5
ASR Attack                          Alibaba ASR API               SRoA-ASR: 0.842       5
ASR Attack                          iFlytek ASR API               SRoA-ASR: 60          5
Audio Imperceptibility Evaluation   VCS (test)                    SNR: -6.96            5
ASR Attack                          DeepSpeech2                   SRoA-ASR: 100         5
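
The SNR row reports a signal-to-noise ratio, but this page gives neither the unit nor the exact definition used. Under the standard decibel definition, SNR compares the average power of the clean audio with that of the added perturbation, and a negative value means the perturbation carries more power than the speech. The sketch below assumes that standard definition. The function name and the sample values are hypothetical.

```python
import math

def snr_db(signal, perturbation):
    """Standard SNR in dB: 10 * log10(signal power / perturbation power)."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_pert = sum(d * d for d in perturbation) / len(perturbation)
    if p_pert == 0.0:
        return math.inf   # no perturbation at all
    if p_signal == 0.0:
        return -math.inf  # silent signal, non-zero perturbation
    return 10.0 * math.log10(p_signal / p_pert)

# Hypothetical samples: a perturbation with one tenth of the signal amplitude.
clean = [1.0, -1.0, 1.0, -1.0]
delta = [0.1, -0.1, 0.1, -0.1]
print(round(snr_db(clean, delta), 2))
```

Swapping the two arguments flips the sign of the result, which is the situation a negative SNR describes under this definition.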