Multiplexing Neural Audio Watermarks
About
Audio watermarking is essential for verifying speech authenticity, yet single-watermark schemes often struggle against sophisticated distortions such as neural reconstruction and adversarial attacks. To address this limitation, we introduce a multiplexing paradigm that combines multiple watermarking techniques to leverage their inherent complementarities. We explore both parallel and sequential multiplexing strategies and propose perceptual-adaptive time-frequency multiplexing (PA-TFM), a robust training-free approach. To further enhance performance, we introduce MaskNet, a novel model-based framework designed to learn effective time-domain multiplexing. Experimental results on the LibriSpeech and Common Voice datasets under 14 diverse attack types, including high-strength white-box and neural reconstruction attacks, demonstrate that both PA-TFM and MaskNet considerably outperform existing single-watermark baselines, establishing a resilient paradigm for real-world audio protection.
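The abstract names parallel and sequential multiplexing without defining them. The sketch below shows one plausible reading, not the authors' implementation: in the parallel case every embedder sees the clean host signal and the resulting watermark residuals are summed, while in the sequential case each embedder operates on the output of the previous one. The two embedders (`embed_tone`, `embed_scaled_noise`), their parameters, and the helper names are hypothetical toy stand-ins for real watermarking models; PA-TFM and MaskNet are not modeled here.

```python
import numpy as np

def embed_tone(x, strength=0.01, freq=0.05):
    """Toy embedder A (hypothetical): adds a faint fixed sinusoid."""
    n = np.arange(len(x))
    return x + strength * np.sin(2 * np.pi * freq * n)

def embed_scaled_noise(x, strength=0.01, seed=0):
    """Toy embedder B (hypothetical): adds a keyed noise pattern scaled by |x|."""
    rng = np.random.default_rng(seed)
    pattern = rng.standard_normal(len(x))
    return x + strength * np.abs(x) * pattern

def multiplex_parallel(x, embedders):
    """Assumed parallel scheme: each embedder sees the clean host,
    and the individual watermark residuals are summed."""
    residual = sum(e(x) - x for e in embedders)
    return x + residual

def multiplex_sequential(x, embedders):
    """Assumed sequential scheme: each embedder is applied to the
    output of the previous one."""
    y = x
    for e in embedders:
        y = e(y)
    return y

# Synthetic host signal standing in for one second of 16 kHz speech.
rng = np.random.default_rng(42)
host = 0.1 * rng.standard_normal(16000)

embedders = [embed_tone, embed_scaled_noise]
y_par = multiplex_parallel(host, embedders)
y_seq = multiplex_sequential(host, embedders)
```

Because the second toy embedder depends on the signal it receives, the two strategies produce slightly different outputs here. With purely additive, signal-independent embedders they would coincide, which is one reason the choice of strategy matters mainly for signal-adaptive watermarks.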
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Audio Watermarking | Audio Robustness Benchmark averaged across 14 attacks | PESQ 4.48 | 11 |
| Audio Watermarking Robustness | LibriSpeech and Common Voice (test) | No Attack Robustness 100 | 10 |