
XAttnMark: Learning Robust Audio Watermarking with Cross-Attention

About

The rapid proliferation of generative audio synthesis and editing technologies has raised serious concerns about copyright infringement, data provenance, and the spread of misinformation via deepfake audio. Watermarking offers a proactive solution by embedding imperceptible yet identifiable and traceable signals into audio content. While recent neural network-based watermarking methods like WavMark and AudioSeal have improved robustness and quality, they struggle to optimize robust detection and accurate attribution jointly. This paper introduces Cross-Attention Robust Audio Watermark (XAttnMark), which bridges this gap by leveraging partial parameter sharing between the generator and the detector, a cross-attention mechanism for efficient message retrieval, and a temporal conditioning module for improved message distribution. Additionally, we propose a psychoacoustic-aligned time-frequency (TF) masking loss that captures fine-grained auditory masking effects, improving watermark imperceptibility. XAttnMark achieves state-of-the-art performance in both detection and attribution, demonstrating superior robustness against a wide range of audio transformations, including challenging generative editing at varying strengths. This work advances audio watermarking for protecting intellectual property and ensuring authenticity in the era of generative AI.
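The abstract pairs a message representation shared by generator and detector with cross-attention for message retrieval. The toy sketch below illustrates that pairing under our own assumptions; it is not the authors' architecture. It assumes a hypothetical shared table with one embedding per (bit position, bit value) pair and uses one-hot embeddings so the arithmetic is exact. A plain vector (`feature`) stands in for the detector's audio features. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vector, keys: List[Vector],
                    values: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention of one query over keys/values from another source."""
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    out = [sum(w * v[j] for w, v in zip(weights, values))
           for j in range(len(values[0]))]
    return weights, out


def make_message_table(n_bits: int) -> List[Vector]:
    """Hypothetical shared table: row 2*i + b embeds 'bit i has value b'.
    One-hot rows keep the toy exact; a trained model would learn these."""
    dim = 2 * n_bits
    return [[1.0 if j == row else 0.0 for j in range(dim)] for row in range(dim)]


def embed_message(bits: Sequence[int], table: List[Vector]) -> Vector:
    """Generator side (hypothetical): sum the embeddings of the chosen bit values."""
    cond = [0.0] * len(table[0])
    for i, b in enumerate(bits):
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        cond = [c + r for c, r in zip(cond, table[2 * i + b])]
    return cond


def decode_message(feature: Vector, table: List[Vector]) -> List[int]:
    """Detector side (hypothetical): for each bit position, the feature attends
    over that position's two candidate embeddings in the same shared table."""
    bits = []
    for i in range(len(table) // 2):
        candidates = [table[2 * i], table[2 * i + 1]]
        weights, _ = cross_attention(feature, candidates, candidates)
        bits.append(1 if weights[1] > weights[0] else 0)
    return bits


msg = [1, 0, 1, 1, 0, 0, 1, 0]
table = make_message_table(len(msg))
feature = embed_message(msg, table)
# Simulate a mild distortion of the feature the detector recovers.
noisy = [x + 0.1 * (-1) ** j for j, x in enumerate(feature)]
print(decode_message(noisy, table) == msg)
```

With orthogonal embeddings, each position's correct candidate scores near 1 and the wrong one near 0, so the ±0.1 perturbation leaves the decoded message unchanged. Reusing one table on both sides is the toy analogue of parameter sharing. In a real system the table and the feature extractor would be learned jointly.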

Yixin Liu, Lie Lu, Jihui Jin, Lichao Sun, Andrea Fanelli • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Audio Watermarking Attribution | MusicCaps | Accuracy (Att.) (%) | 100 | 352 |
| Audio Watermark Attribution | MusicCaps (test) | Attribution Accuracy (%) | 100 | 85 |
| Audio Watermark Detection | MusicCaps balanced (val) | Accuracy (%) | 99.5 | 85 |
| Audio Watermark Detection | MusicCaps (test) | Detection Accuracy (%) | 99.5 | 85 |
| Audio Watermark Detection | Stable Audio generative edits (test) | Accuracy (%) | 94 | 33 |
| Audio Watermark Detection | AudioLDM2-Music generative edits (test) | Accuracy (%) | 93.75 | 18 |
| Audio Watermark Detection | AudioLDM2 generative edits (test) | Accuracy (%) | 94 | 15 |
| Audio Watermarking Attribution | VoxPopuli | FAR (%) | 5.03 | 12 |
| Watermark Detection | AudioMarkBench | Accuracy (%) | 68 | 10 |
| Audio Perceptual Quality Assessment | audio evaluation dataset (test) | SI-SNR (dB) | 29 | 7 |

10 of 27 benchmark results shown.
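The quality row reports SI-SNR, the scale-invariant signal-to-noise ratio in dB. Higher values mean the watermarked signal stays closer to the original. Below is a minimal pure-Python sketch of the common zero-mean definition. This page does not give the protocol behind the listed score (segment length, sample rate, averaging), so the function name `si_snr` and its `eps` guard are our own assumptions.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], reference: Sequence[float],
           eps: float = 1e-12) -> float:
    """Scale-invariant SNR (dB) of `estimate` (e.g. watermarked audio)
    with respect to `reference` (e.g. the original audio)."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Remove DC offsets so constant shifts do not count as signal or noise.
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]
    # Project onto the reference: s_target = (<est, ref> / ||ref||^2) * ref.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    # Whatever the projection does not explain counts as distortion.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Consider a reference `r = [1, -1, 1, -1]` and an orthogonal perturbation `w = [0.01, 0.01, -0.01, -0.01]`. Calling `si_snr` on their element-wise sum against `r` gives 40 dB. Halving the estimate's amplitude leaves the result at 40 dB, which is the scale invariance the name refers to.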
