Deep Audio Watermarks are Shallow: Limitations of Post-Hoc Watermarking Techniques for Speech

About

In the audio modality, state-of-the-art watermarking methods use deep neural networks to embed human-imperceptible signatures in generated audio. Ideally, these signatures remain detectable with high accuracy even after the watermarked audio is altered by compression, filtering, or other transformations. Existing audio watermarking techniques operate in a post-hoc manner, manipulating "low-level" features of audio recordings after generation (e.g., by adding a low-magnitude watermark signal). We show that this post-hoc formulation makes existing audio watermarks vulnerable to transformation-based removal attacks. Focusing on speech audio, we (1) unify and extend existing evaluations of the effect of audio transformations on watermark detectability, and (2) demonstrate that state-of-the-art post-hoc audio watermarks can be removed with no knowledge of the watermarking scheme and with minimal degradation in audio quality.
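To make the post-hoc formulation concrete, here is a toy sketch in pure Python. It is not AudioSeal, WavMark, or any other scheme evaluated in the paper, nor the authors' attack. It assumes a hypothetical keyed ±1 spread-spectrum carrier added at low amplitude to a synthetic 220 Hz tone (standing in for generated speech), a correlation detector with an arbitrary threshold of 0.01, and a centered moving-average low-pass filter as the removal transformation. All names (`embed`, `detect`, `smooth`, `snr_db`) and constants are illustrative assumptions.

```python
import math
import random

# Toy post-hoc watermark (hypothetical; not any scheme evaluated in the paper).
SAMPLE_RATE = 16_000
STRENGTH = 0.02    # amplitude of the added watermark signal (assumed)
THRESHOLD = 0.01   # detector decision threshold (assumed)
KEY = 1234         # secret key seeding the carrier (assumed)


def make_carrier(key, n):
    """Keyed pseudo-random +/-1 sequence; the same key always yields the same carrier."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def embed(audio, key, strength=STRENGTH):
    """Post-hoc embedding: add a low-magnitude carrier to already-generated audio."""
    carrier = make_carrier(key, len(audio))
    return [x + strength * c for x, c in zip(audio, carrier)]


def detect(audio, key):
    """Mean correlation with the keyed carrier: near `strength` if marked, near 0 if not."""
    carrier = make_carrier(key, len(audio))
    return sum(x * c for x, c in zip(audio, carrier)) / len(audio)


def smooth(audio, half_width=4):
    """Centered moving average (a crude low-pass filter) used as the removal transformation."""
    prefix = [0.0]
    for x in audio:
        prefix.append(prefix[-1] + x)
    n = len(audio)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def snr_db(reference, test):
    """Signal-to-noise ratio of `test` relative to `reference`, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return 10 * math.log10(signal / noise)


# Three seconds of a 220 Hz tone stands in for generated speech.
host = [0.3 * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) for t in range(3 * SAMPLE_RATE)]
marked = embed(host, KEY)
attacked = smooth(marked)  # the attack never uses KEY

for name, audio in (("clean", host), ("marked", marked), ("attacked", attacked)):
    score = detect(audio, KEY)
    print(f"{name:8s} score={score:.4f} detected={score > THRESHOLD}")
print(f"SNR vs clean: marked {snr_db(host, marked):.1f} dB, attacked {snr_db(host, attacked):.1f} dB")
```

Because this toy carrier spreads its energy across the whole band while the tone sits at a low frequency, the 9-tap smoothing suppresses the watermark far more than the host: the expected detector score drops from about the embedding strength to about one ninth of it, and the attacked signal ends up closer to the clean tone than the marked one. Learned watermarkers are much more robust than this sketch, so treat it only as an illustration of why a low-level additive signal invites transformation-based removal.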

Patrick O'Reilly, Zeyu Jin, Jiaqi Su, Bryan Pardo• 2025

Related benchmarks

| Task | Dataset | Metric | Value | Rank |
| --- | --- | --- | --- | --- |
| Audio Quality Assessment | Clotho 1.0 (test) | ViSQOL | 4.347 | 10 |
| Watermark Detection | Clotho 1.0 (test) | Perth | 100 | 10 |
| Audio Watermark Removal | FMA small | ViSQOL Score | 4.109 | 10 |
| Watermark Removal | LibriSpeech speech domain official releases (test) | SQUIM-MOS | 4.072 | 10 |
| Watermark Removal | LibriSpeech SilentCipher (dev) | STOI | 94.4 | 4 |
| Watermark Removal | LibriSpeech WavMark (dev) | STOI | 97.1 | 4 |
| Watermark Removal | FMA WavMark | STOI | 96 | 4 |
| Watermark Removal | VCTK AudioMarkNet | STOI | 94.7 | 4 |
| Watermark Removal | LibriSpeech AudioSeal (dev) | STOI | 0.944 | 4 |
| Watermark Removal | FMA SilentCipher | STOI | 0.932 | 4 |

(10 of 11 rows shown.)
