Token-Based Audio Inpainting via Discrete Diffusion

About

Audio inpainting seeks to restore missing segments in degraded recordings. Previous diffusion-based methods degrade when the missing region is large. We introduce the first approach that applies discrete diffusion over tokenized music representations from a pre-trained audio tokenizer, enabling stable and semantically coherent restoration of long gaps. Our method further incorporates two training techniques: a derivative-based regularization loss that enforces smooth temporal dynamics, and a span-based absorbing transition that provides structured corruption during diffusion. Experiments on the MusicNet and MAESTRO datasets with gaps up to 750 ms show that our approach consistently outperforms strong baselines for gaps of 150 ms and above. This work advances musical audio restoration and introduces new directions for discrete diffusion model training. Visit our project page for examples and code.

Tali Dror, Iftach Shoham, Moshe Buchris, Oren Gal, Haim Permuter, Gilad Katz, Eliya Nachmani • 2025
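
The abstract does not spell out how the span-based absorbing transition works, so the following is only a rough sketch of the general idea: corrupting a token sequence by absorbing contiguous spans into a mask state, rather than masking tokens independently. The names `MASK_ID`, `span_absorb`, `mask_fraction`, and `span_len` are hypothetical and chosen for illustration; the sampling scheme is an assumption and may differ from the authors' actual transition.

```python
import random

MASK_ID = -1  # hypothetical id for the absorbing mask state (assumes real tokens are >= 0)


def span_absorb(tokens, mask_fraction, span_len=4, seed=None):
    """Absorb contiguous spans of `tokens` into MASK_ID.

    Illustrative only: spans of up to `span_len` tokens are placed at random
    positions until round(mask_fraction * len(tokens)) tokens are masked.
    """
    rng = random.Random(seed)
    out = list(tokens)
    n = len(out)
    target = min(n, round(mask_fraction * n))
    masked = 0
    while masked < target:
        start = rng.randrange(n)
        # Mask a contiguous run, skipping tokens that are already absorbed.
        for i in range(start, min(start + span_len, n)):
            if out[i] != MASK_ID and masked < target:
                out[i] = MASK_ID
                masked += 1
    return out


if __name__ == "__main__":
    clean = list(range(20))
    print(span_absorb(clean, mask_fraction=0.4, span_len=4, seed=0))
```

In a discrete diffusion setting, `mask_fraction` would typically grow with the diffusion timestep, so that the sequence is fully absorbed at the final step; that schedule is not specified here.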

Related benchmarks

Task	Dataset	Result
Audio Inpainting	MusicNet (test)	FAD 1.866	20
Audio Inpainting	MAESTRO	ODG (PEA-Q) -2.596	8
Audio Inpainting	MAESTRO (test)	MOS 3.64	4
