Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

About

Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion with pre-trained diffusion models. The first, which we coin ZEro-shot Text-based Audio (ZETA) editing, is adopted from the image domain. The second, named ZEro-shot UnSupervized (ZEUS) editing, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code are available at https://hilamanor.github.io/AudioEditing/.

Hila Manor, Tomer Michaeli • 2024
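
The abstract names DDPM inversion but does not describe it. As a rough illustration only, the toy sketch below shows one common formulation from the image-editing literature: noisy versions of the signal are drawn independently at every timestep, and the noise vectors that link consecutive steps are then extracted so that the sampler reproduces the input. Re-running the sampler with the same noise vectors but a different condition yields an edit. Everything here is an assumption rather than the authors' implementation: `toy_eps_model` is a hypothetical stand-in for a pre-trained text-conditioned noise predictor, the scalar `cond` stands in for a text prompt, the linear beta schedule and the variance choice sigma_t^2 = beta_t are picked for simplicity, and the 1-D sine wave stands in for whatever representation a real model operates on.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed, commonly used choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_eps_model(x, t, cond):
    """Hypothetical noise predictor; a real system would call a trained network."""
    return np.tanh(x - cond)

def posterior_mean(x_t, t, cond, schedule):
    """Mean of the DDPM reverse step, given the model's noise prediction."""
    betas, alphas, alpha_bars = schedule
    eps_hat = toy_eps_model(x_t, t, cond)
    return (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])

def ddpm_invert(x0, cond, schedule, rng):
    """Extract noise vectors zs that make the sampler reproduce x0 under cond."""
    betas, _, alpha_bars = schedule
    num_steps = len(betas)
    # xs[0] is the clean signal; xs[t + 1] is an independently noised copy at step t.
    xs = [x0]
    for t in range(num_steps):
        noise = rng.standard_normal(x0.shape)
        xs.append(np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise)
    # Solve x_{t-1} = mu_t(x_t) + sigma_t * z_t for z_t at every step.
    zs = []
    for t in range(num_steps):
        mu = posterior_mean(xs[t + 1], t, cond, schedule)
        zs.append((xs[t] - mu) / np.sqrt(betas[t]))
    return xs[-1], zs

def ddpm_generate(x_T, zs, cond, schedule):
    """Run the reverse process with fixed noise vectors and a chosen condition."""
    betas = schedule[0]
    x = x_T
    for t in reversed(range(len(betas))):
        x = posterior_mean(x, t, cond, schedule) + np.sqrt(betas[t]) * zs[t]
    return x

schedule = make_schedule()
rng = np.random.default_rng(0)
x0 = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))  # toy 1-D "signal"

x_T, zs = ddpm_invert(x0, cond=0.0, schedule=schedule, rng=rng)
reconstruction = ddpm_generate(x_T, zs, cond=0.0, schedule=schedule)  # same condition
edited = ddpm_generate(x_T, zs, cond=1.0, schedule=schedule)          # changed condition
```

With the source condition the sampler returns the input up to floating-point error, because each extracted noise vector exactly cancels the model's prediction error at its step. Changing the condition while keeping the noise vectors fixed shifts the output, which is the basic mechanism behind this style of zero-shot editing.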

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Audio Editing | AudioCaps | FD (Frechet Distance) | 57.27 | 24 |
| Timbre Transfer | MUSDB18 HQ (test) | CLAP | 0.283 | 8 |
| Timbre Transfer | MusicDelta | CLAP | 0.351 | 8 |
| Audio Edit | Audio Edit (test) | Feature Distance (FD) | 3.81 | 6 |
| Music Editing | Music Editing Subjective (evaluation) | Target Attribute Match (T) | 3.16 | 6 |
| Music Editing | ZoME-Bench Instrument | CLAP | 26.1 | 6 |
| Music Editing | ZoME-Bench Genre | CLAP | 27.3 | 6 |
| Audio Target Object Removal | SAVEBench 1.0 (test) | FAD | 2.69 | 4 |
| Audio Target Object Removal | SAVEBENCH | FAD | 2.69 | 4 |
| Audio Editing | AudioEdit | Overlap Score (OVL) | 74.6 | 3 |