Diffusion Timbre Transfer Via Mutual Information Guided Inpainting

About

We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires no additional training: (i) a dimension-wise noise injection that targets latent channels most informative of instrument identity, and (ii) an early-step clamping mechanism that re-imposes the input's melodic and rhythmic structure during reverse diffusion. The method operates directly on audio latents and is compatible with text/audio conditioning (e.g., CLAP). We discuss design choices, analyze trade-offs between timbral change and structural preservation, and show that simple inference-time controls can meaningfully steer pre-trained models for style-transfer use cases.
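
The two inference-time controls named in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the per-channel `scores` (standing in for how informative each latent channel is of instrument identity), the noise-mixing rule, the `toy_denoiser` placeholder for the pre-trained model, and every function and parameter name are hypothetical, chosen only to show the general shape of "noise some channels, then clamp the rest to the input for the first few reverse steps".

```python
import numpy as np


def inject_channel_noise(latent, scores, top_k, noise_level, rng):
    """Add noise only to the top_k highest-scoring channels (hypothetical rule).

    latent: array of shape (channels, frames)
    scores: array of shape (channels,), assumed per-channel informativeness
    noise_level: float in [0, 1]; 0 leaves a channel intact, 1 replaces it with noise
    """
    mask = np.zeros(latent.shape[0], dtype=bool)
    mask[np.argsort(scores)[-top_k:]] = True
    noise = rng.standard_normal(latent.shape)
    mixed = np.sqrt(1.0 - noise_level**2) * latent + noise_level * noise
    # Unselected channels pass through unchanged.
    return np.where(mask[:, None], mixed, latent), mask


def clamp_structure(current, source, mask, step, clamp_steps):
    """For the first clamp_steps steps, reset unselected channels to the source."""
    if step >= clamp_steps:
        return current
    return np.where(mask[:, None], current, source)


def toy_denoiser(x):
    # Placeholder for one reverse-diffusion step of a pre-trained model (assumption).
    return 0.9 * x


def transfer(latent, scores, top_k=2, noise_level=0.8,
             num_steps=10, clamp_steps=3, seed=0):
    rng = np.random.default_rng(seed)
    x, mask = inject_channel_noise(latent, scores, top_k, noise_level, rng)
    for step in range(num_steps):
        x = toy_denoiser(x)
        x = clamp_structure(x, latent, mask, step, clamp_steps)
    return x, mask
```

In this sketch, raising `top_k` or `noise_level` allows more timbral change, while raising `clamp_steps` holds the output closer to the input's structure, which mirrors the trade-off the abstract describes.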

Ching Ho Lee, Javier Nistal, Stefan Lattner, Marco Pasini, George Fazekas • 2026

Related benchmarks

Task | Dataset | Result | Rank
Timbre Transfer | MUSHRA-style 60 excerpts (subjective evaluation) | Delta (beta): -0.395 | 3
Timbre Transfer | Subjective Evaluation Set 60 excerpts (test) | MOS: 3.52 | 3
