Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

About

Video-to-Audio (V2A) models have recently gained attention for their practical application in generating audio directly from silent videos, particularly in video/film production. However, previous V2A methods have limited generation quality in terms of temporal synchronization and audio-visual relevance. We present Diff-Foley, a synchronized Video-to-Audio synthesis method with a latent diffusion model (LDM) that generates high-quality audio with improved synchronization and audio-visual relevance. We adopt contrastive audio-visual pretraining (CAVP) to learn more temporally and semantically aligned features, then train an LDM with CAVP-aligned visual features in the spectrogram latent space. The CAVP-aligned features enable the LDM to capture subtler audio-visual correlations via a cross-attention module. We further improve sample quality significantly with "double guidance". Diff-Foley achieves state-of-the-art V2A performance on the current large-scale V2A dataset. Furthermore, we demonstrate Diff-Foley's practical applicability and generalization capabilities via downstream finetuning. Project page: https://diff-foley.github.io/
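The abstract does not spell out the CAVP objective. As a hedged illustration only, the sketch below shows a generic CLIP-style symmetric InfoNCE loss, one standard way to implement contrastive audio-visual pretraining: embeddings of a video clip and its own audio are pulled together, while mismatched video/audio pairs in the same batch are pushed apart. The function names, the temperature of 0.07, and the use of precomputed NumPy embeddings are assumptions; Diff-Foley's actual CAVP loss (for example, how it treats temporal versus semantic negatives) may differ.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Project each embedding onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(video_emb, audio_emb, temperature=0.07):
    """Hypothetical generic contrastive loss (not Diff-Foley's exact CAVP loss).

    Row i of video_emb and row i of audio_emb come from the same clip (positive pair);
    every other pairing in the batch is treated as a negative.
    """
    v = l2_normalize(video_emb)
    a = l2_normalize(audio_emb)
    logits = v @ a.T / temperature  # (B, B) cosine similarities scaled by 1/temperature
    idx = np.arange(logits.shape[0])
    loss_v2a = -log_softmax(logits, axis=1)[idx, idx].mean()  # each video picks its audio
    loss_a2v = -log_softmax(logits, axis=0)[idx, idx].mean()  # each audio picks its video
    return 0.5 * (loss_v2a + loss_a2v)
```

With well-aligned pairs the diagonal of the similarity matrix dominates and the loss approaches zero; if all embeddings are identical, the loss equals log(B) for batch size B, which is chance level.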

Simian Luo, Chuanhao Yan, Chenxu Hu, Hang Zhao • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Video-to-Audio Generation | VGGSound (test) | FAD | 5.62 | 95
Audio-to-Video Retrieval | VGGSound (test) | Recall@1 | 11.1 | 13
Video-to-Audio Retrieval | VGGSound (test) | Recall@1 | 0.095 | 11
Video-to-Audio Generation | VGGSound original (test) | Inception Score | 62.37 | 8
Foley generation | VGGSound (test) | FID | 15.15 | 8
Video-to-Audio Generation | VGGSound sparse (test) | Alignment | 2.15 | 8
Video-to-Audio Generation | MUSIC (test) | Overall Score | 1.49 | 8
Spatial Audio Generation | Mixed panoramic video-FOA dataset (YT360) (test) | wCS | 27 | 6
Video-to-spatial audio generation | Hybrid (test) | MOS (Subjective Quality) | 3.68 | 6
Spatial Audio Generation | YT360 (test) | FD | 314.6 | 5

Showing 10 of 13 rows.
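The FAD entry above, and presumably the FID and FD entries, are Fréchet-style distances (lower is better). Standard FAD fits a Gaussian to embeddings of real audio and of generated audio (VGGish embeddings in the original FAD formulation) and reports ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The sketch below computes that quantity from two precomputed embedding matrices using only NumPy. The helper names are hypothetical, and which embedding model each benchmark uses is not stated on this page, so it is left to the caller.

```python
import numpy as np


def _sqrtm_psd(m):
    # Square root of a symmetric PSD matrix via eigendecomposition;
    # tiny negative eigenvalues from round-off are clipped to zero.
    w, q = np.linalg.eigh(m)
    return (q * np.sqrt(np.clip(w, 0.0, None))) @ q.T


def frechet_distance(emb_real, emb_gen):
    """Hypothetical helper: Frechet distance between Gaussians fitted to two
    embedding sets (rows are samples, columns are embedding dimensions)."""
    mu_r, mu_g = emb_real.mean(axis=0), emb_gen.mean(axis=0)
    sigma_r = np.cov(emb_real, rowvar=False)
    sigma_g = np.cov(emb_gen, rowvar=False)
    # Tr((Sigma_r Sigma_g)^(1/2)) equals Tr((S Sigma_g S)^(1/2)) with S = Sigma_r^(1/2);
    # S Sigma_g S is symmetric PSD, so its eigenvalues are real and non-negative.
    s = _sqrtm_psd(sigma_r)
    cross_eigs = np.linalg.eigvalsh(s @ sigma_g @ s)
    tr_sqrt = np.sqrt(np.clip(cross_eigs, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)
```

Identical embedding sets give a distance of (numerically) zero, and shifting every generated embedding by a constant vector c adds exactly ||c||^2, since the covariances are unchanged.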

Other info

Code
