Wan-S2V: Audio-Driven Cinematic Video Generation

About

Current state-of-the-art (SOTA) methods for audio-driven character animation demonstrate promising performance for scenarios primarily involving speech and singing. However, they often fall short in more complex film and television productions, which demand sophisticated elements such as nuanced character interactions, realistic body movements, and dynamic camera work. To address this long-standing challenge of achieving film-level character animation, we propose an audio-driven model, which we refere to as Wan-S2V, built upon Wan. Our model achieves significantly enhanced expressiveness and fidelity in cinematic contexts compared to existing approaches. We conducted extensive experiments, benchmarking our method against cutting-edge models such as Hunyuan-Avatar and Omnihuman. The experimental results consistently demonstrate that our approach significantly outperforms these existing solutions. Additionally, we explore the versatility of our method through its applications in long-form video generation and precise video lip-sync editing.

Xin Gao, Li Hu, Siqi Hu, Mingyang Huang, Chaonan Ji, Dechao Meng, Jinwei Qi, Penchong Qiao, Zhen Shen, Yafei Song, Ke Sun, Linrui Tian, Guangyuan Wang, Qi Wang, Zhongjian Wang, Jiayu Xiao, Sheng Xu, Bang Zhang, Peng Zhang, Xindi Zhang, Zhe Zhang, Jingren Zhou, Lian Zhuo• 2025

Related benchmarks

Task	Dataset	Result
Talking Head Generation	HDTF (test)	FID23.85	73
Multimodal Customization	OC-Bench (test)	Face-Sim0.774	12
Talking Avatar Generation	CelebV-HQ (clips)	FID38.21	10
Talking Avatar Generation	long-form videos (test)	FID88.73	10
Talking head video generation	Action Bench (test)	Sync-C6.473	9
Audio-driven video generation	Custom evaluation dataset	Sync-C4.05	9
Audio-driven Avatar Generation	GenBench ShortVideo (user study)	Naturalness84.3	7
Audio-Video Joint Generation	Audio-Video Generation (test)	LSE-C5.2	7
Audio-driven Avatar Generation	GenBench-ShortVideo (test)	ASE3.36	7
Talking Head Generation	Foundation capability evaluation set	IQA4.49	7

Showing 10 of 21 rows

Other info

Follow for update

@wizwand_team Discord