Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

About

Audio-driven avatar interaction demands real-time, streaming, and infinite-length generation -- capabilities fundamentally at odds with the sequential denoising and long-horizon drift of current diffusion models. We present Live Avatar, an algorithm-system co-designed framework that addresses both challenges for a 14-billion-parameter diffusion model. On the algorithm side, a two-stage pipeline distills a pretrained bidirectional model into a causal, few-step streaming one, while a set of complementary long-horizon strategies eliminates identity drift and visual artifacts, enabling stable autoregressive generation exceeding 10,000 seconds. On the system side, Timestep-forcing Pipeline Parallelism (TPP) assigns each GPU a fixed denoising timestep, converting the sequential diffusion chain into an asynchronous spatial pipeline that simultaneously boosts throughput and improves temporal consistency. Live Avatar achieves 45 FPS with a time to first frame (TTFF) of 1.21 s on 5 H800 GPUs, and to our knowledge is the first to enable practical real-time streaming of a 14B diffusion model for infinite-length avatar generation. We further introduce GenBench, a standardized long-form benchmark, to facilitate reproducible evaluation. Our project page is at https://liveavatar.github.io/.
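The pipelining idea behind TPP can be illustrated with a toy schedule. The sketch below is not the authors' implementation: it assumes an idealized pipeline in which every stage takes one time unit ("tick"), each stage stands in for one GPU pinned to one denoising timestep, and video is produced in chunks. The function names, `num_stages`, and `num_chunks` are hypothetical, and causal dependencies between chunks are not modeled.

```python
def pipeline_schedule(num_chunks, num_stages):
    """Map each tick to the (stage, chunk) pairs active during that tick.

    Assumption: chunk c reaches stage s at tick c + s, i.e. every stage
    takes exactly one tick and hand-off between stages is free.
    """
    schedule = {}
    for chunk in range(num_chunks):
        for stage in range(num_stages):
            schedule.setdefault(chunk + stage, []).append((stage, chunk))
    return schedule


def total_ticks_sequential(num_chunks, num_stages):
    """One device runs every denoising step of every chunk in turn."""
    return num_chunks * num_stages


def total_ticks_pipelined(num_chunks, num_stages):
    """Stages overlap: after the pipeline fills, one chunk finishes per tick."""
    if num_chunks == 0:
        return 0
    return num_chunks + num_stages - 1


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    chunks, stages = 6, 4
    for tick, work in sorted(pipeline_schedule(chunks, stages).items()):
        print(tick, sorted(work))
    print("sequential ticks:", total_ticks_sequential(chunks, stages))
    print("pipelined ticks:", total_ticks_pipelined(chunks, stages))
```

Under these assumptions, steady-state throughput is one chunk per tick regardless of the number of denoising steps, while the latency of any single chunk remains `num_stages` ticks. This is consistent with the abstract reporting both a frame rate and a separate TTFF.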

Yubo Huang, Hailong Guo, Fangtai Wu, Weiqiang Wang, Shifeng Zhang, Shijie Huang, Qijun Gan, Lin Liu, Sirui Zhao, Enhong Chen, Jiaming Liu, Steven Hoi • 2025

Related benchmarks

Task	Dataset	Result
Talking Head Generation	HDTF	FID 15.85	48
Audio-visual generation	SocialVideo-Bench 480P-20s	Parameters (B) 14	12
Talking Avatar Generation	CelebV-HQ (clips)	FID 37.63	10
Talking Avatar Generation	long-form videos (test)	FID 57.81	10
Talking Head Generation	EMTD	Sync-C 7.204	10
Audio-driven Avatar Generation	HDTF	CSIM 0.8127	9
Audio-driven video generation	Custom evaluation dataset	Sync-C 3.89	9
Interactive Avatar Generation	500 videos (test)	IQA 3.94	8
Audio-driven Avatar Generation	GenBench ShortVideo (user study)	Naturalness 86.3	7
Audio-driven Avatar Generation	GenBench-ShortVideo (test)	ASE 3.44	7

Showing 10 of 18 rows
