R3-Avatar: Record and Retrieve Temporal Codebook for Reconstructing Photorealistic Human Avatars

About

We present R3-Avatar, which incorporates a temporal codebook to address the difficulty of building human avatars that are both animatable and rendered with high fidelity. Existing video-based methods for reconstructing 3D human avatars either focus solely on rendering and lack animation support, or learn a pose-appearance mapping for animation, which degrades under limited training poses or complex clothing. In this paper, we adopt a "record-retrieve-reconstruct" strategy that ensures high-quality rendering from novel views while mitigating degradation in novel poses. Specifically, disambiguating timestamps record temporal appearance variations in a codebook, ensuring high-fidelity novel-view rendering, while novel poses retrieve corresponding timestamps by matching the most similar training poses for augmented appearance. R3-Avatar outperforms cutting-edge video-based human avatar reconstruction methods, particularly in overcoming visual quality degradation in extreme scenarios with limited training poses and complex clothing.

Yifan Zhan, Wangze Xu, Qingtian Zhu, Muyao Niu, Mingze Ma, Yifei Liu, Zhihang Zhong, Xiao Sun, Yinqiang Zheng • 2025
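
The "retrieve" step described in the abstract, where a novel pose is matched to its most similar training pose to obtain a timestamp, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the flattened pose-vector representation, the Euclidean distance metric, and the array-shaped codebook are all assumptions made purely for illustration.

```python
import numpy as np


def retrieve_timestamp(novel_pose, train_poses, train_timestamps):
    """Return (timestamp, index) of the training pose closest to novel_pose.

    Hypothetical helper: poses are assumed to be flat vectors, and
    similarity is assumed to be plain Euclidean distance.
    """
    train_poses = np.asarray(train_poses, dtype=np.float64)
    novel_pose = np.asarray(novel_pose, dtype=np.float64)
    # Distance from the novel pose to every training pose.
    dists = np.linalg.norm(train_poses - novel_pose, axis=1)
    idx = int(np.argmin(dists))
    return train_timestamps[idx], idx


def lookup_code(codebook, index):
    """Fetch the appearance code recorded for a training frame.

    Hypothetical: the codebook is assumed to be an array with one
    row per training timestamp.
    """
    return np.asarray(codebook)[index]


# Toy data: three training frames with 2-D "poses" and 4-D codes.
train_poses = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
train_timestamps = [0.0, 0.5, 1.0]
codebook = np.arange(12, dtype=np.float64).reshape(3, 4)

timestamp, idx = retrieve_timestamp([0.9, 1.2], train_poses, train_timestamps)
code = lookup_code(codebook, idx)
```

In this toy setup the novel pose is nearest to the second training pose, so the appearance code recorded at that frame's timestamp would be the one used for rendering. A real system would likely use a pose-aware similarity measure rather than raw Euclidean distance.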

Related benchmarks

Task	Dataset	Metric	Result	Rank
Self-Reenactment	Captured sequences novel poses (test)	FID	119	4
Novel View Synthesis	Captured multi-view human sequences train poses (held-out cameras)	PSNR	26.33	4
