4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere

About

We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches that typically decouple motion from geometry or produce limited 4D attributes such as sparse trajectories or two-view scene flow, 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics. At its core, 4RC introduces a novel encode-once, query-anywhere and anytime paradigm: a transformer backbone encodes the entire video into a compact spatio-temporal latent space, from which a conditional decoder can efficiently query 3D geometry and motion for any query frame at any target timestamp. To facilitate learning, we represent per-view 4D attributes in a minimally factorized form by decomposing them into base geometry and time-dependent relative motion. Extensive experiments demonstrate that 4RC outperforms prior and concurrent methods across a wide range of 4D reconstruction tasks.

Yihang Luo, Shangchen Zhou, Yushi Lan, Xingang Pan, Chen Change Loy • 2026
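The factorization mentioned in the abstract (base geometry plus time-dependent relative motion) can be illustrated with a toy sketch. This is not the 4RC model: the additive composition, the function names `compose_points` and `toy_motion`, and the constant-velocity motion are all assumptions made for illustration, since the abstract does not specify how the two components are combined.

```python
import numpy as np

def compose_points(base_geometry, relative_motion):
    """Combine base geometry with relative motion.

    ASSUMPTION: the two parts are combined by simple addition.
    The abstract only says the attributes are decomposed into
    these two components, not how they are recombined.
    """
    return base_geometry + relative_motion

def toy_motion(t_query, t_frame, shape=(2, 2, 3)):
    """Hypothetical motion field: a shift of 0.5 along x per unit time.

    The displacement is zero when the query time equals the
    frame's own timestamp.
    """
    motion = np.zeros(shape)
    motion[..., 0] = 0.5 * (t_query - t_frame)
    return motion

# Toy base geometry: a 2x2 "frame" of 3D points, all at depth 1.
base = np.zeros((2, 2, 3))
base[..., 2] = 1.0

# Query the same frame at its own timestamp and at a later one.
points_same_time = compose_points(base, toy_motion(0.0, 0.0))
points_later = compose_points(base, toy_motion(2.0, 0.0))
```

In this sketch, querying a frame at its own timestamp returns the base geometry unchanged, and querying at another timestamp only changes the motion term.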

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled in source)
Video Depth Estimation	Sintel	Delta Threshold Accuracy (1.25)	67	235
Camera pose estimation	ScanNet	RPE (t)	0.012	133
Video Depth Estimation	BONN	Relative Error (Rel)	0.048	108
Camera pose estimation	TUM dynamics	ATE	0.01	90
3D Scene Reconstruction	7-Scenes (test)	Accuracy	0.034	34
Sparse Point Tracking	Panoptic Studio (PStudio) TAPVid-3D	APD	87.32	14
3D Reconstruction	NRGBD (test)	Acc	3.6	12
Dense Tracking	Kubric	EPE	1.525	11
Sparse Point Tracking	Dynamic Replica (DR) (test)	APD	88.65	11
Sparse Point Tracking	PointOdyssey (PO) (test)	APD	85.86	11

Showing 10 of 15 rows
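
For readers unfamiliar with the metric names in the table, the sketch below gives the commonly used definitions of three of them: delta threshold accuracy, absolute relative error, and end-point error. The function names are hypothetical, and the page does not state each benchmark's exact protocol (scale alignment, valid-pixel masking, or whether values are reported as percentages), so treat this as a reference for the formulas only.

```python
import numpy as np

def delta_threshold_accuracy(pred, gt, threshold=1.25):
    """Fraction of values where max(pred/gt, gt/pred) < threshold."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < threshold))

def abs_rel_error(pred, gt):
    """Mean of |pred - gt| / gt over all values."""
    return float(np.mean(np.abs(pred - gt) / gt))

def end_point_error(pred, gt):
    """Mean Euclidean distance between predicted and true vectors."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# Toy depth values (positive, no alignment or masking applied).
gt_depth = np.array([1.0, 2.0, 4.0])
pred_depth = np.array([1.1, 2.0, 6.0])

delta = delta_threshold_accuracy(pred_depth, gt_depth)  # 2 of 3 pass
rel = abs_rel_error(pred_depth, gt_depth)

# Toy motion vectors: one is off by a 3-4-5 triangle, one is exact.
gt_vec = np.zeros((2, 3))
pred_vec = np.array([[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]])
epe = end_point_error(pred_vec, gt_vec)
```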
