4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere
About
We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches that typically decouple motion from geometry or produce limited 4D attributes such as sparse trajectories or two-view scene flow, 4RC learns a holistic 4D representation that jointly captures dense scene geometry and motion dynamics. At its core, 4RC introduces a novel encode-once, query-anywhere and anytime paradigm: a transformer backbone encodes the entire video into a compact spatio-temporal latent space, from which a conditional decoder can efficiently query 3D geometry and motion for any query frame at any target timestamp. To facilitate learning, we represent per-view 4D attributes in a minimally factorized form by decomposing them into base geometry and time-dependent relative motion. Extensive experiments demonstrate that 4RC outperforms prior and concurrent methods across a wide range of 4D reconstruction tasks.
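The abstract names two ideas without giving their internals: the video is encoded once and then queried for any (query frame, target timestamp) pair, and each per-view 4D attribute is split into base geometry plus time-dependent relative motion. The sketch below illustrates only that control flow and factorization under loudly stated assumptions. The transformer backbone and conditional decoder are replaced by hypothetical toy stand-ins (`ToyEncoder`, `ToyLatent`, and a constant-velocity motion model), and every name here is illustrative rather than 4RC's actual API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Frame = List[Vec3]  # 3D points of one frame (flattened pixel grid)


@dataclass(frozen=True)
class ToyLatent:
    """Hypothetical stand-in for the compact spatio-temporal latent.

    A real model would hold transformer tokens; this toy just caches
    per-frame points and per-point velocities.
    """
    points: List[Frame]
    velocities: List[Frame]


class ToyEncoder:
    """Hypothetical encoder that counts how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, points: List[Frame], velocities: List[Frame]) -> ToyLatent:
        self.calls += 1
        return ToyLatent(points, velocities)


def base_geometry(latent: ToyLatent, i: int) -> Frame:
    # Time-independent term: frame i's points at frame i's own timestamp.
    return latent.points[i]


def relative_motion(latent: ToyLatent, i: int, t: int) -> Frame:
    # Time-dependent term: displacement of frame i's points from time i to time t.
    # Toy assumption: constant velocity, one time unit per frame.
    dt = t - i
    return [(vx * dt, vy * dt, vz * dt) for vx, vy, vz in latent.velocities[i]]


def query(latent: ToyLatent, i: int, t: int) -> Frame:
    """Decode frame i's geometry at target time t as base + relative motion."""
    base = base_geometry(latent, i)
    motion = relative_motion(latent, i, t)
    return [
        (bx + mx, by + my, bz + mz)
        for (bx, by, bz), (mx, my, mz) in zip(base, motion)
    ]


# Two frames, two points: the first is static, the second moves +0.5 in y per frame.
points = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.5, 2.0)],
]
velocities = [
    [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0)],
]

encoder = ToyEncoder()
latent = encoder(points, velocities)  # encode the whole video once ...
grid: Dict[Tuple[int, int], Frame] = {
    (i, t): query(latent, i, t)       # ... then query every (frame, time) pair
    for i in range(len(points))
    for t in range(len(points))
}
```

In this toy, the encoder runs once no matter how many (frame, time) pairs are queried, so each extra query costs only a decoder call. Because `relative_motion` is zero when `t == i`, the base term alone must account for each frame's own geometry, which is one plausible reading of "minimally factorized"; the actual decoder design is not described here.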
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Depth Estimation | Sintel | Relative Error (Rel) | 0.249 | 109 |
| Video Depth Estimation | BONN | Relative Error (Rel) | 0.048 | 103 |
| Camera Pose Estimation | ScanNet | ATE RMSE (Avg.) | 0.032 | 61 |
| Camera Pose Estimation | TUM dynamics | RRE | 0.314 | 57 |
| 3D Scene Reconstruction | 7-Scenes (test) | Accuracy | 0.034 | 27 |
| Sparse Point Tracking | Panoptic Studio (PStudio) TAPVid-3D | APD | 87.32 | 14 |
| 3D Reconstruction | NRGBD (test) | Acc | 3.6 | 12 |
| Dense Tracking | Kubric | EPE | 1.525 | 11 |
| Sparse Point Tracking | Dynamic Replica (DR) (test) | APD | 88.65 | 11 |
| Sparse Point Tracking | PointOdyssey (PO) (test) | APD | 85.86 | 11 |
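The table gives only metric names, not their definitions or evaluation protocols. As a rough guide to reading two of the columns, the sketch below implements textbook definitions of absolute relative depth error (assumed here to be what "Relative Error (Rel)" denotes) and average end-point error (EPE), both of which are lower-is-better. Median scale alignment is included as one common choice for scale-ambiguous monocular depth; whether the reported numbers used it, or a scale-and-shift alignment instead, is an assumption not stated on this page. This is not the benchmarks' official evaluation code.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def median_scale_align(pred: Sequence[float], gt: Sequence[float]) -> List[float]:
    """Rescale predicted depths so their median matches the ground-truth median.

    Assumes all inputs are valid (positive) depths.
    """
    s = statistics.median(gt) / statistics.median(pred)
    return [p * s for p in pred]


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Absolute relative depth error: mean(|d - d*| / d*) over pixels with d* > 0."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in valid) / len(valid)


def epe(pred: Sequence[Tuple[float, ...]], gt: Sequence[Tuple[float, ...]]) -> float:
    """Average end-point error: mean Euclidean distance between predicted and true vectors."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


print(abs_rel([1.1, 1.8], [1.0, 2.0]))                          # about 0.1
print(epe([(3.0, 4.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.0)]))  # 2.5
```

Pixels with non-positive ground-truth depth are skipped in `abs_rel`, mirroring the usual practice of evaluating only where ground truth exists.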