MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer

About

Reconstructing dynamic 4D scenes remains challenging because moving objects corrupt camera pose estimation. Existing optimization-based methods mitigate this issue with additional supervision, but most are computationally expensive and impractical for real-time applications. To address these limitations, we propose MoRe, a feed-forward 4D reconstruction network that efficiently recovers dynamic 3D scenes from monocular videos. Built upon a strong static reconstruction backbone, MoRe employs an attention-forcing strategy to disentangle dynamic motion from static structure. To further enhance robustness, we fine-tune the model on large-scale, diverse datasets that cover both dynamic and static scenes. Moreover, our grouped causal attention captures temporal dependencies and adapts to varying token lengths across frames, ensuring temporally coherent geometry reconstruction. Extensive experiments on multiple benchmarks demonstrate that MoRe achieves high-quality dynamic reconstructions with exceptional efficiency.

Juntong Fang, Zequn Chen, Weiqi Zhang, Donglin Di, Xuancheng Zhang, Chengmin Yang, Yu-Shen Liu • 2026
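The abstract says the grouped causal attention "captures temporal dependencies and adapts to varying token lengths across frames" but gives no details. The sketch below shows one plausible reading, which is an assumption and not the authors' implementation: tokens are grouped by frame, tokens attend freely within their own frame and causally to earlier frames, and each frame may contribute a different number of tokens. The helpers `grouped_causal_mask` and `masked_attention` are hypothetical names written in plain Python for readability.

```python
import math
from typing import List

Vector = List[float]


def grouped_causal_mask(tokens_per_frame: List[int]) -> List[List[bool]]:
    """Hypothetical frame-grouped causal mask (an assumed reading, not MoRe's code).

    mask[i][j] is True when query token i may attend to key token j, i.e. when
    token j belongs to the same frame as token i or to an earlier frame.
    Frames may contribute different numbers of tokens.
    """
    # Frame index of every token, e.g. [2, 3] -> [0, 0, 1, 1, 1]
    frame_of = [f for f, n in enumerate(tokens_per_frame) for _ in range(n)]
    total = len(frame_of)
    return [[frame_of[j] <= frame_of[i] for j in range(total)] for i in range(total)]


def masked_attention(q: List[Vector], k: List[Vector], v: List[Vector],
                     mask: List[List[bool]]) -> List[Vector]:
    """Scaled dot-product attention; masked-out keys receive zero weight."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) * scale if mask[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        top = max(scores)  # finite: every token can at least see its own frame
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
        z = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / z
                    for c in range(len(v[0]))])
    return out
```

For example, `grouped_causal_mask([2, 3])` builds a 5×5 mask whose first row is `[True, True, False, False, False]`, so the two tokens of frame 0 never see frame 1, while every frame-1 token sees all five tokens. Because the mask is built from per-frame token counts rather than a fixed grid, frames of different sizes need no padding under this assumed design.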

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Depth Estimation | Sintel | Delta Threshold Accuracy (δ < 1.25) | 64.5 | 193 |
| Camera Pose Estimation | Sintel | ATE | 0.0877 | 192 |
| Camera Pose Estimation | TUM-dynamic | ATE | 0.0115 | 163 |
| Video Depth Estimation | KITTI | Abs Rel | 0.066 | 126 |
| Video Depth Estimation | BONN | Abs Rel | 5.5 | 116 |
| Video Depth Estimation | TUM dynamics | Abs Rel | 0.12 | 53 |
| Pose Estimation | BONN | ATE | 0.0138 | 38 |
| Camera Pose Estimation | ScanNet (static indoor scenes) | ATE | 0.0375 | 25 |
| 4D Reconstruction | KITTI 11 | FPS | 30.09 | 7 |
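The table reports three standard error metrics without defining them. The sketch below uses their common textbook definitions: Abs Rel is the mean of |d − d*| / d*, δ < 1.25 accuracy is the fraction of pixels where max(d / d*, d* / d) < 1.25, and ATE is the RMSE of camera positions. It assumes depths are positive and already scale-aligned and that trajectories are already aligned; the exact alignment protocol behind these benchmark numbers (for example median scaling or a Sim(3) fit) is not stated here. Function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Absolute relative depth error: mean of |d - d*| / d* over valid pixels."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def delta_accuracy(pred: Sequence[float], gt: Sequence[float],
                   threshold: float = 1.25) -> float:
    """Fraction of pixels with max(d / d*, d* / d) < threshold."""
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < threshold)
    return hits / len(gt)


def ate_rmse(pred_pos: Sequence[Point], gt_pos: Sequence[Point]) -> float:
    """RMSE of camera positions; assumes the trajectories are already aligned."""
    sq_errors = [sum((a - b) ** 2 for a, b in zip(p, g))
                 for p, g in zip(pred_pos, gt_pos)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))
```

`delta_accuracy` returns a fraction, so the Sintel entry of 64.5 presumably corresponds to 0.645 expressed as a percentage. Abs Rel is likewise sometimes reported ×100, which may explain why the BONN value (5.5) is on a different scale from the other Abs Rel entries.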
