
LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction

About

Recent feed-forward reconstruction models like VGGT and $\pi^3$ achieve impressive reconstruction quality but cannot process streaming videos because their memory cost grows quadratically, which limits practical deployment. Existing streaming methods address this through learned memory mechanisms or causal attention, but they require extensive retraining and may not fully leverage the strong geometric priors of state-of-the-art offline models. We propose LASER, a training-free framework that converts an offline reconstruction model into a streaming system by aligning predictions across consecutive temporal windows. We observe that simple similarity transformation ($\mathrm{Sim}(3)$) alignment fails due to layer depth misalignment: monocular scale ambiguity causes the relative depth scales of different scene layers to vary inconsistently between windows. To address this, we introduce layer-wise scale alignment, which segments depth predictions into discrete layers, computes per-layer scale factors, and propagates them across both adjacent windows and timestamps. Extensive experiments show that LASER achieves state-of-the-art performance on camera pose estimation and point map reconstruction while operating at 14 FPS with 6 GB peak memory on an RTX A6000 GPU, enabling practical deployment for kilometer-scale streaming videos. Project website: https://neu-vi.github.io/LASER/
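
To make the per-layer scale idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how layers are segmented or how scales are estimated, so the quantile-based binning, the median-of-ratios estimator, and the names `layer_edges`, `layer_index`, `per_layer_scales`, and `align` are all assumptions made for illustration. The sketch covers only the alignment of one overlapping frame between two windows and omits the propagation across windows and timestamps.

```python
from statistics import median


def layer_edges(depths, num_layers):
    """Interior bin edges at equal-count quantiles (assumed segmentation rule)."""
    s = sorted(depths)
    return [s[(len(s) * k) // num_layers] for k in range(1, num_layers)]


def layer_index(d, edges):
    """Index of the depth layer that the value d falls into."""
    return sum(d >= e for e in edges)


def per_layer_scales(prev_depth, cur_depth, num_layers=2):
    """Estimate one scale per depth layer from a frame shared by two windows.

    prev_depth / cur_depth hold the depths predicted for the same pixels
    by the previous and the current window. Needs at least one positive
    depth in cur_depth.
    """
    edges = layer_edges(cur_depth, num_layers)
    ratios = [[] for _ in range(num_layers)]
    for p, c in zip(prev_depth, cur_depth):
        if c > 0:  # skip invalid depths
            ratios[layer_index(c, edges)].append(p / c)
    # Fall back to the global median ratio for layers with no samples.
    global_scale = median(r for layer in ratios for r in layer)
    scales = [median(layer) if layer else global_scale for layer in ratios]
    return edges, scales


def align(cur_depth, edges, scales):
    """Rescale each depth by the scale factor of its layer."""
    return [d * scales[layer_index(d, edges)] for d in cur_depth]


# Toy case: near pixels differ by a factor of 2, far pixels by a factor of 3.
prev = [2.0, 2.2, 2.4, 15.0, 18.0, 21.0]
cur = [1.0, 1.1, 1.2, 5.0, 6.0, 7.0]
edges, scales = per_layer_scales(prev, cur, num_layers=2)
aligned = align(cur, edges, scales)
```

In this toy case a single global scale (the median ratio, 2.5) would misalign both the near and the far pixels, while the two per-layer scales recover the previous window's depths.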

Tianye Ding, Yiming Xie, Yiqing Liang, Moitreya Chatterjee, Pedro Miraldo, Huaizu Jiang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Camera pose estimation | Sintel | ATE | 0.061 | 92 |
| Camera pose estimation | ScanNet | ATE RMSE (Avg.) | 0.031 | 61 |
| Video Depth Estimation | Sintel (test) | Delta 1 Accuracy | 68.8 | 57 |
| Video Depth Estimation | Bonn (test) | Abs Rel | 0.048 | 37 |
| Video Depth Estimation | KITTI (test) | Delta1 | 98.3 | 25 |
| Point Map Estimation | 7 Scenes | Accuracy (Mean) | 2.1 | 19 |
| Camera pose estimation | TUM | ATE | 0.013 | 13 |
| Camera pose estimation | KITTI | ATE (03) | 2.64 | 12 |
| Multi-view Point Map Estimation | NRGBD | Accuracy (Mean) | 0.02 | 7 |
| Long-term Point Map Estimation | Waymo Open Dataset Outdoor Long-term | Avg Accuracy | 0.56 | 3 |
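
For readers unfamiliar with the depth metrics listed here, the sketch below computes Abs Rel and Delta 1 using their commonly used definitions (mean absolute relative error, and the share of pixels whose prediction-to-ground-truth ratio is within 1.25). This page does not state the exact evaluation protocol behind the benchmark numbers, so the helper names `abs_rel` and `delta1`, the percentage scaling, and the omission of any scale alignment before evaluation are assumptions.

```python
def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, threshold=1.25):
    """Percentage of valid pixels with max(pred/gt, gt/pred) < threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    hits = sum(max(p / g, g / p) < threshold for p, g in pairs)
    return 100.0 * hits / len(pairs)


# Toy depths: three accurate predictions and one that is off by a factor of 2.
pred = [1.0, 2.2, 4.0, 8.0]
gt = [1.0, 2.0, 4.0, 4.0]
rel = abs_rel(pred, gt)
acc = delta1(pred, gt)
```

Lower is better for Abs Rel and higher is better for Delta 1.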