Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Stabilizing Streaming Video Geometry via Dynamic Feature Normalization

About

Consistent 3D geometry estimation from streaming RGB input is crucial for real-world applications such as autonomous driving, embodied AI, and large-scale reconstruction. While modern monocular geometry foundation models achieve strong single-image accuracy, they exhibit severe temporal inconsistency on continuous input, notably dominated by scale--shift drifting. Through targeted empirical analysis, we trace this instability to its root cause: fluctuations in latent feature statistics, whose mean and variance directly determine the predicted depth's scale and shift. Building on this insight, we introduce Dynamic Feature Normalization (DyFN), a lightweight, causal recurrent module that dynamically and robustly modulates feature statistics to maintain stable geometry over time. We adapt powerful pretrained monocular geometry models for streaming by finetuning only DyFN, a mere 2\% additional parameters, while keeping the backbone frozen, thereby achieving temporal consistency without compromising single-image accuracy. Extensive experiments across four benchmarks show that DyFN effectively eliminates temporal artifacts such as disjointed layering and positional jitter, and achieves state-of-the-art temporal stability, improving over prior streaming methods by up to 14\% and even outperforming heavier non-causal video baselines. Project Page: https://shawlyu.github.io/DyFN

Xiaoyang Lyu, Muxin Liu, Xiaoshan Wu, Ruicheng Wang, Yi-Hua Huang, Yang-Tian Sun, Shaoshuai Shi, Xiaojuan Qi• 2026

Related benchmarks

TaskDatasetResultRank
Depth EstimationSintel ~50 frames
AbsRel0.18
70
Depth EstimationKITTI 110 frames
AbsRel6.2
69
Video Depth EstimationBonn 110 frames
AbsRel4.4
63
Video Depth EstimationScannet 90 frames
AbsRel0.073
22
Showing 4 of 4 rows

Other info

Follow for update