
FILT3R: Latent State Adaptive Kalman Filter for Streaming 3D Reconstruction

About

Streaming 3D reconstruction maintains a persistent latent state that is updated online from incoming frames, enabling constant-memory inference. A key source of failure is the state update rule: aggressive overwrites forget useful history, conservative updates fail to track new evidence, and both behaviors become unstable beyond the training horizon. To address this challenge, we propose FILT3R, a training-free latent filtering layer that casts recurrent state updates as stochastic state estimation in token space. FILT3R maintains a per-token variance and computes a Kalman-style gain that adaptively balances memory retention against new observations. Process noise, which governs how much the latent state is expected to change between frames, is estimated online from the EMA-normalized temporal drift of candidate tokens. Through extensive experiments, we demonstrate that FILT3R yields an interpretable, plug-in update rule that generalizes common overwrite and gating policies as special cases. Specifically, we show that gains shrink in stable regimes as uncertainty contracts with accumulated evidence, and rise when genuine scene change increases process uncertainty. This improves long-horizon stability for depth, pose, and 3D reconstruction compared to existing methods. Code will be released at https://github.com/jinotter3/FILT3R.
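As a rough illustration of the kind of update the abstract describes, the NumPy sketch below runs one scalar Kalman-style step per token. It is not the authors' implementation: the function name `kalman_token_update`, the fixed observation noise `obs_noise`, the EMA decay, and the choice to measure drift as the squared difference between the candidate token and the current state are all assumptions made for illustration.

```python
import numpy as np

def kalman_token_update(state, var, candidate, drift_ema,
                        obs_noise=1.0, ema_decay=0.9, eps=1e-8):
    """One hypothetical filtering step over N tokens of dimension D.

    state:     (N, D) latent tokens carried over from the previous frame
    var:       (N,)   per-token variance (uncertainty of each state token)
    candidate: (N, D) tokens proposed by the model for the current frame
    drift_ema: float  running average of the mean drift (assumed normalizer)
    """
    # Temporal drift per token (assumption: mean squared change vs. the state).
    drift = np.mean((candidate - state) ** 2, axis=-1)

    # Update the running average and normalize drift by it to get process noise.
    drift_ema = ema_decay * drift_ema + (1.0 - ema_decay) * float(drift.mean())
    process_noise = drift / (drift_ema + eps)

    # Predict: uncertainty grows by the process noise.
    var_pred = var + process_noise

    # Kalman-style gain in (0, 1): near 0 keeps memory, near 1 overwrites.
    gain = var_pred / (var_pred + obs_noise)

    # Correct: blend state and candidate, then contract the variance.
    new_state = state + gain[:, None] * (candidate - state)
    new_var = (1.0 - gain) * var_pred
    return new_state, new_var, drift_ema, gain


# Usage: token 0 changes, token 1 is stable, so token 0 receives the larger gain.
state = np.zeros((2, 4))
var = np.ones(2)
candidate = state.copy()
candidate[0] += 1.0
state, var, drift_ema, gain = kalman_token_update(state, var, candidate, drift_ema=1.0)
```

In this sketch a gain fixed at 1 reduces to a full overwrite and a constant gain to a fixed gating policy, which is the sense in which a filtered update can cover those rules as special cases. With no drift the variance shrinks every step, so the gain decays as evidence accumulates.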

Seonghyun Jin, Jong Chul Ye • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Depth Estimation | KITTI short sequences | Abs Rel | 0.11 | 42 |
| Video Depth Estimation | Bonn short sequences | Abs Rel | 0.061 | 42 |
| Video Depth Estimation | Sintel (short sequences) | Abs Rel | 0.407 | 42 |
| Video Depth Estimation | KITTI clips (300 frames) (val) | Absolute Relative Error (Abs Rel) | 10.8 | 12 |
| Video Depth Estimation | KITTI clips (400 frames) (val) | Abs Rel | 11.2 | 6 |
| 3D Reconstruction | NRGBD length 300 (test) | Accuracy (Mean) | 7.9 | 4 |
| 3D Reconstruction | NRGBD length 400 (test) | Accuracy (Mean) | 8.6 | 4 |
| 3D Reconstruction | 7-Scenes length 300 | Accuracy (Mean) | 0.02 | 4 |
| 3D Reconstruction | 7-Scenes length 500 | Accuracy (Mean) | 2.4 | 4 |
| 3D Reconstruction | NRGBD length 500 (test) | Mean Accuracy | 9.6 | 4 |
Showing 10 of 21 rows
