
GHOST: Geometry-Hierarchical Online Streaming Token Eviction for Efficient 3D Reconstruction

About

Streaming 3D reconstruction from long monocular video sequences requires maintaining a key-value (KV) cache that grows linearly with sequence length, creating a severe memory bottleneck. Existing approaches either truncate the cache to a fixed set of anchor frames, leading to reconstruction quality degradation, or rely on attention-score heuristics that are agnostic to 3D scene structure, failing to preserve geometrically valuable tokens. To address these problems, we present GHOST (Geometry-Hierarchical Online Streaming Token Eviction), a training-free KV cache management framework that exploits the model's own 3D geometry outputs to evict redundant tokens online. GHOST introduces three mutually reinforcing innovations: a hierarchical dual-level importance scoring scheme, a privilege mechanism that protects special tokens from eviction, and a cosine-similarity-guided layer-wise budget allocation. Experiments on various benchmarks show that GHOST preserves excellent reconstruction quality while cutting the KV cache by nearly half and delivering 1.75x faster inference compared to state-of-the-art methods. Our code is available at https://github.com/lokiniuniu/GHOST.
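To make the budgeted-eviction idea concrete, here is a minimal Python sketch of two of the components named above: protecting privileged tokens during eviction, and splitting a total cache budget across layers using a cosine-similarity signal. It is an illustration under stated assumptions, not the authors' implementation. The function names (`allocate_layer_budgets`, `evict_tokens`), the rule that layers with lower similarity receive more budget, and the use of precomputed importance scores are all hypothetical; the abstract does not specify how GHOST derives scores from geometry outputs or what the cosine similarity is computed between.

```python
def allocate_layer_budgets(similarities, total_budget):
    """Split total_budget cache slots across layers.

    ASSUMPTION (not from the paper): each layer comes with one cosine
    similarity value in [-1, 1], and layers with LOWER similarity are
    given a LARGER share of the budget.
    """
    # Turn similarities into non-negative weights; epsilon avoids a zero total.
    weights = [max(1.0 - s, 0.0) + 1e-6 for s in similarities]
    total = sum(weights)
    # Proportional share, rounded down.
    budgets = [int(total_budget * w / total) for w in weights]
    # Hand slots lost to rounding to the highest-weight layers first.
    leftover = max(total_budget - sum(budgets), 0)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    return budgets


def evict_tokens(scores, budget, protected=frozenset()):
    """Return the sorted indices of the tokens that stay in the cache.

    scores    : one importance score per cached token (hypothetical input;
                how GHOST computes it is not described in the abstract).
    budget    : maximum number of tokens to keep for this layer.
    protected : indices of privileged tokens that are never evicted.
    """
    keep = {i for i in protected if 0 <= i < len(scores)}
    # Rank the remaining tokens from most to least important.
    candidates = sorted(
        (i for i in range(len(scores)) if i not in keep),
        key=lambda i: scores[i],
        reverse=True,
    )
    # Fill whatever room is left after the protected tokens.
    room = max(budget - len(keep), 0)
    keep.update(candidates[:room])
    return sorted(keep)


if __name__ == "__main__":
    # Token 0 is protected, so it survives despite its low score.
    print(evict_tokens([0.2, 0.9, 0.1, 0.7, 0.4], budget=3, protected={0}))
    # [0, 1, 3]
```

In this sketch the protected set is honored even when it alone exceeds the budget, which matches the idea of a privilege mechanism but is a design choice of the example rather than a documented behavior of GHOST.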

Leyang Chen, Junyi Wu, Zhiteng Li, Yulun Zhang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| 3D Reconstruction | 7 Scenes | Accuracy Median | 0.7 | 128 |
| 3D Reconstruction | NRGBD | Accuracy Mean | 4.6 | 63 |
| 3D Reconstruction | Bonn (test) | Abs Rel | 5.4 | 20 |
| 3D Reconstruction | Long3D Classroom | Accuracy (Mean) | 33.2 | 7 |
| 3D Reconstruction | Long3D Library | Acc (Mean) | 0.745 | 7 |
| 3D Reconstruction | Long3D Academic Building | Accuracy (Mean) | 4.325 | 7 |
| 3D Reconstruction | Long3D Dormitory | Accuracy (Mean) | 1.135 | 4 |
| 3D Reconstruction | Long3D Badminton Court (6067 frames) | Mean Accuracy | 1.312 | 4 |