Depth Completion as Parameter-Efficient Test-Time Adaptation
About
We introduce CAPA, a parameter-efficient test-time optimization framework that adapts pre-trained 3D foundation models (FMs) for depth completion using sparse geometric cues. Unlike prior methods that train task-specific encoders for auxiliary inputs, which often overfit and generalize poorly, CAPA freezes the FM backbone. Instead, it updates only a minimal set of parameters via Parameter-Efficient Fine-Tuning (e.g., LoRA or VPT), guided by gradients computed directly from the sparse observations available at inference time. This grounds the foundation model's geometric prior in the scene-specific measurements, correcting distortions and misplaced structures. For videos, CAPA introduces sequence-level parameter sharing, jointly adapting all frames to exploit temporal correlations, improve robustness, and enforce multi-frame consistency. CAPA is model-agnostic, compatible with any ViT-based FM, and achieves state-of-the-art results across diverse condition patterns on both indoor and outdoor datasets. Project page: research.nvidia.com/labs/dvl/projects/capa.
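The core recipe above — freeze the pre-trained weights, attach a low-rank residual, and descend on a loss defined only at the sparsely observed points — can be sketched in a few lines of numpy. This is a hypothetical toy illustration, not the authors' implementation: the "backbone" is a single frozen linear layer, the LoRA rank and shapes are made up, and the sparse depth targets are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16                       # feature dimension of the frozen "backbone" layer (illustrative)
r = 2                        # LoRA rank, r << d
W = rng.normal(size=(d, d))  # frozen pre-trained weight: never updated
A = np.zeros((d, r))         # LoRA down-projection, zero-initialized (standard LoRA init)
B = rng.normal(size=(r, d)) * 0.1  # LoRA up-projection

def forward(x):
    # Adapted layer: frozen weight plus low-rank residual A @ B.
    return x @ (W + A @ B)

# Stand-in for sparse geometric cues at inference: targets known at ~10% of pixels.
x = rng.normal(size=(64, d))                        # per-pixel features of one frame
target = x @ (W + rng.normal(size=(d, d)) * 0.05)   # synthetic "correct" output
mask = rng.random(64) < 0.1                         # sparse observation mask

def masked_mse(pred):
    return float(np.mean(((pred - target)[mask]) ** 2))

init_err = masked_mse(forward(x))

lr = 1e-2
for _ in range(200):
    err = (forward(x) - target) * mask[:, None]  # loss only on observed pixels
    # Gradients of 0.5 * ||err||^2 w.r.t. A and B; W stays frozen.
    gA = x.T @ err @ B.T
    gB = A.T @ x.T @ err
    A -= lr * gA / mask.sum()
    B -= lr * gB / mask.sum()

final_err = masked_mse(forward(x))
```

The point of the low-rank parameterization is visible in the shapes: the adaptation touches only `2*d*r` parameters instead of `d*d`, which is what keeps test-time optimization on a handful of sparse measurements from overfitting.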
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Depth Estimation | ScanNet | AbsRel | 0.9 | 94 |
| 2D Depth Estimation | 7-Scenes | AbsRel | 0.9 | 20 |
| Depth Completion | ScanNet SIFT (test) | RMSE (%) | 0.053 | 16 |
| Depth Completion | ScanNet 100 pts | RMSE (%) | 0.053 | 16 |
| Depth Completion | ScanNet < 3m | RMSE | 8.9 | 16 |
| Depth Completion | 7-Scenes SfM | RMSE (%) | 11.1 | 16 |
| Depth Completion | 7-Scenes 100 pts | RMSE (%) | 6.1 | 16 |
| Depth Completion | 7-Scenes < 3m | RMSE (%) | 6.6 | 16 |
| Depth Completion | Metropolis 8-line | RMSE (%) | 1.31e+3 | 16 |
| Depth Completion | Metropolis 16-line | RMSE (%) | 1.20e+3 | 16 |