Self-Improving 4D Perception via Self-Distillation

About

Large-scale multi-view reconstruction models have made remarkable progress, but most existing approaches still rely on fully supervised training with ground-truth 3D/4D annotations. Such annotations are expensive and particularly scarce for dynamic scenes, limiting scalability. We propose SelfEvo, a self-improving framework that continually improves pretrained multi-view reconstruction models using unlabeled videos. SelfEvo introduces a self-distillation scheme using spatiotemporal context asymmetry, enabling self-improvement for learning-based 4D perception without external annotations. We systematically study design choices that make self-improvement effective, including loss signals, forms of asymmetry, and other training strategies. Across eight benchmarks spanning diverse datasets and domains, SelfEvo consistently improves pretrained baselines and generalizes across base models (e.g., VGGT and $\pi^3$), with significant gains on dynamic scenes. Overall, SelfEvo achieves up to 36.5% relative improvement in video depth estimation and 20.1% in camera estimation, without using any labeled data. Project Page: https://self-evo.github.io/

Nan Huang, Pengcheng Yu, Weijia Zeng, James M. Rehg, Angjoo Kanazawa, Haiwen Feng, Qianqian Wang • 2026

Related benchmarks

Task                        Dataset                Metric            Result  Rank
Video Depth Estimation      Sintel (test)          Delta 1 Accuracy  70.7    61
Video Depth Estimation      Bonn (test)            Abs Rel           0.044   41
Camera pose estimation      RealEstate10K          AUC@30            83.359  26
Video Depth Estimation      KITTI (test)           --                --      25
Video Depth                 DROID                  Abs Rel           0.223   8
Video Depth                 BEDLAM 2.0             Abs Rel Error     0.027   8
Camera Estimation           BEDLAM 2.0             AUC@5             86.48   4
Monocular Depth Estimation  DROID (unseen domain)  Abs Rel           0.237   4
Monocular Depth Estimation  HOI4D (unseen domain)  Abs Rel Error     0.03    4
Camera Estimation           OmniGeo                AUC@5             58.271  2
Showing 10 of 13 rows
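
For readers unfamiliar with the depth metrics in the table, the following is a minimal sketch of the conventional definitions of Abs Rel (mean absolute relative error, lower is better) and Delta 1 accuracy (fraction of pixels whose prediction-to-ground-truth ratio is within 1.25, higher is better). The function names `abs_rel` and `delta1`, the flat-list inputs, and the sample values are assumptions made for illustration. The exact evaluation protocol behind the numbers above (for example, any scale or shift alignment applied before scoring) is not stated on this page and may differ.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def delta1(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(pred/gt, gt/pred) is below the threshold."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)


# Hypothetical depths in meters, flattened to 1D lists for the example.
gt_depth = [1.0, 2.0, 4.0]
pred_depth = [1.1, 2.0, 6.0]

print(abs_rel(pred_depth, gt_depth))
print(delta1(pred_depth, gt_depth))
```

Note that `delta1` here returns a fraction between 0 and 1; leaderboards often report the same quantity as a percentage, which is presumably the case for the Sintel entry above.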
