
ViewBridge: Curriculum Knowledge Distillation for Activity View-Invariance Under Extreme Viewpoint Changes

About

Traditional methods for view-invariant learning rely on controlled multi-view training data with minimal scene clutter. However, they struggle with in-the-wild videos that exhibit extreme viewpoint differences and share little visual content. We introduce ViewBridge, a framework for learning rich video representations in the presence of severe view occlusions. It combines a knowledge distillation objective that preserves action-centric semantics with a novel curriculum learning procedure that pairs incrementally more challenging views over time, thereby allowing smooth adaptation to extreme viewpoint differences. To sort training video segments for the proposed curriculum, we define a geometry-based metric that reflects their likely occlusion level. While training leverages multi-view data, at inference time the input is an uncalibrated, single-viewpoint video. Evaluating our approach on two tasks (temporal keystep grounding and fine-grained keystep recognition), we outperform state-of-the-art approaches across three datasets (Ego-Exo4D, LEMMA, EPFL-Smart-Kitchen-30). Project page: https://vision.cs.utexas.edu/projects/learning_view_distill/

Arjun Somayazulu, Efi Mavroudi, Changan Chen, Lorenzo Torresani, Kristen Grauman • 2025
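
The curriculum idea in the abstract (order training segments by an occlusion-related score, then expose harder view pairs over time) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear pacing schedule, and the `start_frac` parameter are hypothetical, and the `difficulty` scores stand in for the geometry-based metric, whose definition is not given here.

```python
from typing import List, Sequence


def curriculum_order(difficulty: Sequence[float]) -> List[int]:
    """Return sample indices sorted from easiest to hardest.

    `difficulty` is a hypothetical per-sample score (higher = harder),
    standing in for the paper's geometry-based occlusion metric.
    """
    return sorted(range(len(difficulty)), key=lambda i: difficulty[i])


def curriculum_subset(
    difficulty: Sequence[float],
    epoch: int,
    num_epochs: int,
    start_frac: float = 0.25,
) -> List[int]:
    """Indices available for training at `epoch` (0-based).

    Assumed pacing: start with the easiest `start_frac` of the data and
    grow linearly until all samples are included in the final epoch.
    """
    order = curriculum_order(difficulty)
    if num_epochs <= 1:
        return order
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(epoch / (num_epochs - 1), 0.0), 1.0)
    frac = start_frac + (1.0 - start_frac) * progress
    count = max(1, round(frac * len(order)))
    return order[:count]


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.3]  # made-up difficulty values
    for epoch in range(4):
        print(epoch, curriculum_subset(scores, epoch, num_epochs=4))
```

Easier samples are always retained as harder ones are added, so the model keeps seeing low-occlusion pairs while adapting to more extreme viewpoint differences. Other pacing functions (step-wise, exponential) would slot into the same structure.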

Related benchmarks

Task                      Dataset             Metric          Result  Rank
Temporal Grounding        Ego-Exo4D E views   Recall@1        36      10
Temporal Grounding        Ego-Exo4D M views   Recall@1        35      10
Salient Object Detection  DUTLF Focal Stack   MAE             0.065   7
Keystep recognition       Ego-Exo4D           Top-1 Accuracy  24.07   6
Keystep recognition       EPFL                Top-1 Accuracy  19.24   6
Temporal Grounding        Ego-Exo4D D views   Recall@1        28      5
Temporal Grounding        EPFL D views        Recall@1        31      5
Keystep recognition       LEMMA               Top-1 Accuracy  27.86   4
Temporal Grounding        LEMMA D views       Recall@1        18      4
