Bootstrapping Video Semantic Segmentation Model via Distillation-assisted Test-Time Adaptation

About

Fully supervised Video Semantic Segmentation (VSS) relies heavily on densely annotated video data, limiting practical applicability. Alternatively, applying pre-trained Image Semantic Segmentation (ISS) models frame-by-frame avoids annotation costs but ignores crucial temporal coherence. Recent foundation models such as SAM2 enable high-quality mask propagation yet remain impractical for direct VSS due to limited semantic understanding and computational overhead. In this paper, we propose DiTTA (Distillation-assisted Test-Time Adaptation), a novel framework that converts an ISS model into a temporally-aware VSS model through efficient test-time adaptation (TTA), without annotated videos. DiTTA distills SAM2's temporal segmentation knowledge into the ISS model during a brief, single-pass initialization phase, complemented by a lightweight temporal fusion module to aggregate cross-frame context. Crucially, DiTTA achieves robust generalization even when adapting with highly limited partial video snippets (e.g., initial 10%), significantly outperforming zero-shot refinement approaches that repeatedly invoke SAM2 during inference. Extensive experiments on VSPW and Cityscapes demonstrate DiTTA's effectiveness, achieving competitive or superior performance relative to fully-supervised VSS methods, thus providing a practical and annotation-free solution for real-world VSS tasks.
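The abstract describes adapting on only an initial portion of each video (e.g., the first 10%) and then running inference on the rest. A minimal sketch of that split follows. The function name `split_warmup`, the round-up rule, and the one-frame minimum are assumptions made for illustration; the abstract does not specify how the warm-up boundary is computed.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_warmup(frames: Sequence[T], warmup_ratio: float) -> Tuple[List[T], List[T]]:
    """Split a video into a warm-up prefix and the remaining frames.

    Hypothetical helper: the prefix would be used for test-time adaptation,
    the remainder for inference with the adapted model.
    """
    if not 0.0 < warmup_ratio <= 1.0:
        raise ValueError("warmup_ratio must be in (0, 1]")
    # Assumption: round up and keep at least one warm-up frame,
    # so that short clips still provide something to adapt on.
    n_warmup = min(len(frames), max(1, math.ceil(len(frames) * warmup_ratio)))
    return list(frames[:n_warmup]), list(frames[n_warmup:])


# Usage: an 8-frame clip with a 25% warm-up ratio.
warmup, rest = split_warmup(list(range(8)), 0.25)
print(warmup, rest)  # [0, 1] [2, 3, 4, 5, 6, 7]
```

Under this reading, the expensive teacher (SAM2 in the paper) would only be needed on the `warmup` frames, which is consistent with the abstract's contrast against approaches that invoke SAM2 repeatedly during inference.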

Jihun Kim, Hoyong Kwon, Hyeokjun Kweon, Kuk-Jin Yoon • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| Video Semantic Segmentation | VSPW (val) | mIoU 53.2 | 121 |
| Video Semantic Segmentation | Cityscapes (val) | mIoU 46.9 | 103 |
| Video Semantic Segmentation | VSPW, W2F protocol (10% warm-up ratio) | mIoU 51.1 | 9 |
| Video Semantic Segmentation | VSPW, W2F protocol (25% warm-up ratio) | mIoU 51 | 9 |
| Video Semantic Segmentation | VSPW, W2F protocol (50% warm-up ratio) | mIoU 52.3 | 9 |
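The results above are reported as mIoU (mean Intersection over Union), shown as percentages. As a reminder of what the metric measures, here is a minimal pure-Python sketch on flattened label lists. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; benchmark toolkits typically accumulate a confusion matrix over the whole dataset, and their handling of ignored labels may differ.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean over classes of |pred ∩ target| / |pred ∪ target|, as a fraction in [0, 1]."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # Assumption: classes absent from both are skipped.
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Usage: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"{100 * score:.1f}")  # per-class IoUs are 1/3, 2/3, 1/2, so this prints 50.0
```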
