
Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing

About

Unsupervised domain adaptation, which aims to adapt models trained on a labeled source domain to a completely unlabeled target domain, has attracted much attention in recent years. While many domain adaptation techniques have been proposed for images, the problem of unsupervised domain adaptation in videos remains largely underexplored. In this paper, we introduce Contrast and Mix (CoMix), a new contrastive learning framework that aims to learn discriminative, invariant feature representations for unsupervised video domain adaptation. First, unlike existing methods that rely on adversarial learning for feature alignment, we utilize temporal contrastive learning to bridge the domain gap by maximizing the similarity between encoded representations of an unlabeled video at two different speeds, as well as minimizing the similarity between different videos played at different speeds. Second, we propose a novel extension to the temporal contrastive loss that uses background mixing to allow additional positives per anchor, thus adapting contrastive learning to leverage action semantics shared across both domains. Moreover, we integrate a supervised contrastive learning objective using target pseudo-labels to enhance the discriminability of the latent space for video domain adaptation. Extensive experiments on several benchmark datasets demonstrate the superiority of our proposed approach over state-of-the-art methods. Project page: https://cvir.github.io/projects/comix
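
The two ideas in the abstract, contrasting the same video at two speeds and mixing in a background to create extra positives, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the InfoNCE-style form of the loss, the temperature value, and the linear blending weight `lam` are all assumptions made for illustration, and how the background frame is obtained is left outside the sketch.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def temporal_contrastive_loss(z_fast, z_slow, temperature=0.5):
    """InfoNCE-style loss over a batch of N videos (hypothetical helper).

    Row i of z_fast and row i of z_slow are embeddings of the same video
    at two playback speeds (the positive pair). Every other row of z_slow
    is a different video at the other speed and acts as a negative.
    """
    z_fast = l2_normalize(z_fast)
    z_slow = l2_normalize(z_slow)
    logits = z_fast @ z_slow.T / temperature             # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                   # positives on the diagonal


def background_mix(clip, background, lam=0.75):
    """Blend one background frame into every frame of a clip (assumed form).

    clip: array of shape (T, H, W, C); background: array of shape (H, W, C).
    lam is an illustrative weight on the original clip.
    """
    return lam * clip + (1.0 - lam) * background[None]
```

Under these assumptions, a mixed clip keeps the action of the original video while taking on the appearance of another background, so its embedding could be treated as an additional positive for the same anchor.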

Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Action Recognition | UCF-HMDB | Accuracy (UCF -> HMDB): 86.7 | 46 |
| Action Recognition | EPIC-KITCHENS (test) | Average Score: 43.2 | 25 |
| Video Domain Adaptation | UCF → HMDB (target) | Accuracy: 93.1 | 10 |
| Video Domain Adaptation | HMDB → UCF (target) | Accuracy: 96.6 | 10 |
| Video Domain Adaptation | Jester(S) → Jester(T) (target) | Accuracy: 69.6 | 10 |
| Action Recognition | Jester Js -> Jt | Top-1 Acc: 64.7 | 7 |
| Action Recognition | Epic-Kitchens D1 -> D2 | Top-1 Acc: 42.9 | 7 |
| Action Recognition | Epic-Kitchens D1 -> D3 | Top-1 Acc: 40.9 | 7 |
| Action Recognition | Epic-Kitchens D2 -> D3 | Top-1 Accuracy: 45.2 | 7 |
| Action Recognition | Epic-Kitchens D3 -> D1 | Top-1 Accuracy: 42.3 | 7 |
Showing 10 of 12 rows
