
Contrastive Video Representation Learning via Adversarial Perturbations

About

Adversarial perturbations are noise-like patterns that can subtly change the data while causing an otherwise accurate classifier to fail. In this paper, we propose to use such perturbations within a novel contrastive learning setup to build negative samples, which are then used to produce improved video representations. To this end, given a well-trained deep model for per-frame video recognition, we first generate adversarial noise adapted to this model. Positive and negative bags are produced using the original data features from the full video sequence and their perturbed counterparts, respectively. Unlike classic contrastive learning methods, we develop a binary classification problem that learns a set of discriminative hyperplanes -- as a subspace -- that separates the two bags from each other. This subspace is then used as a descriptor for the video, dubbed "discriminative subspace pooling". As the perturbed features belong to data classes that are likely to be confused with the original features, the discriminative subspace characterizes parts of the feature space that are more representative of the original data, and thus may provide robust video representations. To learn such descriptors, we formulate a subspace learning objective on the Stiefel manifold and resort to Riemannian optimization methods for solving it efficiently. We provide experiments on several video datasets and demonstrate state-of-the-art results.
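The core idea above -- learning orthonormal hyperplanes that separate original (positive) from adversarially perturbed (negative) feature bags, optimized on the Stiefel manifold -- can be sketched in a few lines of numpy. This is a hypothetical toy illustration, not the paper's exact formulation: the synthetic "perturbed" bag, the simple between-bag scatter objective, and the step size are all assumptions made for brevity, and the optimization is plain Riemannian gradient ascent with a QR retraction.

```python
import numpy as np

# Toy sketch of "discriminative subspace pooling" (illustrative only):
# learn k orthonormal hyperplanes W (a point on the Stiefel manifold)
# that separate original frame features from perturbed ones.

rng = np.random.default_rng(0)
d, n, k = 16, 50, 3  # feature dim, frames per bag, subspace size

# Stand-ins for per-frame features: a positive bag and a shifted
# "adversarially perturbed" negative bag (synthetic, for illustration).
X_pos = rng.normal(0.0, 1.0, (n, d)) + 1.0
X_neg = X_pos + rng.normal(0.0, 0.3, (n, d)) - 0.8

# Between-bag scatter: directions along which the two bags differ.
diff = X_pos.mean(axis=0) - X_neg.mean(axis=0)
S_b = np.outer(diff, diff) + 1e-3 * np.eye(d)

# Random starting point on the Stiefel manifold (orthonormal columns).
W, _ = np.linalg.qr(rng.normal(size=(d, k)))

for _ in range(100):
    G = 2.0 * S_b @ W                   # Euclidean gradient of trace(W^T S_b W)
    A = W.T @ G
    G = G - W @ ((A + A.T) / 2.0)       # project onto tangent space at W
    W, _ = np.linalg.qr(W + 0.1 * G)    # QR retraction back onto the manifold

# The learned subspace keeps orthonormal columns: W^T W = I_k.
print(np.allclose(W.T @ W, np.eye(k), atol=1e-6))  # True
```

The QR retraction is the standard cheap way to stay on the Stiefel manifold after a gradient step; dedicated toolboxes (e.g. Pymanopt) offer more principled retractions and line searches for the same constraint.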

Jue Wang, Anoop Cherian • 2018

Related benchmarks

Task                 Dataset                             Metric      Result   Rank
Action Recognition   NTU RGB+D (Cross-View)              Accuracy    88.7     609
Action Recognition   NTU RGB+D (Cross-Subject)           Accuracy    81.6     474
Action Recognition   HMDB-51 (average of three splits)   Top-1 Acc   81.5     204
Action Recognition   YUP++ (static)                      Accuracy    95.1     8
Action Recognition   YUP++ (moving)                      Accuracy    88.3     4
