
Learning Representational Invariances for Data-Efficient Action Recognition

About

Data augmentation is a ubiquitous technique for improving image classification when labeled data is scarce. Constraining the model predictions to be invariant to diverse data augmentations effectively injects the desired representational invariances to the model (e.g., invariance to photometric variations) and helps improve accuracy. Compared to image data, the appearance variations in videos are far more complex due to the additional temporal dimension. Yet, data augmentation methods for videos remain under-explored. This paper investigates various data augmentation strategies that capture different video invariances, including photometric, geometric, temporal, and actor/scene augmentations. When integrated with existing semi-supervised learning frameworks, we show that our data augmentation strategy leads to promising performance on the Kinetics-100/400, Mini-Something-v2, UCF-101, and HMDB-51 datasets in the low-label regime. We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
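The abstract names four augmentation families but does not spell out implementations. As a rough illustration (all function names and parameter values here are hypothetical, not the paper's actual code), clip-level photometric, geometric, and temporal augmentations on a video tensor could be sketched as:

```python
import numpy as np

def photometric_jitter(video, max_delta=0.2, rng=None):
    """Photometric: shift brightness by one random offset applied to
    every frame, so the perturbation is temporally consistent."""
    rng = rng or np.random.default_rng()
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(video + delta, 0.0, 1.0)

def horizontal_flip(video, p=0.5, rng=None):
    """Geometric: flip all frames left-right with probability p."""
    rng = rng or np.random.default_rng()
    return video[:, :, ::-1] if rng.random() < p else video

def temporal_crop(video, clip_len, rng=None):
    """Temporal: sample a random contiguous clip of clip_len frames."""
    rng = rng or np.random.default_rng()
    start = rng.integers(0, video.shape[0] - clip_len + 1)
    return video[start:start + clip_len]

def augment(video, clip_len=8, rng=None):
    # video: (T, H, W) float array with values in [0, 1]
    rng = rng or np.random.default_rng(0)
    clip = temporal_crop(video, clip_len, rng)
    clip = horizontal_flip(clip, rng=rng)
    return photometric_jitter(clip, rng=rng)

video = np.random.default_rng(0).random((16, 32, 32))
aug = augment(video)
print(aug.shape)  # (8, 32, 32)
```

In a semi-supervised setup of the kind the abstract describes, two such augmented views of the same unlabeled clip would be fed to the model, with a consistency loss encouraging the predictions to match.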

Yuliang Zou, Jinwoo Choi, Qitong Wang, Jia-Bin Huang • 2021

Related benchmarks

Task | Dataset | Result | Rank
Action Recognition | UCF-101 (81/20) | Accuracy: 57.4 | 13
Action Recognition | UCF101 (50% labels) | Accuracy: 64.7 | 13
Action Recognition | UCF101 (10% labels) | Accuracy: 53.0 | 13
Action Recognition | UCF101 (5% labels) | Accuracy: 45.1 | 13
Action Recognition | HMDB51 (40% labels) | Accuracy: 35.7 | 13
Action Recognition | HMDB51 (60% labels) | Accuracy: 40.8 | 13
Action Recognition | HMDB51 (50% labels) | Accuracy: 39.5 | 13
Action Recognition | Kinetics 100 (50% labels) | Accuracy: 72.2 | 12
Action Recognition | Kinetics 100 (20% labels) | Accuracy: 68.7 | 12
Action Recognition | Kinetics 100 (10% labels) | Accuracy: 63.9 | 12

Showing 10 of 13 rows.
