Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

About

Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once, and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: Previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state-of-the-art by a wide margin. Our code, evaluation procedure and model weights are available at github.com/bbrattoli/ZeroShotVideoClassification.

Biagio Brattoli, Joseph Tighe, Fedor Zhdanov, Pietro Perona, Krzysztof Chalupka • 2020
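
The abstract does not spell out how a trained model assigns a label to a class it never saw in training. A minimal sketch of one common zero-shot formulation follows, under stated assumptions: the visual model maps a video into a semantic space, and the prediction is the unseen class whose name embedding is closest by cosine similarity. The function names, the toy 3-dimensional vectors, and the class names are hypothetical illustrations and are not taken from the authors' code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(video_embedding, class_embeddings):
    """Return the class name whose embedding is closest to the video.

    video_embedding: vector produced by a (hypothetical) visual model,
        assumed to live in the same semantic space as the class vectors.
    class_embeddings: dict mapping unseen class names to semantic vectors,
        for example word embeddings of the class names (an assumption).
    """
    return max(
        class_embeddings,
        key=lambda name: cosine_similarity(video_embedding, class_embeddings[name]),
    )


# Toy vectors standing in for real embeddings of unseen test classes.
unseen_classes = {
    "archery": [1.0, 0.0, 0.0],
    "bowling": [0.0, 1.0, 0.0],
    "surfing": [0.0, 0.0, 1.0],
}

predicted = zero_shot_classify([0.1, 0.9, 0.2], unseen_classes)
print(predicted)
```

Because classification only compares against class-name vectors, new classes can be added at test time by extending the dictionary, with no retraining of the visual model.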

Related benchmarks

Task                          Dataset           Metric     Result  Rank
Action Recognition            UCF101 (test)     Accuracy   48      307
Action Recognition            HMDB51 (test)     Accuracy   0.327   249
Action Recognition            HMDB51            Top-1 Acc  32.7    225
Action Recognition            UCF-101           Top-1 Acc  48      147
Zero-shot Action Recognition  UCF101 (test)     Accuracy   48      33
Action Recognition            HMDB51            Top-1 Acc  32.7    30
Zero-shot Action Recognition  HMDB51 (test)     Accuracy   32.7    25
Video Recognition             UCF101 v1 (test)  Accuracy   37.6    21
Zero-shot Learning            UCF101            Accuracy   46.2    20
Zero-shot Learning            Olympics          Accuracy   61.4    20
Showing 10 of 29 rows
