
Few-Shot Video Classification via Temporal Alignment

About

There is growing interest in learning models that can recognize novel classes from only a few labeled examples. In this paper, we propose the Temporal Alignment Module (TAM), a novel few-shot learning framework that can learn to classify previously unseen videos. While most previous works neglect long-term temporal ordering information, our proposed model explicitly leverages the temporal ordering information in video data through temporal alignment. This leads to strong data efficiency for few-shot learning. Concretely, TAM computes the distance between a query video and each novel class proxy by averaging the per-frame distances along their alignment path. We introduce a continuous relaxation of TAM so the model can be trained end-to-end to directly optimize the few-shot learning objective. We evaluate TAM on two challenging real-world datasets, Kinetics and Something-Something-V2, and show that our model leads to significant improvements in few-shot video classification over a wide range of competitive baselines.
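The alignment distance described above can be sketched with a soft-DTW-style dynamic program: per-frame distances between the query and a class proxy are accumulated along an alignment path, and the hard minimum over path moves is replaced by a smooth soft-minimum so the whole computation is differentiable. This is an illustrative sketch, not the authors' implementation; the function names, the Euclidean frame distance, the normalization, and the `gamma` parameter are assumptions for illustration.

```python
import numpy as np

def soft_min(values, gamma):
    # Smooth minimum (continuous relaxation); approaches the hard
    # minimum as gamma -> 0, computed in a numerically stable way.
    vals = np.asarray(values, dtype=float) / -gamma
    m = vals.max()
    return -gamma * (m + np.log(np.exp(vals - m).sum()))

def alignment_distance(query, proxy, gamma=0.1):
    """Soft alignment distance between two frame-feature sequences.

    query: (T1, d) array of query frame features.
    proxy: (T2, d) array of class-proxy frame features.
    Returns an averaged per-frame distance along the soft alignment path.
    """
    T1, T2 = len(query), len(proxy)
    # Per-frame pairwise distances (Euclidean here, as an assumption).
    D = np.linalg.norm(query[:, None, :] - proxy[None, :, :], axis=-1)
    # Dynamic program over monotone alignment paths.
    R = np.full((T1 + 1, T2 + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            R[i, j] = D[i - 1, j - 1] + soft_min(
                [R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma
            )
    # Rough length normalization so sequences of different lengths compare.
    return R[T1, T2] / (T1 + T2)
```

In a few-shot setup, a query clip would be assigned to the class whose proxy gives the smallest such distance; because `soft_min` is differentiable, the distance can be backpropagated through to train the frame encoder end-to-end.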

Kaidi Cao, Jingwei Ji, Zhangjie Cao, Chien-Yi Chang, Juan Carlos Niebles• 2019

Related benchmarks

Task                             | Dataset                        | Result                         | Rank
Action Recognition               | Something-Something v2         | Top-1 Accuracy 52.3            | 341
Action Recognition               | Kinetics                       | Accuracy (5-shot) 85.8         | 47
Few-shot Action Recognition      | Kinetics (meta-test)           | Accuracy 85.8                  | 46
Video Recognition                | Kinetics (test)                | Accuracy 85.8                  | 42
Action Recognition               | SSv2 Few-shot                  | Top-1 Acc (5-way 1-shot) 50.2  | 42
Few-shot Action Recognition      | SS Full meta v2 (test)         | Accuracy 52.3                  | 38
Audio-Visual Event Localization  | AVE                            | --                             | 35
Activity Classification          | MOMA Activities (meta-test)    | Accuracy 92.07                 | 34
Action Recognition               | SSv2 Small                     | Top-1 Acc (1-shot) 38.9        | 26
Few-shot Video Classification    | Something-Something V2 (Small) | Accuracy 48                    | 24

Showing 10 of 42 rows
