The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction

About

Early action prediction deals with inferring the ongoing action from partially-observed videos, typically at the outset of the video. We propose a bottleneck-based attention model that captures the evolution of the action, through progressive sampling over fine-to-coarse scales. Our proposed Temporal Progressive (TemPr) model is composed of multiple attention towers, one for each scale. The predicted action label is based on the collective agreement considering confidences of these towers. Extensive experiments over four video datasets showcase state-of-the-art performance on the task of Early Action Prediction across a range of encoder architectures. We demonstrate the effectiveness and consistency of TemPr through detailed ablations.

Alexandros Stergiou, Dima Damen • 2022
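
To make the idea in the abstract concrete, here is a minimal sketch of the two steps it names: sampling progressively larger temporal windows from the observed part of a video, and combining per-tower predictions according to their confidence. It is an illustration under stated assumptions, not the authors' implementation. The function names (`progressive_windows`, `aggregate`), the choice to start every window at the first observed frame, and the use of each tower's maximum softmax probability as its confidence weight are all hypothetical stand-ins; the paper's actual sampling and aggregation functions may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def progressive_windows(num_observed, num_scales):
    """Return (start, end) frame ranges, ordered from the finest scale to the coarsest.

    Assumption: every window starts at frame 0 and each scale covers an
    equally larger share of the observed frames.
    """
    windows = []
    for i in range(1, num_scales + 1):
        end = max(1, round(num_observed * i / num_scales))
        windows.append((0, end))
    return windows


def aggregate(tower_logits):
    """Combine per-tower class logits into one prediction.

    Assumption: each tower is weighted by its own confidence, taken here
    as its maximum softmax probability. This is a simple stand-in for a
    learned aggregation function.
    """
    probs = [softmax(logits) for logits in tower_logits]
    weights = [max(p) for p in probs]
    total = sum(weights)
    num_classes = len(probs[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, probs)) / total
        for c in range(num_classes)
    ]
    label = max(range(num_classes), key=combined.__getitem__)
    return label, combined


# Hypothetical example: 16 observed frames, 4 scales, 3 towers, 3 classes.
windows = progressive_windows(16, 4)
label, combined = aggregate([[2.0, 0.0, 0.0], [0.0, 0.1, 0.0], [1.5, 0.0, 0.2]])
```

In this sketch, a tower that is nearly undecided (the second one in the example) contributes little to the final label, which is the intuition behind basing the prediction on the collective agreement of the towers.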

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Action Classification | UCF101 | Top-1 Accuracy: 96.6 | 151 |
| Early Action Prediction | SSsub21 | Top-1 Accuracy: 48.6 | 47 |
| Early Action Prediction | NTU60 | Top-1 Accuracy (rho=0.1): 29.3 | 7 |
| Early Action Prediction | SS v2 | -- | 3 |
| Early Action Prediction (All Action) | EK-100 | Top-1 Accuracy (rho=0.1): 7.4 | 2 |
| Early Action Prediction (All Noun) | EK-100 | Top-1 Accuracy (rho=0.1): 22.8 | 2 |
| Early Action Prediction (All Verb) | EK-100 | Top-1 Accuracy (rho=0.1): 21.4 | 2 |
