Actor-Transformers for Group Activity Recognition

About

This paper strives to recognize individual actions and group activities from videos. While existing solutions for this challenging problem explicitly model spatial and temporal relationships based on the locations of individual actors, we propose an actor-transformer model able to learn and selectively extract information relevant for group activity recognition. We feed the transformer with rich actor-specific static and dynamic representations expressed by features from a 2D pose network and a 3D CNN, respectively. We empirically study different ways to combine these representations and show their complementary benefits. Experiments show what is important to transform and how it should be transformed. What is more, actor-transformers achieve state-of-the-art results on two publicly available benchmarks for group activity recognition, outperforming the previous best published results by a considerable margin.

Kirill Gavrilyuk, Ryan Sanford, Mehrsan Javan, Cees G. M. Snoek • 2020
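A minimal NumPy sketch of the idea described in the abstract above, under stated assumptions. Each actor becomes one token. Self-attention lets every actor attend to every other actor. Per-actor outputs then feed an individual-action classifier, and pooled outputs feed a group-activity classifier. The class `ActorTransformerSketch`, the single layer and head, the max pooling over actors, the feature sizes, the class counts, and the random weights are all illustrative assumptions, not the authors' architecture or hyperparameters. The pose and 3D-CNN features are random stand-ins. The sketch also shows two generic ways to combine the static and dynamic streams: early fusion (summing projected features before the transformer) and late fusion (averaging class probabilities from one transformer per stream).

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    """Parameter-free layer normalization over the feature axis."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class ActorTransformerSketch:
    """Hypothetical 1-layer, 1-head transformer encoder over N actor tokens.

    Random weights stand in for trained ones; this only illustrates data flow.
    """

    def __init__(self, d_model, d_ff, n_actions, n_activities, seed=0):
        rng = np.random.default_rng(seed)

        def init(fan_in, fan_out):
            return rng.normal(0.0, 1.0 / np.sqrt(fan_in), (fan_in, fan_out))

        self.Wq, self.Wk, self.Wv = (init(d_model, d_model) for _ in range(3))
        self.W1, self.W2 = init(d_model, d_ff), init(d_ff, d_model)
        self.W_action = init(d_model, n_actions)
        self.W_activity = init(d_model, n_activities)

    def __call__(self, actors):
        # actors: (N, d_model), one token per detected actor
        q, k, v = actors @ self.Wq, actors @ self.Wk, actors @ self.Wv
        # (N, N) attention weights: how much each actor attends to every other actor
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
        x = layer_norm(actors + attn @ v)                           # residual + norm
        x = layer_norm(x + np.maximum(0.0, x @ self.W1) @ self.W2)  # ReLU feed-forward
        action_logits = x @ self.W_action                 # (N, n_actions), per actor
        activity_logits = x.max(axis=0) @ self.W_activity  # assumed max-pool over actors
        return action_logits, activity_logits, attn


rng = np.random.default_rng(1)
N, d_pose, d_dyn, d_model = 12, 34, 64, 16   # illustrative sizes (assumption)
pose_feats = rng.normal(size=(N, d_pose))    # stand-in for static 2D-pose features
dyn_feats = rng.normal(size=(N, d_dyn))      # stand-in for dynamic 3D-CNN features
P_pose = rng.normal(0.0, 1.0 / np.sqrt(d_pose), (d_pose, d_model))
P_dyn = rng.normal(0.0, 1.0 / np.sqrt(d_dyn), (d_dyn, d_model))

# Early fusion: sum the projected streams into a single token per actor.
early = ActorTransformerSketch(d_model, 32, n_actions=9, n_activities=8, seed=2)
action_logits, activity_logits, attn = early(pose_feats @ P_pose + dyn_feats @ P_dyn)

# Late fusion: one transformer per stream, then average group-activity probabilities.
static_tf = ActorTransformerSketch(d_model, 32, 9, 8, seed=3)
dynamic_tf = ActorTransformerSketch(d_model, 32, 9, 8, seed=4)
fused_probs = 0.5 * (softmax(static_tf(pose_feats @ P_pose)[1])
                     + softmax(dynamic_tf(dyn_feats @ P_dyn)[1]))
print(action_logits.shape, activity_logits.shape, int(fused_probs.argmax()))
```

Early fusion lets attention see both streams jointly, while late fusion keeps the streams independent until the scores are combined. Which works better is an empirical question, which is the kind of comparison the abstract says the paper studies.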

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Group activity recognition | Volleyball Dataset (VD) (original) | Accuracy 94.4 | 79 |
| Group activity recognition | Volleyball dataset | Accuracy 94.4 | 40 |
| Group activity recognition | Collective Activity (test) | Accuracy 92.8 | 37 |
| Group activity recognition | Volleyball dataset (test) | MCA 90 | 37 |
| Group activity recognition | Collective Activity Dataset | Accuracy 92.8 | 25 |
| Individual Activity Recognition | Volleyball (test) | Accuracy 85.9 | 19 |
| Group activity recognition | Volleyball dataset | MCA 90 | 19 |
| Individual Action Recognition | Volleyball dataset | Accuracy 83.7 | 18 |
| Group activity recognition | Volleyball Dataset (VD) (Olympic) | Accuracy 76.9 | 10 |
| Group activity recognition | NBA (test) | MCA 0.471 | 10 |
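The table reports two metrics: Accuracy and MCA. Assuming MCA denotes mean per-class accuracy, as is common in the group activity recognition literature, the two can diverge sharply when classes are imbalanced. The sketch below illustrates this. The function names and the toy labels are hypothetical and made up for illustration, not data from any benchmark above.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Overall accuracy: fraction of all samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_class_accuracy(y_true, y_pred):
    """Mean per-class accuracy (assumed meaning of MCA): per-class recall, averaged equally."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Made-up toy labels: class "a" dominates, and class "b" is always missed.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10
print(accuracy(y_true, y_pred))             # 0.8
print(mean_class_accuracy(y_true, y_pred))  # 0.5
```

Overall accuracy rewards the majority class, while MCA weights every class equally, so a model that ignores rare activities scores noticeably lower on MCA.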
