Group-DINOmics: Incorporating People Dynamics into DINO for Self-supervised Group Activity Feature Learning
About
This paper proposes Group Activity Feature (GAF) learning without group activity annotations. Unlike prior work, which uses low-level static local features to learn GAFs, we propose leveraging dynamics-aware and group-aware pretext tasks, along with local and global features provided by DINO, for group-dynamics-aware GAF learning. To adapt DINO and GAF learning to local dynamics and global group features, our pretext tasks use person flow estimation and group-relevant object location estimation, respectively. Person flow estimation is used to represent the local motion of each person, which is an important cue for understanding group activities. In contrast, group-relevant object location estimation encourages GAFs to learn scene context (e.g., spatial relations of people and objects) as global features. Comprehensive experiments on public datasets demonstrate the state-of-the-art performance of our method in group activity retrieval and recognition. Our ablation studies verify the effectiveness of each component in our method. Code: https://github.com/tezuka0001/Group-DINOmics.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Group activity recognition | NBA (test) | MCA | 73 | 19 |
| Group activity recognition | VBD (test) | MCA | 93.9 | 9 |
| Group activity recognition | Volleyball dataset (VBD) (test) | Merged MCA | 96.1 | 9 |
| Group Activity Retrieval | Volleyball dataset (VBD) (test) | Hit@1 | 82.7 | 5 |
| Group Activity Retrieval | NBA dataset (test) | Hit@1 | 43.9 | 5 |
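
For readers unfamiliar with the metrics in the table, the following is a minimal sketch of how Hit@K (retrieval) and MCA (recognition accuracy) are commonly computed. It is not taken from the paper's code: the function names `hit_at_k` and `mca`, the use of cosine similarity for ranking, and the toy features and labels are all assumptions made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed ranking score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hit_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=1):
    """Percentage of queries whose top-k retrieved items contain the query's label."""
    hits = 0
    for q, q_label in zip(query_feats, query_labels):
        # Rank gallery indices from most to least similar to the query.
        ranked = sorted(
            range(len(gallery_feats)),
            key=lambda i: cosine(q, gallery_feats[i]),
            reverse=True,
        )
        if any(gallery_labels[i] == q_label for i in ranked[:k]):
            hits += 1
    return 100.0 * hits / len(query_feats)


def mca(predicted, actual):
    """Multi-class classification accuracy, in percent."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Toy data (hypothetical): 2-D "group activity features" with activity labels.
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
gallery_labels = ["spike", "set", "set"]
query_feats = [[0.9, 0.1], [0.2, 0.8]]
query_labels = ["spike", "spike"]

hit1 = hit_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=1)
hit3 = hit_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=3)
acc = mca(["spike", "set", "set", "spike"], ["spike", "set", "spike", "spike"])
print(hit1, hit3, acc)
```

In this toy setup the second query's nearest gallery item has a different label, so it counts as a miss at K=1 but as a hit once K is large enough to include a matching item. The similarity measure and the query/gallery split used in the actual benchmarks may differ from this sketch.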