Pose And Joint-Aware Action Recognition
About
Recent progress on action recognition has mainly focused on RGB and optical flow features. In this paper, we approach the problem of joint-based action recognition. Unlike other modalities, the constellation of joints and their motion yields models with succinct human motion information for activity recognition. We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder before performing collective reasoning. Our joint selector module re-weights the joint information to select the most discriminative joints for the task. We also propose a novel joint-contrastive loss that pulls together groups of joint features which convey the same action. We strengthen the joint-based representations by using a geometry-aware data augmentation technique which jitters pose heatmaps while retaining the dynamics of the action. We show large improvements over the current state-of-the-art joint-based approaches on the JHMDB, HMDB, Charades, and AVA action recognition datasets. A late fusion with RGB- and flow-based approaches yields additional improvements. Our model also outperforms the existing baseline on Mimetics, a dataset with out-of-context actions.
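The abstract mentions two steps that are easy to illustrate in isolation: re-weighting per-joint features to emphasize discriminative joints, and late fusion of class scores from several streams. The sketch below is a minimal NumPy illustration of those two ideas only, not the authors' implementation. The function names, the linear scoring vector `w`, the softmax weighting, and the averaging fusion rule are all assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def reweight_joints(joint_feats, w):
    """Hypothetical joint selector.

    joint_feats: (J, D) array, one feature vector per joint.
    w: (D,) scoring vector (an assumption; a real model would learn this).
    Returns per-joint weights (J,) and the weighted sum of features (D,).
    """
    scores = joint_feats @ w          # one scalar score per joint
    weights = softmax(scores)         # non-negative, sums to 1
    pooled = weights @ joint_feats    # weighted combination of joint features
    return weights, pooled

def late_fusion(class_scores, stream_weights=None):
    """Weighted average of per-class scores from several streams.

    class_scores: list of (C,) arrays, e.g. joint, RGB, and flow streams.
    stream_weights: optional per-stream weights; defaults to a plain mean.
    """
    stacked = np.stack(class_scores)  # (S, C)
    if stream_weights is None:
        stream_weights = np.full(len(class_scores), 1.0 / len(class_scores))
    return np.asarray(stream_weights) @ stacked

# Toy usage: 3 joints with 2-dimensional features.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weights, pooled = reweight_joints(feats, np.array([1.0, 1.0]))
fused = late_fusion([np.array([0.2, 0.8]), np.array([0.6, 0.4])])
```

In this toy setup the third joint receives the largest weight because its score under `w` is highest, and the fused scores are the element-wise mean of the two streams.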
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Action Recognition | HMDB51 (split 1) | -- | 75 |
| Action Recognition | Charades (val) | mAP 43.23 | 69 |
| Action Recognition | JHMDB Mean over 3 splits | Accuracy 68.55 | 18 |
| Action Recognition | HMDB51 (avg 3 splits) | Accuracy 76.34 | 15 |
| Spatio-temporal Action Localization | AVA v2.1 (val) | mAP 28.4 | 13 |
| Action Recognition | JHMDB | Mean Per-Class Accuracy 88.36 | 11 |
| Action Recognition | HMDB | Mean Per-Class Accuracy 84.53 | 10 |
| Action Recognition | HMDB (3-split average) | Mean Per-Class Accuracy 54.2 | 6 |
| Action Recognition | Mimetics 50 Kinetics classes (test) | Top-1 Acc 26.6 | 2 |
| Action Recognition | J-HMDB (Split 1) | -- | 1 |