LSTA: Long Short-Term Attention for Egocentric Action Recognition
About
Egocentric activity recognition is one of the most challenging tasks in video analysis. It requires fine-grained discrimination of small objects and their manipulation. While some methods rely on strong supervision and attention mechanisms, they either require extensive annotation or do not take spatio-temporal patterns into account. In this paper we propose LSTA as a mechanism to focus on features from spatially relevant parts while attention is tracked smoothly across the video sequence. We demonstrate the effectiveness of LSTA on egocentric activity recognition with an end-to-end trainable two-stream architecture, achieving state-of-the-art performance on four standard benchmarks.
Swathikiran Sudhakaran, Sergio Escalera, Oswald Lanz • 2018
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Action Recognition | EPIC-Kitchens v1 (test s2 (unseen)) | Actions Top-1 Acc: 16.63 | 32 |
| Action Recognition | EPIC-Kitchens s1 (seen) v1 (test) | Actions Top-1 Accuracy: 30.2 | 29 |
| Action Recognition | EGTEA Gaze+ | Accuracy: 61.86 | 18 |
| Action Recognition | EPIC-KITCHENS 1 (S1 Seen kitchens) | Top-1 Accuracy (Verb): 59.55 | 17 |
| Egocentric Action Recognition | EPIC-Kitchens test (S1) | Top-1 Acc (Verb): 59.55 | 16 |
| Egocentric Action Recognition | EPIC-KITCHENS S2 (test) | Top-1 Accuracy (Verb): 47.32 | 16 |
| Egocentric Activity Recognition | GTEA 61 | Accuracy: 80.01 | 14 |
| Egocentric Activity Recognition | GTEA 61 (fixed split) | Accuracy: 79.31 | 13 |
| Egocentric Activity Recognition | GTEA 71 | Accuracy: 78.14 | 13 |
| Action Recognition | EPIC-KITCHENS S2 (test) | Top-1 Verb Accuracy: 47.32 | 11 |
Showing 10 of 14 rows