Alternative Semantic Representations for Zero-Shot Human Action Recognition
About
A proper semantic representation for encoding side information is key to the success of zero-shot learning. In this paper, we explore two alternative semantic representations especially for zero-shot human action recognition: textual descriptions of human actions and deep features extracted from still images relevant to human actions. Such side information is accessible on the Web at little cost, which paves a new way to obtain side information for large-scale zero-shot human action recognition. We investigate different encoding methods to generate semantic representations for human actions from such side information. Based on our zero-shot visual recognition method, we conducted experiments on UCF101 and HMDB51 to evaluate the two proposed semantic representations. The results suggest that our text- and image-based semantic representations considerably outperform traditional attributes and word vectors for zero-shot human action recognition. In particular, the image-based semantic representations yield favourable performance even though they are extracted from only a small number of images per class.
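As a rough illustration of the idea, the sketch below builds a class-level semantic embedding either by averaging word vectors over an action's textual description (text-based) or by averaging deep features over a handful of still images of the action (image-based); an unseen video is then classified by nearest-neighbour search in the semantic space. The specific encodings, the linear projection, and the cosine-similarity matching are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np

def text_embedding(description_tokens, word_vectors):
    """Average word vectors over an action class's textual description.

    `word_vectors` maps token -> np.ndarray; out-of-vocabulary tokens are
    skipped. (An assumed encoding; the paper compares several encoding
    methods for the textual side information.)
    """
    vecs = [word_vectors[t] for t in description_tokens if t in word_vectors]
    return np.mean(vecs, axis=0)

def image_embedding(image_features):
    """Average deep features extracted from still images of the class.

    `image_features` is an (n_images, d) array of features from a
    pretrained CNN; n_images can be small, in line with the paper's
    observation for image-based representations.
    """
    return np.mean(image_features, axis=0)

def classify(video_feature, class_embeddings, projection):
    """Zero-shot prediction: map the video feature into the semantic space
    via a projection learned on seen classes (assumed linear here), then
    return the unseen class whose embedding is most cosine-similar.
    """
    z = projection @ video_feature

    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    return max(class_embeddings, key=lambda name: cos(z, class_embeddings[name]))
```

In this sketch, `class_embeddings` would hold one `text_embedding` or `image_embedding` per unseen action class, so swapping between the two proposed representations only changes how that dictionary is built.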
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Action Recognition | UCF101 (test) | Accuracy | 24.4 | 307 |
| Action Recognition | HMDB51 (test) | Accuracy | 0.218 | 249 |
| Action Recognition | HMDB51 | Top-1 Accuracy | 21.8 | 225 |
| Action Recognition | UCF-101 | Top-1 Accuracy | 54.4 | 147 |
| Zero-shot Action Recognition | UCF101 (test) | Accuracy | 24.4 | 33 |
| Action Recognition | HMDB51 | Top-1 Accuracy | 21.8 | 30 |
| Zero-shot Action Recognition | HMDB51 (test) | Accuracy | 21.8 | 25 |
| Action Recognition | UCF101 | Top-1 Accuracy | 24.4 | 15 |
| Activity Recognition | UCF-101, first split of three (test) | Top-1 Accuracy | 24.4 | 10 |
| Activity Recognition | HMDB-51, first split of three (test) | Top-1 Accuracy | 21.8 | 10 |