
Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation

About

Skeleton-based human action recognition has recently drawn increasing attention with the availability of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation for joint co-occurrences and the inter-frame representation for skeletons' temporal evolutions. In this paper, we propose an end-to-end convolutional co-occurrence feature learning framework. The co-occurrence features are learned with a hierarchical methodology, in which different levels of contextual information are aggregated gradually. First, point-level information of each joint is encoded independently. Then the encoded joints are assembled into a semantic representation in both the spatial and temporal domains. Specifically, we introduce a global spatial aggregation scheme, which is able to learn superior joint co-occurrence features over local aggregation. In addition, raw skeleton coordinates as well as their temporal differences are integrated with a two-stream paradigm. Experiments show that our approach consistently outperforms other state-of-the-art methods on action recognition and detection benchmarks such as NTU RGB+D, SBU Kinect Interaction and PKU-MMD.

Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu • 2018
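
The hierarchy described in the abstract (independent point-level encoding, then aggregation across all joints, with a second stream built from temporal differences) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, layer sizes, random weights, the zero-padded first motion frame, the sharing of weights between streams, and fusion by concatenation are all assumptions made for illustration only.

```python
import numpy as np

def temporal_difference(skeleton):
    """Frame-to-frame motion for a (T, J, C) skeleton sequence.

    Assumption: the first frame has no predecessor, so its motion is zero.
    """
    motion = np.zeros_like(skeleton)
    motion[1:] = skeleton[1:] - skeleton[:-1]
    return motion

def point_level_encode(x, weight):
    """Encode each joint independently: (T, J, C) @ (C, D) -> (T, J, D).

    This plays the role of a 1x1 convolution over the coordinate channels,
    followed by a ReLU.
    """
    return np.maximum(x @ weight, 0.0)

def global_spatial_aggregate(features, weight):
    """Mix information across ALL joints: (T, J, D), (J, K) -> (T, K, D).

    Treating the joint axis as the channel axis lets every output unit
    depend on every joint, in contrast to a local neighbourhood.
    """
    return np.einsum("tjd,jk->tkd", features, weight)

# Hypothetical sizes: 8 frames, 25 joints, 3D coordinates.
rng = np.random.default_rng(0)
T, J, C = 8, 25, 3
skeleton = rng.standard_normal((T, J, C))
motion = temporal_difference(skeleton)

w_point = rng.standard_normal((C, 16))   # hypothetical point-level weights
w_joint = rng.standard_normal((J, 32))   # hypothetical joint-mixing weights

# Two streams (raw coordinates and motion), fused here by concatenation.
streams = [
    global_spatial_aggregate(point_level_encode(s, w_point), w_joint)
    for s in (skeleton, motion)
]
fused = np.concatenate(streams, axis=-1)  # shape (T, 32, 32)
```

In a trained model the weights would be learned end to end and followed by temporal convolutions and a classifier; the sketch only shows how the joint axis can be moved into the channel position so that aggregation is global rather than local.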

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Action Recognition | NTU RGB+D (Cross-View) | Accuracy 91.9 | 609 |
| Action Recognition | NTU RGB+D 60 (Cross-View) | Accuracy 91.1 | 575 |
| Action Recognition | NTU RGB+D (Cross-subject) | Accuracy 86.5 | 474 |
| Action Recognition | NTU RGB+D 60 (X-sub) | Accuracy 56.8 | 467 |
| Action Recognition | NTU RGB-D Cross-Subject 60 | Accuracy 86.5 | 305 |
| Skeleton-based Action Recognition | NTU 60 (X-sub) | Accuracy 86.5 | 220 |
| Skeleton-based Action Recognition | NTU RGB+D (Cross-View) | Accuracy 91.1 | 213 |
| Skeleton-based Action Recognition | NTU RGB+D (Cross-subject) | Accuracy 86.5 | 123 |
| Skeleton-based Action Recognition | NTU 60 (X-view) | Accuracy 91.1 | 119 |
| Skeleton-based Action Recognition | NTU RGB+D 60 (Cross-Subject) | Accuracy 86.5 | 59 |
Showing 10 of 22 rows
