NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

About

Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]

Jun Liu, Amir Shahroudy, Mauricio Perez, Gang Wang, Ling-Yu Duan, Alex C. Kot • 2019

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Action Recognition | NTU RGB+D 120 (X-set) | Accuracy 40.1 | 661 |
| Action Recognition | NTU RGB+D 60 (X-sub) | -- | 467 |
| Action Recognition | NTU RGB+D 120 Cross-Subject | Accuracy 48.7 | 183 |
| Action Recognition | NTU 120 (Cross-Setup) | Accuracy 63.1 | 112 |
| 3D Action Recognition | NTU RGB+D 60 (Cross-View) | -- | 29 |
| Action Recognition | NTU-120 1.0 (Cross-Subject 1 (CS1)) | Top-1 Accuracy 61.2 | 28 |
| Action Recognition | NTU RGB+D 120 one-shot protocol | Accuracy 45.3 | 26 |
| 3D Action Recognition | NTU RGB+D 120 One-shot (20 novel classes) | Accuracy 45.3 | 4 |
