Real-time 3D human action recognition based on Hyperpoint sequence

About

Real-time 3D human action recognition has broad industrial applications, such as surveillance, human-computer interaction, and healthcare monitoring. Most existing point cloud sequence networks rely on complex spatio-temporal local encoding to capture spatio-temporal local structures for 3D human action recognition. To simplify the point cloud sequence modeling task, we propose a lightweight and effective point cloud sequence network, referred to as SequentialPointNet, for real-time 3D action recognition. Instead of capturing spatio-temporal local structures, SequentialPointNet encodes the temporal evolution of static appearances to recognize human actions. First, we define a novel type of point data, Hyperpoint, to better describe the temporally changing human appearances. A theoretical foundation is provided to clarify the information equivalence property for converting point cloud sequences into Hyperpoint sequences. Second, the point cloud sequence modeling task is decomposed into a Hyperpoint embedding task and a Hyperpoint sequence modeling task. Specifically, for Hyperpoint embedding, static point cloud techniques are employed to convert point cloud sequences into Hyperpoint sequences, which introduces inherent frame-level parallelism; for Hyperpoint sequence modeling, a Hyperpoint-Mixer module is designed as the basic building block to learn the spatio-temporal features of human actions. Extensive experiments on three widely used 3D action recognition datasets demonstrate that the proposed SequentialPointNet achieves competitive classification performance while running up to 10X faster than existing approaches.
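The two-stage decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`make_weights`, `embed_frame`, `to_hyperpoint_sequence`, `pool_over_time`), the random weights, and the PointNet-style shared layer with max-pooling are all assumptions chosen for illustration, and the channel-wise temporal max is only a placeholder for the Hyperpoint-Mixer module, whose internals the abstract does not specify. The sketch shows only the structural idea: each frame is embedded on its own into one vector (standing in for a Hyperpoint), and a separate step then models the resulting sequence.

```python
import random


def make_weights(in_dim, out_dim, seed=0):
    """Random weights for a shared per-point linear layer (illustrative only)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(out_dim)] for _ in range(in_dim)]


def embed_frame(points, weights):
    """Map one static point cloud (list of xyz tuples) to a single feature vector.

    Assumed stand-in for Hyperpoint embedding: a shared linear layer + ReLU
    applied to every point, followed by max-pooling over the points.
    """
    out_dim = len(weights[0])
    pooled = [0.0] * out_dim  # ReLU outputs are >= 0, so 0.0 is a safe starting max
    for point in points:
        for j in range(out_dim):
            pre_activation = sum(point[i] * weights[i][j] for i in range(len(point)))
            pooled[j] = max(pooled[j], max(0.0, pre_activation))
    return pooled


def to_hyperpoint_sequence(frames, weights):
    """Embed every frame independently; no iteration depends on another frame,
    which is what makes frame-level parallelism possible."""
    return [embed_frame(frame, weights) for frame in frames]


def pool_over_time(hyperpoints):
    """Placeholder sequence model (NOT the Hyperpoint-Mixer): channel-wise max over time."""
    return [max(channel) for channel in zip(*hyperpoints)]


# Toy clip: three frames with different numbers of points.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)],
    [(1.0, 2.0, 3.0), (0.0, 0.0, 0.0)],  # same points as frame 0, reordered
    [(0.5, 0.5, 0.5)],
]
weights = make_weights(in_dim=3, out_dim=4, seed=1)
sequence = to_hyperpoint_sequence(frames, weights)  # one vector per frame
clip_feature = pool_over_time(sequence)             # one vector per clip
```

Because `embed_frame` never looks at neighbouring frames, the list comprehension in `to_hyperpoint_sequence` could be replaced by a batched or multi-worker call without changing the result; all temporal reasoning is deferred to the sequence-modeling step.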

Xing Li, Qian Huang, Zhijian Wang, Zhenjie Hou, Tianjin Yang, Zhuang Miao • 2021

Related benchmarks

Task	Dataset	Result
Action Recognition	NTU RGB+D 120 (X-set)	Accuracy 95.4	779
Action Recognition	NTU RGB+D (Cross-View)	Accuracy 97.6	663
Action Recognition	NTU RGB+D 60 (Cross-View)	Accuracy 97.6	601
Action Recognition	NTU RGB+D (Cross-subject)	Accuracy 90.3	511
Action Recognition	NTU RGB-D Cross-Subject 60	Accuracy 90.3	358
Action Recognition	NTU RGB+D 120 Cross-Subject	Accuracy 83.5	249
Action Recognition	MSRAction3D	Accuracy 92.64	232
Action Recognition	NTU RGB+D	Accuracy 90.3	50
Action Recognition	UTD-MHAD	Accuracy 92.31	8
