Signal-SGN: A Spiking Graph Convolutional Network for Skeletal Action Recognition via Learning Temporal-Frequency Dynamics

About

Graph Convolutional Networks (GCNs) are effective models for multimodal skeleton-based action recognition, but their reliance on floating-point computation leads to high energy consumption, which limits their use on battery-powered devices. Spiking Neural Networks (SNNs) are energy-efficient but struggle to model skeleton dynamics, which leads to suboptimal results. We propose Signal-SGN (Spiking Graph Convolutional Network), which uses the temporal dimension of skeleton sequences as the spike time steps and represents features as multi-dimensional discrete stochastic signals for temporal-frequency domain feature extraction. It combines a 1D Spiking Graph Convolution (1D-SGC) module and a Frequency Spiking Convolution (FSC) module to extract features from skeletons represented in spiking form. In addition, a Multi-Scale Wavelet Transform Feature Fusion (MWTF) module extracts dynamic spiking features and captures frequency-specific characteristics, improving classification performance. Experiments on three large-scale datasets show that Signal-SGN exceeds state-of-the-art SNN-based methods in both accuracy and computational efficiency, while achieving performance comparable to GCN methods at significantly lower theoretical energy consumption.
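To make the idea of using the frame axis as the spike time steps concrete, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron stepped once per skeleton frame. It is not the authors' implementation: the function name `lif_over_frames`, the `(T, V, C)` input layout (frames, joints, channels), the hard reset, and the default values of `tau`, `v_threshold`, and `v_reset` are all assumptions chosen for illustration.

```python
import numpy as np


def lif_over_frames(x, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Run an LIF neuron per joint/channel, one step per frame (hypothetical helper).

    x: array of shape (T, V, C) with T frames, V joints, C channels (assumed layout).
    Returns an array of the same shape containing 0.0 / 1.0 spikes.
    """
    x = np.asarray(x, dtype=np.float64)
    v = np.full(x.shape[1:], v_reset, dtype=np.float64)  # membrane potential
    spikes = np.zeros_like(x)
    for t in range(x.shape[0]):  # the frame index plays the role of the spike time step
        # Leaky integration: decay toward v_reset while charging from the input.
        v = v + (x[t] - (v - v_reset)) / tau
        fired = v >= v_threshold
        spikes[t] = fired
        # Hard reset for neurons that fired (an assumption; soft reset is also common).
        v = np.where(fired, v_reset, v)
    return spikes


# Toy input: 4 frames, 2 joints, 1 channel.
x = np.zeros((4, 2, 1))
x[:, 0, 0] = 1.5  # joint 0 receives a constant input; joint 1 receives none
spikes = lif_over_frames(x)
```

Because the output is binary, downstream layers can in principle replace multiply-accumulate operations with accumulations, which is the usual basis for the theoretical energy savings claimed for SNNs.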

Naichuan Zheng, Yuchen Du, Hailun Xia, Zeyu Liang • 2024

Related benchmarks

Task	Dataset	Result
Action Recognition	NTU RGB+D 120 (X-set)	Accuracy 77.9	779
Action Recognition	NTU RGB+D (Cross-View)	Accuracy 93.1	663
Action Recognition	NTU RGB+D (Cross-subject)	Accuracy 86.1	511
Action Recognition	NTU RGB+D 120 Cross-Subject	Accuracy 75.3	249
Skeleton-based Action Recognition	NTU RGB+D (Cross-View)	Accuracy 93.1	213
Skeleton-based Action Recognition	NTU RGB+D 120 Cross-Subject	Top-1 Accuracy 75.3	143
Skeleton-based Action Recognition	NTU-RGB+D 120 (Cross-setup)	Accuracy 77.9	136
Action Recognition	NW-UCLA	Top-1 Acc 95.9	128
Skeleton-based Action Recognition	NTU RGB+D (Cross-subject)	Accuracy 86.1	123
Skeleton-based Action Recognition	NW-UCLA	Accuracy 95.9	44
