Signal-SGN++: Topology-Enhanced Time-Frequency Spiking Graph Network for Skeleton-Based Action Recognition
About
Graph Convolutional Networks (GCNs) demonstrate strong capability in modeling skeletal topology for action recognition, yet their dense floating-point computations incur high energy costs. Spiking Neural Networks (SNNs), characterized by event-driven and sparse activation, offer energy efficiency but remain limited in capturing the coupled time-frequency and topological dependencies of human motion. To bridge this gap, this article proposes Signal-SGN++, a topology-aware spiking graph framework that integrates structural adaptivity with time-frequency spiking dynamics. The network employs a backbone composed of 1D Spiking Graph Convolution (1D-SGC) and Frequency Spiking Convolution (FSC) for joint spatiotemporal and spectral feature extraction. Within this backbone, a Topology-Shift Self-Attention (TSSA) mechanism is embedded to adaptively route attention across learned skeletal topologies, enhancing graph-level sensitivity without increasing computational complexity. Moreover, an auxiliary Multi-Scale Wavelet Transform Fusion (MWTF) branch decomposes spiking features into multi-resolution time-frequency representations, in which a Topology-Aware Time-Frequency Fusion (TATF) unit incorporates structural priors to keep the spectral fusion consistent with the skeletal topology. Comprehensive experiments on large-scale benchmarks show that Signal-SGN++ achieves superior accuracy-efficiency trade-offs, outperforming existing SNN-based methods and achieving results competitive with state-of-the-art GCNs at substantially lower energy consumption.
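The abstract does not give the 1D-SGC equations. As a rough illustration of what a spiking graph convolution computes, the sketch below assumes a GCN-style update (normalized skeleton adjacency, times binary input spikes, times a weight matrix) whose output current drives a leaky integrate-and-fire (LIF) neuron with hard reset. The helper names (`normalized_adjacency`, `lif`, `spiking_graph_conv`), the LIF parameters `tau` and `v_threshold`, and the toy 5-joint skeleton are all hypothetical; this is not the Signal-SGN++ implementation, and the paper's 1D-SGC may differ.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of an undirected skeleton graph."""
    a = np.eye(num_joints)  # self-loops so each joint keeps its own features
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def lif(currents, tau=2.0, v_threshold=1.0):
    """Leaky integrate-and-fire over the leading (time) axis.

    Assumed dynamics: v <- v + (I_t - v) / tau; emit a spike when v >= v_threshold,
    then hard-reset v to 0. Returns a binary array with the same shape as `currents`.
    """
    v = np.zeros_like(currents[0])
    spikes = np.zeros_like(currents)
    for t in range(currents.shape[0]):
        v = v + (currents[t] - v) / tau           # leaky integration of input current
        spikes[t] = (v >= v_threshold).astype(currents.dtype)
        v = v * (1.0 - spikes[t])                 # hard reset where a spike fired
    return spikes


def spiking_graph_conv(spikes_in, a_hat, weight, tau=2.0, v_threshold=1.0):
    """Hypothetical spiking graph convolution.

    spikes_in: (T, V, C_in) binary spikes, a_hat: (V, V), weight: (C_in, C_out).
    Aggregates neighbor spikes over the skeleton graph, projects channels,
    and converts the resulting current to spikes with an LIF neuron.
    """
    currents = np.einsum("uv,tvc,cd->tud", a_hat, spikes_in, weight)
    return lif(currents, tau, v_threshold)


# Usage on a toy 5-joint skeleton (hypothetical topology) with T=8 time steps.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
a_hat = normalized_adjacency(edges, 5)
x = (rng.random((8, 5, 3)) < 0.3).astype(np.float64)  # sparse binary input spikes
w = rng.normal(size=(3, 16))
out = spiking_graph_conv(x, a_hat, w)
print(out.shape, out.mean())  # output shape and firing rate
```

Because `spikes_in` is binary, the channel projection only selects and sums rows of `weight`. This accumulate-only behavior, combined with sparse firing, is what SNN energy estimates typically rely on when comparing against the multiply-accumulate operations of a floating-point GCN.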
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Skeleton-based Action Recognition | NTU RGB+D (Cross-View) | Accuracy | 94.5 | 213 |
| Skeleton-based Action Recognition | NTU RGB+D 120 (Cross-Subject) | Top-1 Accuracy | 76.5 | 143 |
| Skeleton-based Action Recognition | NTU RGB+D 120 (Cross-Setup) | Accuracy | 78.9 | 136 |
| Skeleton-based Action Recognition | NTU RGB+D (Cross-Subject) | Accuracy | 87.2 | 123 |
| Skeleton-based Action Recognition | NW-UCLA | Accuracy | 96.3 | 44 |