Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks

About

Spiking Neural Networks (SNNs) have gained significant attention due to their biological plausibility and energy efficiency, making them promising alternatives to Artificial Neural Networks (ANNs). However, the performance gap between SNNs and ANNs remains a substantial challenge hindering the widespread adoption of SNNs. In this paper, we propose a Spatial-Temporal Attention Aggregator SNN (STAA-SNN) framework, which dynamically focuses on and captures both spatial and temporal dependencies. First, we introduce a spike-driven self-attention mechanism specifically designed for SNNs. Additionally, we pioneeringly incorporate position encoding to integrate latent temporal relationships into the incoming features. For spatial-temporal information aggregation, we employ step attention to selectively amplify relevant features at different steps. Finally, we implement a time-step random dropout strategy to avoid local optima. As a result, STAA-SNN effectively captures both spatial and temporal dependencies, enabling the model to analyze complex patterns and make accurate predictions. The framework demonstrates exceptional performance across diverse datasets and exhibits strong generalization capabilities. Notably, STAA-SNN achieves state-of-the-art results on neuromorphic datasets CIFAR10-DVS, with remarkable performances of 97.14%, 82.05% and 70.40% on the static datasets CIFAR-10, CIFAR-100 and ImageNet, respectively. Furthermore, our model exhibits improved performance ranging from 0.33\% to 2.80\% with fewer time steps. The code for the model is available on GitHub.

Tianqing Zhang, Kairong Yu, Xian Zhong, Hongwei Wang, Qi Xu, Qiang Zhang• 2025

Related benchmarks

TaskDatasetResultRank
Image ClassificationImageNet 1k (test)
Top-1 Accuracy70.4
798
Image ClassificationCIFAR-100 standard (test)
Top-1 Accuracy82.05
133
Image ClassificationCIFAR-10 standard (test)
Accuracy97.14
97
Event-based action recognitionCIFAR10-DVS--
10
Event-based action recognitionDVS128 Gesture--
8
Showing 5 of 5 rows

Other info

Follow for update