STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks

About

Spiking Neural Networks (SNNs) have gained significant attention due to their biological plausibility and energy efficiency, making them promising alternatives to Artificial Neural Networks (ANNs). However, the performance gap between SNNs and ANNs remains a substantial obstacle to the widespread adoption of SNNs. In this paper, we propose a Spatial-Temporal Attention Aggregator SNN (STAA-SNN) framework, which dynamically focuses on and captures both spatial and temporal dependencies. First, we introduce a spike-driven self-attention mechanism specifically designed for SNNs. Additionally, we take the novel step of incorporating position encoding to integrate latent temporal relationships into the incoming features. For spatial-temporal information aggregation, we employ step attention to selectively amplify relevant features at different time steps. Finally, we implement a time-step random dropout strategy to avoid local optima. As a result, STAA-SNN captures both spatial and temporal dependencies, enabling the model to analyze complex patterns and make accurate predictions. The framework performs strongly across diverse datasets and generalizes well. Notably, STAA-SNN achieves state-of-the-art results on the neuromorphic dataset CIFAR10-DVS and reaches accuracies of 97.14%, 82.05%, and 70.40% on the static datasets CIFAR-10, CIFAR-100, and ImageNet, respectively. Furthermore, our model improves accuracy by 0.33% to 2.80% while using fewer time steps. The code for the model is available on GitHub.

Tianqing Zhang, Kairong Yu, Xian Zhong, Hongwei Wang, Qi Xu, Qiang Zhang • 2025
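
The abstract names step attention and time-step random dropout but does not give their formulas. The following is a minimal sketch of the general idea only, not the authors' implementation: the function names `drop_time_steps` and `step_attention`, the mean-activation score, and the toy inputs are all assumptions made for illustration. It weights each time step's feature vector with a softmax over per-step scores and, before that, randomly discards whole time steps.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def drop_time_steps(features, drop_prob, rng):
    """Randomly discard whole time steps, always keeping at least one.

    Hypothetical stand-in for the time-step random dropout named in the
    abstract; the published strategy may differ.
    """
    kept = [f for f in features if rng.random() >= drop_prob]
    return kept if kept else [rng.choice(features)]


def step_attention(features):
    """Aggregate per-step feature vectors with softmax weights over time.

    The score used here (mean activation of each step) is an assumption
    chosen for simplicity; a trained model would learn its scoring function.
    """
    scores = [sum(f) / len(f) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    aggregated = [
        sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)
    ]
    return aggregated, weights


# Toy spike activity for T = 4 time steps and 3 feature channels (made up).
features = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
rng = random.Random(0)
kept = drop_time_steps(features, drop_prob=0.25, rng=rng)
aggregated, weights = step_attention(kept)
```

In this sketch the dropout would apply only during training, and the attention weights always sum to one over whichever time steps survive, so the aggregated vector stays on the same scale regardless of how many steps were dropped.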

Related benchmarks

Task	Dataset	Result
Image Classification	ImageNet 1k (test)	Top-1 Accuracy: 70.4	939
Image Classification	CIFAR100	Accuracy: 82.05	378
Image Classification	CIFAR100	Accuracy: 82.05	301
Image Classification	CIFAR10	Accuracy (%): 97.14	282
Image Classification	CIFAR-100 standard (test)	Top-1 Accuracy: 82.05	195
Image Classification	CIFAR-10 standard (test)	Accuracy: 97.14	97
Neuromorphic Image Classification	DVS-CIFAR10	Accuracy: 82.1	37
Event-based action recognition	DVS128 Gesture	--	12
Event-based action recognition	CIFAR10-DVS	--	10
Event-based Image Classification	CIFAR10-DVS	Accuracy: 82.1	8

Showing 10 of 11 rows
