Enhancing the Performance of Transformer-based Spiking Neural Networks by SNN-optimized Downsampling with Precise Gradient Backpropagation

About

Deep spiking neural networks (SNNs) have drawn much attention in recent years because of their low power consumption, biological rationality, and event-driven property. However, state-of-the-art deep SNNs (including Spikformer and Spikingformer) suffer from a critical challenge: imprecise gradient backpropagation. This problem arises from the improper design of the downsampling modules in these networks and greatly hampers overall model performance. In this paper, we propose ConvBN-MaxPooling-LIF (CML), an SNN-optimized downsampling module with precise gradient backpropagation. We prove from a theoretical perspective that CML effectively overcomes the imprecision of gradient backpropagation. In addition, we evaluate CML on the ImageNet, CIFAR10, CIFAR100, CIFAR10-DVS, and DVS128-Gesture datasets and show state-of-the-art performance on all of them, with significantly better results than Spikingformer. For instance, our model achieves 77.64% on ImageNet, 96.04% on CIFAR10, and 81.4% on CIFAR10-DVS, with gains of +1.79% on ImageNet and +1.16% on CIFAR100 over Spikingformer.

Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Zhengyu Ma, Huihui Zhou, Xiaopeng Fan, Yonghong Tian • 2023
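
The layer ordering implied by the name ConvBN-MaxPooling-LIF can be illustrated with a small sketch. This is not the authors' implementation: it assumes the name lists the layers in order (convolution with batch normalization, then max pooling, then a leaky integrate-and-fire neuron), uses a single channel, omits the learned batch-norm scale and shift, and picks hypothetical values for the time constant `tau` and the firing threshold `v_threshold`. All function and class names are illustrative.

```python
# Pure-Python sketch of one ConvBN -> MaxPool -> LIF downsampling step.
# Every name and constant below is an illustrative assumption.

def conv2d(x, kernel):
    """Valid 2D cross-correlation of a single-channel map with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def batch_norm(x, eps=1e-5):
    """Normalize one feature map to zero mean and unit variance."""
    flat = [v for row in x for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    return [[(v - mean) / (var + eps) ** 0.5 for v in row] for row in x]

def max_pool2d(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    return [[max(x[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(x[0]) - size + 1, size)]
            for i in range(0, len(x) - size + 1, size)]

class LIF:
    """Leaky integrate-and-fire layer with a hard reset to zero."""

    def __init__(self, tau=2.0, v_threshold=1.0):
        self.tau = tau                  # assumed membrane time constant
        self.v_threshold = v_threshold  # assumed firing threshold
        self.v = None                   # membrane potentials, created lazily

    def step(self, current):
        if self.v is None:
            self.v = [[0.0] * len(current[0]) for _ in current]
        spikes = []
        for i, row in enumerate(current):
            out_row = []
            for j, c in enumerate(row):
                # Leaky integration of the input current.
                self.v[i][j] += (c - self.v[i][j]) / self.tau
                fired = self.v[i][j] >= self.v_threshold
                if fired:
                    self.v[i][j] = 0.0  # hard reset after a spike
                out_row.append(1 if fired else 0)
            spikes.append(out_row)
        return spikes

def cml_downsample(frames, kernel, lif):
    """Apply ConvBN, then max pooling, then the LIF neuron at each time step."""
    return [lif.step(max_pool2d(batch_norm(conv2d(f, kernel)))) for f in frames]

# Four time steps of a 6x6 input; conv gives 4x4, pooling gives 2x2 spike maps.
frames = [[[(i * j + t) % 5 for j in range(6)] for i in range(6)] for t in range(4)]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
spikes = cml_downsample(frames, kernel, LIF())
```

In this ordering the pooling operates on the real-valued ConvBN output, and the spiking neuron is applied only afterwards, so the module emits binary spike maps at the reduced resolution.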

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-10	--	875
Image Classification	CIFAR-100	--	357
Image Classification	CIFAR100	Accuracy 80.37	301
Image Classification	CIFAR10	Accuracy (%) 95.95	282
Image Classification	Tiny-ImageNet	Accuracy (%) 66.59	131
Image Classification	CIFAR10-DVS (test)	Accuracy 80.5	101
Gesture Recognition	DVS-Gesture (test)	Accuracy 98.6	79
Action Recognition	DVS128 Gesture	Specific Accuracy (SA) 98.6	13
Action Recognition	CIFAR10-DVS	Accuracy 81.4	9
Image Retrieval	CIFAR10	mAP (32-bit) 92.68	7
