
Spiking Wavelet Transformer

About

Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by emulating the event-driven processing of the brain. Incorporating Transformers into SNNs has shown promise for accuracy. However, because they rely on the global self-attention mechanism, they struggle to learn high-frequency patterns such as moving edges and pixel-level brightness changes. Learning these high-frequency representations is challenging but essential for SNN-based event-driven vision. To address this issue, we propose the Spiking Wavelet Transformer (SWformer), an attention-free architecture that effectively learns comprehensive spatial-frequency features in a spike-driven manner by leveraging the sparse wavelet transform. The critical component is a Frequency-Aware Token Mixer (FATM) with three branches: 1) a spiking wavelet learner for spatial-frequency domain learning, 2) a convolution-based learner for spatial feature extraction, and 3) a spiking pointwise convolution for cross-channel information aggregation, with negative spike dynamics incorporated in 1) to enhance frequency representation. The FATM enables the SWformer to outperform vanilla Spiking Transformers in capturing high-frequency visual components, as evidenced by our empirical results. Experiments on both static and neuromorphic datasets demonstrate SWformer's effectiveness in capturing spatial-frequency patterns in a multiplication-free and event-driven fashion, outperforming state-of-the-art SNNs. SWformer achieves a 22.03% reduction in parameter count and a 2.52% performance improvement on the ImageNet dataset compared to vanilla Spiking Transformers. The code is available at: https://github.com/bic-L/Spiking-Wavelet-Transformer.
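The spiking wavelet learner in the abstract builds on the sparse wavelet transform's ability to separate low- and high-frequency image content. As a minimal illustration (not the paper's implementation, which operates on spike tensors), the following pure-Python sketch performs a single-level 2D Haar wavelet decomposition, splitting an image into one low-frequency sub-band (LL) and three high-frequency detail sub-bands (LH, HL, HH); the function name and layout are illustrative assumptions.

```python
def haar_2d(img):
    """Single-level 2D Haar decomposition of a 2D list with even dimensions.

    Returns four half-resolution sub-bands:
      LL - low-frequency average, LH - horizontal detail,
      HL - vertical detail,      HH - diagonal detail.
    """
    h, w = len(img), len(img[0])
    LL = [[0.0] * (w // 2) for _ in range(h // 2)]
    LH = [[0.0] * (w // 2) for _ in range(h // 2)]
    HL = [[0.0] * (w // 2) for _ in range(h // 2)]
    HH = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            # 2x2 pixel block
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            LL[i // 2][j // 2] = (a + b + c + d) / 4  # low-frequency average
            LH[i // 2][j // 2] = (a - b + c - d) / 4  # horizontal detail
            HL[i // 2][j // 2] = (a + b - c - d) / 4  # vertical detail
            HH[i // 2][j // 2] = (a - b - c + d) / 4  # diagonal detail
    return LL, LH, HL, HH
```

On an image of alternating pixel columns (a high-frequency pattern of the kind the abstract argues self-attention misses), the energy lands in the LH detail band rather than the LL average, which is the spatial-frequency split the FATM's wavelet branch exploits.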

Yuetong Fang, Ziqing Wang, Lingfeng Zhang, Jiahang Cao, Honglei Chen, Renjing Xu • 2024

Related benchmarks

Task                               Dataset                         Metric          Result   Rank
Skeleton-based Action Recognition  NTU RGB+D (Cross-View)          Accuracy        81.2     213
Skeleton-based Action Recognition  NTU RGB+D 120 Cross-Subject     Top-1 Accuracy  63.5     143
Skeleton-based Action Recognition  NTU-RGB+D 120 (Cross-setup)     Accuracy        64.7     136
Skeleton-based Action Recognition  NTU RGB+D (Cross-subject)       Accuracy        74.7     123
Skeleton-based Action Recognition  NW-UCLA                         Accuracy        86.7     44
Image Classification               CIFAR10 standard (test)         Top-1 Accuracy  95.31    35
Image Classification               CIFAR100 standard (test)        Top-1 Accuracy  76.99    13
