SpikeBERT: A Language Spikformer Learned from BERT with Knowledge Distillation
About
Spiking neural networks (SNNs) offer a promising avenue for implementing deep neural networks in a more energy-efficient way. However, existing SNN architectures for language tasks remain simplistic and relatively shallow, and deep architectures have not been fully explored, leaving a significant performance gap compared with mainstream transformer-based networks such as BERT. To this end, we improve a recently proposed spiking Transformer (i.e., Spikformer) so that it can process language tasks, and we propose a two-stage knowledge distillation method for training it: first, pre-training by distilling knowledge from BERT on a large collection of unlabelled texts; then, fine-tuning on task-specific instances by distilling again from a BERT fine-tuned on the same training examples. Through extensive experimentation, we show that the models trained with our method, named SpikeBERT, outperform state-of-the-art SNNs and even achieve results comparable to BERT on both English and Chinese text classification tasks, with much less energy consumption. Our code is available at https://github.com/Lvchangze/SpikeBERT.
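Both training stages rely on a distillation objective that pushes the student's output distribution toward the teacher's. Below is a minimal NumPy sketch of a standard temperature-scaled distillation loss (in the style of Hinton et al.), with hypothetical toy logits; it is an illustration of the general technique, not the actual SpikeBERT training code, which also involves spiking dynamics and feature alignment:

```python
import numpy as np

def softmax(x, T=1.0):
    # Temperature-scaled softmax over the last axis (numerically stable).
    z = x / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) between temperature-softened distributions,
    # scaled by T^2 so gradients keep a comparable magnitude across T.
    p = softmax(teacher_logits, T)   # soft targets from the teacher (e.g., BERT)
    q = softmax(student_logits, T)   # predictions from the student (e.g., a spiking model)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T ** 2)

# Toy example: logits for 2 instances, 3 classes (values are illustrative only).
teacher = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.0]])
student = np.array([[1.8, 0.4, -0.9], [0.0, 1.2, 0.1]])
loss = distillation_loss(student, teacher)
```

In practice this term is combined with the ordinary cross-entropy on hard labels during the fine-tuning stage, weighted by a mixing coefficient.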
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text Classification | SST-2 (test) | Accuracy | 81.71 | 185 |
| Classification | CIFAR10-DVS | Accuracy | 76.4 | 133 |
| Subjectivity Classification | Subj (test) | Accuracy | 91.6 | 125 |
| Text Classification | MR (test) | Accuracy | 75.87 | 99 |
| Text Classification | SST-5 (test) | Accuracy | 41.84 | 58 |
| Time Series Forecasting | METR-LA | -- | -- | 39 |
| Time Series Forecasting | solar | R2 (6h) | 0.929 | 19 |
| Time Series Forecasting | PEMS-BAY | R2 (Horizon 6) | 0.768 | 19 |
| Time Series Forecasting | Electricity | R2 (Horizon 6) | 0.964 | 12 |
| Text Classification | ChnSenti (test) | Accuracy | 85.62 | 5 |