BiSpikCLM: A Spiking Language Model integrating Softmax-Free Spiking Attention and Spike-Aware Alignment Distillation

About

Spiking Neural Networks (SNNs) offer promising energy-efficient alternatives to large language models (LLMs) due to their event-driven nature and ultra-low power consumption. However, to preserve capacity, most existing spiking LLMs still incur intensive floating-point matrix multiplication (MatMul) and nonlinearities, or training difficulties arising from the complex spatiotemporal dynamics. To address these challenges, we propose BiSpikCLM, the first fully binary spiking MatMul-free causal language model. BiSpikCLM introduces Softmax-Free Spiking Attention (SFSA), eliminating softmax and floating-point operations in autoregressive language modeling. For efficient training, we introduce Spike-Aware Alignment Distillation (SpAD), which aligns ANN teacher and SNN student across embeddings, attention maps, intermediate features, and output logits. SpAD framework allows BiSpikCLM to reach comparable performance to ANN counterparts using substantially fewer training tokens (e.g., only 5.6% of the tokens for the 1.3B model). As a result, BiSpikCLM achieves competitive performance at only 4.16% - 5.87% of the computational cost on natural language generation tasks. Our results highlight the feasibility and effectiveness of fully binary spike-driven LLMs and establish the distillation as a promising pathway for brain-inspired spiking NLP.

Sihang Guo, Chenlin Zhou, Jiaqi Wang, Kehai Chen, Qingyan Meng, Zhengyu Ma• 2026

Related benchmarks

Task	Dataset	Result
Commonsense Reasoning and Question Answering	ACC benchmark ARC-e, ARC-c, Winogrande, BoolQ, PIQA, HellaSwag, OpenBookQA, HeadQA	ARC-e Acc46.3	20
Language Modeling	Evaluation Tasks Zero-shot Average	Zero-shot Average Accuracy42.19	17
Zero-shot Learning	Reasoning Suite Zero-shot (ARC-e, ARC-c, WG, BQ, PIQA, HS, OBQA, HQA)	ARC-e Accuracy46.5	9

Showing 3 of 3 rows

Other info

Follow for update

@wizwand_team Discord