Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention
About
The whole-brain connectome of a fruit fly comprises over 130K neurons connected with a probability of merely 0.02%, yet achieves an average shortest path of only 4.4 hops. Despite being highly structured at the circuit level, the network's long-range connections are broadly distributed across brain regions, functioning as stochastic shortcuts that enable efficient global communication. Inspired by this observation, we propose Stochastic Attention (SA), a drop-in enhancement for sliding-window attention (SWA) that applies a random permutation to the token sequence before windowed attention and restores the original order afterward. This transforms the fixed local window into a stochastic global one within the same $O(nw)$ per-layer budget. Through depth, independently sampled permutations yield exponentially growing receptive fields, achieving full sequence coverage in $O(\log_w n)$ layers versus $O(n/w)$ for SWA. We validate SA in two settings: pre-training language models from scratch, where a gated SA + SWA combination achieves the best average zero-shot accuracy, and training-free inference on Qwen3-8B and Qwen3-30B-A3B, where SA consistently outperforms SWA and matches or exceeds Mixture of Block Attention at comparable compute budgets. These results suggest that connectome-inspired stochastic routing is a practical primitive for improving the expressivity of efficient attention, complementary to existing linear and sparse approaches.
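The core mechanism described above — permute the token sequence, run ordinary sliding-window attention, then invert the permutation — can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the naive `sliding_window_attention` baseline and all function names here are assumptions for exposition, and a real system would use a fused attention kernel.

```python
import numpy as np

def sliding_window_attention(q, k, v, window):
    """Naive O(n*w) sliding-window attention: each token takes a softmax
    over keys within +/- `window` positions. Reference sketch only."""
    n, d = q.shape
    out = np.empty_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]
    return out

def stochastic_attention(q, k, v, window, rng):
    """Stochastic Attention sketch: apply a random permutation to the
    sequence, run windowed attention, then restore the original order.
    The fixed local window thereby attends to a random global subset of
    tokens at the same O(n*w) cost."""
    n = q.shape[0]
    perm = rng.permutation(n)       # sampled independently per layer
    inv = np.argsort(perm)          # inverse permutation
    out = sliding_window_attention(q[perm], k[perm], v[perm], window)
    return out[inv]                 # un-permute back to token order
```

A quick sanity property: when `window >= n`, every token attends to the full sequence, so the permutation has no effect and SA reduces exactly to full (and windowed) attention; with small `window`, each layer's fresh permutation gives each token a different random neighborhood, which is what lets receptive fields compound across depth.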
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Question Answering | ARC Easy | -- | 597 |
| Commonsense Reasoning | HellaSwag | Accuracy 75 | 350 |
| Boolean Question Answering | BoolQ | Accuracy 86.6 | 323 |
| Question Answering | ARC Challenge | Accuracy 56.5 | 142 |
| Language Modeling | LAMBADA | Accuracy 64.6 | 76 |
| Code Generation | HumanEval | Accuracy 65.2 | 12 |
| Zero-shot Language Understanding and Reasoning | LLM Evaluation Suite (HellaSwag, MMLU, ARC-C, BoolQ, LAMBADA, ARC-E, HumanEval), zero-shot, Qwen3-30B-A3B | HellaSwag Accuracy 79.8 | 12 |