
Stochastic Attention: Connectome-Inspired Randomized Routing for Expressive Linear-Time Attention

About

The whole-brain connectome of a fruit fly comprises over 130K neurons connected with a probability of merely 0.02%, yet achieves an average shortest path of only 4.4 hops. Despite being highly structured at the circuit level, the network's long-range connections are broadly distributed across brain regions, functioning as stochastic shortcuts that enable efficient global communication. Inspired by this observation, we propose Stochastic Attention (SA), a drop-in enhancement for sliding-window attention (SWA) that applies a random permutation to the token sequence before windowed attention and restores the original order afterward. This transforms the fixed local window into a stochastic global one within the same $O(nw)$ per-layer budget. Through depth, independently sampled permutations yield exponentially growing receptive fields, achieving full sequence coverage in $O(\log_w n)$ layers versus $O(n/w)$ for SWA. We validate SA in two settings: pre-training language models from scratch, where a gated SA + SWA combination achieves the best average zero-shot accuracy, and training-free inference on Qwen3-8B and Qwen3-30B-A3B, where SA consistently outperforms SWA and matches or exceeds Mixture of Block Attention at comparable compute budgets. These results suggest that connectome-inspired stochastic routing is a practical primitive for improving the expressivity of efficient attention, complementary to existing linear and sparse approaches.
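The permute, attend, restore mechanism described above can be illustrated with a minimal single-head sketch in plain Python. This is not the authors' implementation. It assumes a non-causal window in which each position attends to neighbours within `window` positions on either side, and the function names and the `rng` parameter are hypothetical choices made for illustration.

```python
import math
import random


def sliding_window_attention(q, k, v, window):
    """Non-causal sliding-window attention over lists of vectors.

    Position i attends only to positions j with |i - j| <= window
    (an assumed window convention), so the cost is O(n * window).
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Scaled dot-product scores against keys inside the window.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(lo, hi)
        ]
        # Numerically stable softmax over the window.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of the values inside the window.
        out.append([
            sum(w * v[j][c] for w, j in zip(weights, range(lo, hi))) / total
            for c in range(len(v[0]))
        ])
    return out


def stochastic_attention(q, k, v, window, rng):
    """Shuffle tokens, run windowed attention, then undo the shuffle.

    `rng` is a random.Random instance (hypothetical parameter); a
    multi-layer model would draw a fresh permutation for each layer.
    """
    n = len(q)
    perm = list(range(n))
    rng.shuffle(perm)  # perm[pos] = original index of the token placed at pos
    out_perm = sliding_window_attention(
        [q[p] for p in perm], [k[p] for p in perm], [v[p] for p in perm], window
    )
    # Restore the original token order.
    out = [None] * n
    for pos, token in enumerate(perm):
        out[token] = out_perm[pos]
    return out
```

One sanity check on the sketch: when `window` is at least `n - 1`, the window covers the whole sequence, so both functions reduce to full attention and return the same values up to floating-point rounding. For smaller windows the two differ, because the shuffled neighbours of a token are drawn from the entire sequence rather than from its local context.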

Zehao Jin, Yanan Sui • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy: 75 | 711 |
| Question Answering | ARC Challenge | Accuracy (ARC): 56.5 | 598 |
| Question Answering | ARC Easy | -- | 597 |
| Boolean Question Answering | BoolQ | Accuracy: 86.6 | 350 |
| Language Modeling | LAMBADA | Accuracy: 64.6 | 103 |
| Code Generation | HumanEval | HumanEval Accuracy: 65.2 | 49 |
| Zero-shot Language Understanding and Reasoning | LLM Evaluation Suite (HellaSwag, MMLU, ARC-C, BoolQ, Lambada, ARC-E, HumanEval) zero-shot Qwen3-30B-A3B | HellaSwag Accuracy: 79.8 | 12 |