
PolyGLU: State-Conditional Activation Routing in Transformer Feed-Forward Networks

Abstract

Biological neural systems employ diverse neurotransmitters -- glutamate, GABA, dopamine, acetylcholine -- to implement distinct signal-processing modalities within shared neural circuits. In contrast, modern transformers apply a single fixed activation function across all feed-forward neurons. We introduce PolyGLU (Polychromatic Gated Linear Unit), a drop-in replacement for SwiGLU that enables each FFN neuron to dynamically route among K=4 activation functions via a differentiable mechanism combining learned static preferences with input-conditioned gating, trained end-to-end with Gumbel-Softmax. We train PolychromaticLM, a 597M-parameter transformer, on ~10B tokens using a single NVIDIA A100 GPU. Our key finding is emergent routing behavior: without any explicit sparsity loss or entropy regularization, the routing mechanism converges to near-deterministic activation selections (mean dynamic entropy = 0.030% of maximum), with a striking depth-dependent specialization pattern -- early layers prefer GELU while deep layers strongly favor Tanh. Three layers maintain elevated routing entropy, suggesting computational flexibility points. The routing architecture adds only 0.23% parameter overhead (~1.4M parameters) and proves fully robust to supervised fine-tuning: routing entropy remains constant at ln(4) throughout 13,067 SFT steps. On standard benchmarks, PolychromaticLM achieves 62-89% of Qwen3-0.6B-Base performance despite training on 3,600x fewer tokens. All code, weights, and training infrastructure are released under Apache 2.0.
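The routing mechanism described above (per-neuron selection among K=4 activations, combining a learned static preference with input-conditioned gating, trained via Gumbel-Softmax) can be sketched as follows. This is a hypothetical reconstruction from the abstract alone, not the released implementation: the candidate activation set (GELU, Tanh, SiLU, ReLU), the linear router, and all layer names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolyGLU(nn.Module):
    """Sketch of a state-conditional activation router (assumed design).

    Each FFN neuron mixes K candidate activations; routing logits are
    the sum of a learned static per-neuron preference and an
    input-conditioned term, relaxed with Gumbel-Softmax so the whole
    module trains end-to-end.
    """

    def __init__(self, d_model: int, d_ff: int, tau: float = 1.0):
        super().__init__()
        # K=4 candidate activations; GELU and Tanh appear in the abstract,
        # the other two are placeholders.
        self.acts = [F.gelu, torch.tanh, F.silu, F.relu]
        self.w_gate = nn.Linear(d_model, d_ff)   # SwiGLU-style gate branch
        self.w_up = nn.Linear(d_model, d_ff)     # value branch
        self.w_down = nn.Linear(d_ff, d_model)
        # learned static preference: one logit vector per neuron
        self.static_logits = nn.Parameter(torch.zeros(d_ff, len(self.acts)))
        # input-conditioned routing logits (cheap linear router, assumed)
        self.router = nn.Linear(d_model, d_ff * len(self.acts))
        self.tau = tau  # Gumbel-Softmax temperature

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch_dims = x.shape[:-1]
        K = len(self.acts)
        g = self.w_gate(x)                                        # (..., d_ff)
        logits = self.router(x).view(*batch_dims, -1, K)          # dynamic term
        logits = logits + self.static_logits                      # + static preference
        probs = F.gumbel_softmax(logits, tau=self.tau, dim=-1)    # differentiable routing
        # apply every candidate activation, then mix per neuron
        acted = torch.stack([a(g) for a in self.acts], dim=-1)    # (..., d_ff, K)
        mixed = (acted * probs).sum(dim=-1)                       # (..., d_ff)
        return self.w_down(mixed * self.w_up(x))
```

Under this reading, the near-zero dynamic entropy reported in the abstract would correspond to the Gumbel-Softmax distribution `probs` collapsing onto a single activation per neuron without any explicit sparsity penalty.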

Daniel Nobrega Medeiros • 2026

Related benchmarks

Task                                Dataset         Metric                Result   Rank
Commonsense Reasoning               HellaSwag       --                    --       1891
Commonsense Reasoning               WinoGrande      --                    --       1085
Physical Commonsense Reasoning      PIQA            Accuracy              58.87     572
Science Question Answering          ARC Challenge   Accuracy              24.15     342
Science Question Answering          ARC Easy        Accuracy              41.04     155
Word Prediction                     LAMBADA         Accuracy              15.35     148
Science Question Answering          SciQ            Normalized Accuracy   61.2      137
Question Answering                  OpenBookQA      Normalized Accuracy   29        102
Reading Comprehension               BoolQ           Score                 61.13      10
Multi-task Language Understanding   MMLU STEM       Accuracy              28.42       3
