
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge

About

Large language models often solve complex reasoning tasks more effectively with Chain-of-Thought (CoT), but at the cost of long, low-bandwidth token sequences. Humans, by contrast, often reason softly by maintaining a distribution over plausible next steps. Motivated by this, we propose Multiplex Thinking, a stochastic soft reasoning mechanism that, at each thinking step, samples K candidate tokens and aggregates their embeddings into a single continuous multiplex token. This preserves the vocabulary embedding prior and the sampling dynamics of standard discrete generation, while inducing a tractable probability distribution over multiplex rollouts. Consequently, multiplex trajectories can be directly optimized with on-policy reinforcement learning (RL). Importantly, Multiplex Thinking is self-adaptive: when the model is confident, the multiplex token is nearly discrete and behaves like standard CoT; when it is uncertain, it compactly represents multiple plausible next steps without increasing sequence length. Across challenging math reasoning benchmarks, Multiplex Thinking consistently outperforms strong discrete CoT and RL baselines from Pass@1 through Pass@1024, while producing shorter sequences. The code and checkpoints are available at https://github.com/GMLR-Penn/Multiplex-Thinking.
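The branch-and-merge step described above can be sketched in a few lines: sample K candidate next tokens from the model's distribution, then average their embeddings with probability-proportional weights to form one continuous multiplex token. This is a minimal illustration with a toy vocabulary; the function name `multiplex_token` and the renormalized-weight merge are our assumptions, not the authors' exact implementation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def multiplex_token(logits, embeddings, k=4, rng=None):
    """Hypothetical sketch of a multiplex-token step.

    Branch: sample K distinct candidate tokens from the next-token
    distribution. Merge: average their embeddings, weighted by their
    renormalized probabilities, into one continuous token embedding.
    """
    rng = rng or np.random.default_rng()
    p = softmax(logits)
    cand = rng.choice(len(p), size=k, replace=False, p=p)  # branch
    w = p[cand] / p[cand].sum()   # renormalize over the K candidates
    return w @ embeddings[cand]   # merge: weighted embedding average

# Toy example: vocabulary of 10 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
emb = rng.standard_normal((10, 8))
logits = rng.standard_normal(10)
tok = multiplex_token(logits, emb, k=4, rng=rng)
print(tok.shape)  # one token-sized vector, regardless of K
```

Note how this construction is self-adaptive: if the distribution is sharply peaked, nearly all weight falls on one candidate and the multiplex token is almost exactly that token's discrete embedding, recovering standard CoT behavior.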

Yao Tang, Li Dong, Yaru Hao, Qingxiu Dong, Furu Wei, Jiatao Gu • 2026

Related benchmarks

Task                    Dataset        Result        Rank
Mathematical Reasoning  Minerva        Pass@1 38.6   138
Mathematical Reasoning  AMC 2023       --            65
Mathematical Reasoning  AIME 2024      Pass@1 20.6   54
Mathematical Reasoning  MATH 500       Pass@1 78     33
Mathematical Reasoning  AIME 2025      Pass@1 19.7   33
Mathematical Reasoning  OlympiadBench  Pass@1 41.7   19

Other info

GitHub
