Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness

About

Although large language models rely on chain-of-thought for complex reasoning, the overthinking phenomenon severely degrades inference efficiency. Existing reinforcement learning methods compress reasoning chains through elaborate reward functions, which makes high-quality samples extremely sparse in the exploration space and creates a sampling bottleneck for the prior policy. Inspired by cognitive science, we theoretically prove that a posterior distribution guided by reference answers achieves higher expected utility than the prior distribution, and can therefore break through this sampling bottleneck. However, the posterior distribution is unavailable during inference. To address this, we formalize efficient reasoning as a variational inference problem and introduce an efficiency-aware evidence lower bound as its theoretical foundation. On this basis, we propose the VPG-EA framework. It adopts a parameter-shared dual-stream architecture to instantiate both the posterior distribution and the prior policy; after filtering out pseudo-efficient paths via cross-view evaluation, it transfers the posterior's efficient patterns to the prior policy in one direction only, through variational distillation. Experiments on DeepSeek-R1-Distill-Qwen at the 1.5B and 7B scales show that VPG-EA improves the comprehensive efficiency metric ε³ over the strongest baseline at each model size by 8.73% and 12.37%, respectively.
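
For orientation, the standard (not efficiency-aware) evidence lower bound that such a formulation would build on can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the paper: x is the question, z a reasoning chain, y the reference answer, \pi_\theta(z \mid x) the prior policy, and q_\phi(z \mid x, y) the answer-guided posterior. The paper's efficiency-aware variant is not reproduced here.

```latex
% Standard ELBO via Jensen's inequality; symbols are illustrative assumptions.
\begin{aligned}
\log p_\theta(y \mid x)
  &= \log \sum_{z} \pi_\theta(z \mid x)\, p_\theta(y \mid x, z) \\
  &\ge \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log p_\theta(y \mid x, z)\right]
     - \mathrm{KL}\!\left(q_\phi(z \mid x, y)\,\middle\|\,\pi_\theta(z \mid x)\right)
\end{aligned}
```

The KL term measures how far the posterior over reasoning chains sits from the prior policy; distilling the posterior into the prior, as the abstract describes, can be read as shrinking that gap from the prior's side.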

Zizhao Chen, Yuying Li, Siting Lin, Lianxi Wang • 2026

Related benchmarks

Task	Dataset	Result
General Reasoning	GPQA-Diamond & MMLU-Pro	Accuracy 43.83	35
General Reasoning	GPQA Diamond	Accuracy 38.88	31
Mathematical Reasoning	GSM8K	Accuracy (ACC) 92	14
Mathematical Reasoning	MATH 500	Accuracy (ACC) 93	14
Mathematical Reasoning	AIME24	Accuracy (ACC) 56.67	14
General Reasoning	MMLU-Pro	ACC 48.77	14
Mathematical Reasoning	AIME 25	ACC 36.67	14
Mathematical Reasoning	MATH 500	Accuracy 89.6	5
Mathematical Reasoning	AIME24	ACC 46.7	5
Mathematical Reasoning	GSM8K	Accuracy 89.16	3
