Efficient LLM Reasoning via Variational Posterior Guidance with Efficiency Awareness
About
Large language models rely on chain-of-thought for complex reasoning, but the overthinking phenomenon severely degrades inference efficiency. Existing reinforcement learning methods compress reasoning chains by designing elaborate reward functions. This makes high-quality samples extremely sparse in the exploration space and creates a sampling bottleneck for the prior policy. Inspired by cognitive science, we prove that a posterior distribution guided by reference answers achieves higher expected utility than the prior distribution, and can therefore break through this sampling bottleneck. However, the posterior distribution is unavailable during inference. To address this, we formalize efficient reasoning as a variational inference problem and introduce an efficiency-aware evidence lower bound as its theoretical foundation. Building on this, we propose the VPG-EA framework. It adopts a parameter-shared dual-stream architecture to instantiate both the posterior distribution and the prior policy. After filtering out pseudo-efficient paths via cross-view evaluation, it transfers the posterior's efficient patterns to the prior policy in one direction only, through variational distillation. Experiments on DeepSeek-R1-Distill-Qwen at the 1.5B and 7B scales show that VPG-EA improves the comprehensive efficiency metric ε³ by 8.73% and 12.37%, respectively, over the strongest baseline at each model size.
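For orientation, the generic evidence lower bound that a variational formulation of this kind builds on can be sketched as follows. This is the standard textbook bound, not the paper's efficiency-aware variant, whose exact form is not stated in the abstract. All symbols are assumptions introduced here for illustration: x is the question, y the reference answer, z the reasoning chain, π_θ(z | x) the prior policy, and q_φ(z | x, y) the reference-guided posterior.

```latex
% Generic ELBO (illustrative notation, not taken from the paper).
% Marginal likelihood of the answer under the prior policy:
%   p_\theta(y \mid x) = \int \pi_\theta(z \mid x)\, p_\theta(y \mid x, z)\, dz
\log p_\theta(y \mid x)
  \;\ge\;
  \mathbb{E}_{z \sim q_\phi(z \mid x, y)}
    \bigl[ \log p_\theta(y \mid x, z) \bigr]
  \;-\;
  \mathrm{KL}\bigl( q_\phi(z \mid x, y) \,\|\, \pi_\theta(z \mid x) \bigr)
```

The gap between the two sides is the KL divergence from q_φ(z | x, y) to the true posterior p_θ(z | x, y), so the bound is tight when the two coincide. The KL term on the right is what ties the prior policy, which must act without the reference answer, to a posterior that can see it.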
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| General Reasoning | GPQA-Diamond & MMLU-Pro | Accuracy | 43.83 | 35 |
| General Reasoning | GPQA Diamond | Accuracy | 38.88 | 19 |
| Mathematical Reasoning | GSM8K | Accuracy (ACC) | 92 | 14 |
| Mathematical Reasoning | MATH 500 | Accuracy (ACC) | 93 | 14 |
| Mathematical Reasoning | AIME24 | Accuracy (ACC) | 56.67 | 14 |
| General Reasoning | MMLU-Pro | ACC | 48.77 | 14 |
| Mathematical Reasoning | AIME 25 | ACC | 36.67 | 14 |
| Mathematical Reasoning | MATH 500 | Accuracy | 89.6 | 5 |
| Mathematical Reasoning | AIME24 | ACC | 46.7 | 5 |
| Mathematical Reasoning | GSM8K | Accuracy | 89.16 | 3 |