Revisiting LLM Reasoning via Information Bottleneck

About

Large language models (LLMs) have recently made remarkable progress in reasoning through reinforcement learning with verifiable rewards (RLVR). Using simple rule-based rewards, RL effectively incentivizes LLMs to produce extended chain-of-thought (CoT) reasoning trajectories that progressively guide them toward correct answers. However, existing approaches remain largely heuristic and intuition-driven, which limits the development of principled methodologies. In this paper, we present a theoretical characterization of LLM reasoning grounded in the information bottleneck (IB) principle and introduce IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable across diverse prompts. We derive a practical token-level surrogate objective and propose an efficient approximation of it, yielding a lightweight IB regularization method. This technique integrates seamlessly into existing RL-based post-training frameworks without additional computational overhead and requires only a one-line code modification. Empirically, we validate IB regularization across multiple mathematical reasoning benchmarks and RL algorithms, demonstrating consistent improvements in LLM reasoning performance.
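For reference, the classical information bottleneck (Tishby, Pereira, and Bialek) seeks a representation Z of an input X that is as compressed as possible while staying predictive of a target Y. One plausible reading of the abstract, which is an assumption and not the paper's stated derivation, maps the prompt q to X, the reasoning trajectory r to Z, and the final answer a to Y. The notation q, r, a, and \pi_\theta below is hypothetical:

```latex
% Classical IB Lagrangian; \beta > 0 trades compression against prediction
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)

% Assumed mapping to LLM reasoning (illustrative, not taken from the paper):
% X = prompt q,  Z = trajectory r \sim \pi_\theta(r \mid q),  Y = answer a
\min_{\theta} \; I(q; r) - \beta \, I(r; a)
```

Under this reading, the I(r; a) term rewards trajectories that are informative about the correct answer, while the I(q; r) term penalizes prompt-specific detail, which is one way to formalize "generalizable across diverse prompts". The paper's token-level surrogate and its efficient approximation are not shown here.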

Shiye Lei, Zhihao Cheng, Kai Jia, Dacheng Tao • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Mathematical Reasoning	AIME 25	Accuracy	15.7	112
Instruction Following	IFEval	Accuracy (IFEval)	54.3	101
Science Reasoning	GPQA	Accuracy (GPQA)	44.7	72
Mathematics	AIME 25	Avg@32	14.5	20
Mathematics	AIME 24	Avg@32	0.169	20
Comprehensive Evaluation	Overall Across Benchmarks	Avg@32 Accuracy	41.6	16
Instruction	IFEval	Avg@32 Accuracy	44.7	16
Mathematics	MATH 500	Accuracy (avg@32)	82	16
Mathematics	AMC 23	Avg@32 Accuracy	55.3	16
Mathematics	AMC 24	Accuracy (avg@32)	39.5	16

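Several rows report Avg@32. Assuming the common definition (sample 32 responses per problem, mark each correct or incorrect, take the per-problem fraction correct, then average over problems), a minimal Python sketch follows. The function name `avg_at_k` is hypothetical, and returning a percentage is a presentation choice; the table itself appears to mix percentage and fractional scales.

```python
from typing import Sequence


def avg_at_k(correct: Sequence[Sequence[bool]]) -> float:
    """Hypothetical helper: Avg@k as a percentage.

    `correct[i][j]` is True if the j-th sampled response to problem i
    was judged correct. Each problem's accuracy is the fraction of its
    samples that are correct; Avg@k is the mean over problems.
    """
    if not correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in correct:
        if not samples:
            raise ValueError("each problem needs at least one sample")
        # Fraction of this problem's k samples that were correct.
        per_problem.append(sum(samples) / len(samples))
    # Average over problems, scaled to a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


if __name__ == "__main__":
    # Two problems, k = 4: accuracies 0.75 and 0.0, so Avg@4 = 37.5
    print(avg_at_k([[True, False, True, True], [False] * 4]))
```

With k = 32, each inner list would hold 32 judgments; averaging over many samples gives a lower-variance estimate than scoring a single response per problem.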
