Thinking Fast and Right: Balancing Accuracy and Reasoning Length with Adaptive Rewards

About

Large language models (LLMs) have demonstrated strong reasoning abilities in mathematical tasks, often enhanced through reinforcement learning (RL). However, RL-trained models frequently produce unnecessarily long reasoning traces -- even for simple queries -- leading to increased inference costs and latency. While recent approaches attempt to control verbosity by adding length penalties to the reward function, these methods rely on fixed penalty terms that are hard to tune and cannot adapt as the model's reasoning capability evolves, limiting their effectiveness. In this work, we propose an adaptive reward-shaping method that enables LLMs to "think fast and right" -- producing concise outputs without sacrificing correctness. Our method dynamically adjusts the reward trade-off between accuracy and response length based on model performance: when accuracy is high, the length penalty increases to encourage faster length reduction; when accuracy drops, the penalty is relaxed to preserve correctness. This adaptive reward accelerates early-stage length reduction while avoiding over-compression in later stages. Experiments across multiple datasets show that our approach consistently and dramatically reduces reasoning length while largely maintaining accuracy, offering a new direction for cost-efficient adaptive reasoning in large-scale language models.
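The adaptive trade-off described in the abstract can be illustrated with a small controller that raises the length-penalty weight when batch accuracy exceeds a reference level and lowers it when accuracy falls below. This is a minimal sketch, not the authors' formulation: the additive update rule, the clamping bounds, the reward form (correctness minus weighted normalized length), and every name and default (`AdaptiveLengthPenalty`, `target_accuracy`, `step_size`, `max_weight`) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveLengthPenalty:
    """Hypothetical controller for the length-penalty weight (illustrative only)."""

    target_accuracy: float   # assumed reference accuracy the policy should hold
    step_size: float = 0.1   # assumed update rate for the penalty weight
    weight: float = 0.0      # current length-penalty coefficient
    max_weight: float = 1.0  # assumed upper bound on the coefficient

    def update(self, batch_accuracy: float) -> float:
        """Raise the weight when accuracy is above target, lower it when below."""
        gap = batch_accuracy - self.target_accuracy
        proposed = self.weight + self.step_size * gap
        # Clamp to [0, max_weight] so the penalty never turns into a length bonus.
        self.weight = min(self.max_weight, max(0.0, proposed))
        return self.weight

    def reward(self, is_correct: bool, num_tokens: int, max_tokens: int) -> float:
        """Correctness reward minus the weighted, normalized response length."""
        length_fraction = min(num_tokens, max_tokens) / max_tokens
        return float(is_correct) - self.weight * length_fraction


if __name__ == "__main__":
    penalty = AdaptiveLengthPenalty(target_accuracy=0.8, step_size=0.5)
    for accuracy in (0.9, 0.95, 0.6):
        print(f"accuracy={accuracy:.2f} -> weight={penalty.update(accuracy):.3f}")
```

In this sketch the weight would be updated once per training step from the accuracy of the sampled batch, and `reward` would then score each response under the current weight. Clamping at zero reflects the idea that a drop in accuracy should relax the penalty without ever rewarding longer outputs.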

Jinyan Su, Claire Cardie • 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	GSM8K	Accuracy 86.4	499
Mathematical Reasoning	Olympiad Bench	Accuracy 54.03	73
Mathematical Reasoning	MATH500	Accuracy 0.893	34
Mathematical Reasoning	AIME25	Accuracy 29.2	12
Mathematical Reasoning	Average GSM8K, MATH500, AMC23, AIME25	Accuracy 71	12
Mathematical Reasoning	AMC23	Accuracy 79.1	12
General Science Reasoning	GPQA Diamond	Accuracy 56.41	10
Mathematical Reasoning	AIME 2024	Accuracy 58.75	10
Aggregate Reasoning	Average Olympiad, MATH, AMC, AIME, GPQA	Average Accuracy 65.12	10
Mathematical Reasoning	AMC	Accuracy 88.13	10
