L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

About

Reasoning language models have shown an uncanny ability to improve performance at test time by "thinking longer", that is, by generating longer chain-of-thought (CoT) sequences and hence using more compute. However, the length of their chain-of-thought reasoning is not controllable, making it impossible to allocate test-time compute to achieve a desired level of performance. We introduce Length Controlled Policy Optimization (LCPO), a simple reinforcement learning method that optimizes for accuracy and adherence to user-specified length constraints. We use LCPO to train L1, a reasoning language model that produces outputs satisfying a length constraint given in its prompt. L1's length control allows for smoothly trading off computational cost and accuracy on a wide range of tasks, and outperforms the state-of-the-art S1 method for length control. Furthermore, we uncover an unexpected short chain-of-thought capability in models trained with LCPO. Specifically, using LCPO we derive Short Reasoning Models (SRMs) that exhibit reasoning patterns similar to those of full-length reasoning models but generate CoT lengths comparable to those of non-reasoning models. They demonstrate significant performance gains: for instance, our 1.5B L1 model surpasses GPT-4o at equal reasoning lengths. Overall, LCPO enables precise control over reasoning length, allowing for fine-grained allocation of test-time compute and accuracy. We release code and models at https://www.cmu-l3.github.io/l1

Pranjal Aggarwal, Sean Welleck • 2025
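
The abstract describes LCPO as optimizing for both accuracy and adherence to a length constraint stated in the prompt, but it does not give the reward itself. The following is a minimal sketch of one way such an objective could be scored, assuming a reward equal to a correctness indicator minus a penalty proportional to how far the generated length is from the target. The function names, the prompt wording, and the `alpha` weight are illustrative assumptions, not the authors' implementation.

```python
def build_prompt(question: str, target_tokens: int) -> str:
    """Append a length instruction to the question.

    The exact wording is an assumption made for illustration.
    """
    return f"{question} Think for {target_tokens} tokens."


def length_controlled_reward(
    is_correct: bool,
    target_tokens: int,
    used_tokens: int,
    alpha: float = 0.0003,  # hypothetical penalty weight per token of deviation
) -> float:
    """Score one sampled response for a length-constrained RL objective.

    Assumed form: correctness indicator minus alpha * |target - used|.
    A correct answer that hits the target length exactly scores 1.0.
    """
    if target_tokens <= 0 or used_tokens < 0:
        raise ValueError("token counts must be positive (target) and non-negative (used)")
    correctness = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(target_tokens - used_tokens)
    return correctness - length_penalty


if __name__ == "__main__":
    prompt = build_prompt("What is 17 * 24?", target_tokens=512)
    print(prompt)
    # Compare a correct on-target response with a correct but overlong one.
    print(length_controlled_reward(True, target_tokens=512, used_tokens=512))
    print(length_controlled_reward(True, target_tokens=512, used_tokens=2048))
```

Under this assumed form, `alpha` sets the trade-off: a larger value pushes the policy toward matching the requested length even at some cost in accuracy, while a smaller value prioritizes correct answers.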

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Commonsense Reasoning | WinoGrande | -- | 1085 |
| Mathematical Reasoning | MATH500 (test) | Accuracy 87.8 | 514 |
| Mathematical Reasoning | GSM8K | Accuracy 90.48 | 499 |
| Multi-discipline Multimodal Understanding | MMMU | Accuracy 59.5 | 317 |
| Mathematical Reasoning | MAWPS | Accuracy 97.5 | 234 |
| Visual Mathematical Reasoning | MathVision | Accuracy 24.7 | 186 |
| Mathematical Reasoning | AIME 24 | Accuracy 25 | 154 |
| Mathematical Reasoning | MATH 500 | Accuracy (Acc) 89.2 | 149 |
| Mathematical Reasoning | MathVision | Accuracy 29.5 | 144 |
| Mathematical Reasoning | AMC 23 | Accuracy 83.5 | 81 |
Showing 10 of 73 rows