
T1: Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling

About

Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, existing approaches mainly rely on imitation learning and struggle to achieve effective test-time scaling. While reinforcement learning (RL) holds promise for enabling self-exploration, recent attempts have yielded only modest improvements on complex reasoning. In this paper, we present T1 to scale RL by encouraging exploration and to understand inference scaling. We first initialize the LLM using synthesized chain-of-thought data that integrates trial-and-error and self-verification. To scale RL training, we promote increased sampling diversity through oversampling. We demonstrate that T1, with open LLMs as its base, exhibits inference scaling behavior and achieves superior performance on challenging math reasoning benchmarks. More importantly, we present a simple strategy to examine inference scaling, where increased inference budgets directly lead to better performance for T1 without any additional verification.
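
The abstract does not spell out how the inference-scaling sweep is run, so the following is only a minimal sketch of how one might measure accuracy as the per-problem inference budget grows. Every name here (generate, is_correct, accuracy_at_budget, the budget values) is a hypothetical placeholder for illustration, not part of the T1 release or its evaluation code.

# Hypothetical sketch: trace an inference-scaling curve by evaluating accuracy
# at increasing sampling budgets. Placeholders, not T1's actual pipeline.
from collections import Counter

def generate(model, problem, max_new_tokens, n_samples):
    """Placeholder: return n_samples candidate final answers from the model."""
    raise NotImplementedError

def is_correct(answer, reference):
    """Placeholder: rule-based grading of a math answer against the reference."""
    raise NotImplementedError

def accuracy_at_budget(model, dataset, max_new_tokens, n_samples=1):
    """Accuracy when each problem gets a fixed token budget and sample count."""
    correct = 0
    for problem, reference in dataset:
        candidates = generate(model, problem, max_new_tokens, n_samples)
        # Take the most common answer among samples; with n_samples=1 this is
        # just the single sampled answer, i.e. no extra verification step.
        answer, _ = Counter(candidates).most_common(1)[0]
        correct += is_correct(answer, reference)
    return correct / len(dataset)

# Sweep the budget to see whether accuracy keeps improving as it grows.
budgets = [1024, 2048, 4096, 8192]
# curve = [accuracy_at_budget(model, dataset, b) for b in budgets]

Under these assumptions, a model that "exhibits inference scaling behavior" would show accuracy rising monotonically across the budget sweep.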

Zhenyu Hou, Xin Lv, Rui Lu, Jiajie Zhang, Yujiang Li, Zijun Yao, Juanzi Li, Jie Tang, Yuxiao Dong • 2025

Related benchmarks

Task                    Dataset         Result          Rank
Mathematical Reasoning  Minerva Math    Accuracy 20.2   209
Mathematical Reasoning  Minerva Math    Accuracy 25     186
Mathematical Reasoning  AIME 24         Accuracy 3.3    154
Mathematical Reasoning  Olympiad Bench  Accuracy 29.6   123
Mathematical Reasoning  AMC 23          Accuracy 30     56
