Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

About

Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as training data because their complexity is controllable and their answers are straightforward to verify. We make three key technical contributions that lead to effective and stable RL training: a system prompt that emphasizes the thinking and answering process, a stringent format reward function that penalizes outputs for taking shortcuts, and a straightforward training recipe that achieves stable convergence. Our 7B model develops advanced reasoning skills, such as reflection, verification, and summarization, that are absent from the logic corpus. Remarkably, after training on just 5K logic problems, it generalizes to the challenging math benchmarks AIME and AMC.
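To make the idea of a rule-based reward with a strict format check more concrete, here is a minimal Python sketch. It is not the authors' implementation: the tag names (`<think>`, `<answer>`), the reward values, and the function names `format_reward`, `answer_reward`, and `total_reward` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical tag names; the abstract does not say which markers are used.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"
ANSWER_OPEN, ANSWER_CLOSE = "<answer>", "</answer>"


def format_reward(response: str) -> float:
    """Return 1.0 for a well-formed response, -1.0 otherwise (values are assumed)."""
    tags = [THINK_OPEN, THINK_CLOSE, ANSWER_OPEN, ANSWER_CLOSE]
    # Each tag must appear exactly once.
    if any(response.count(tag) != 1 for tag in tags):
        return -1.0
    # Tags must appear in order: reasoning first, then the answer.
    positions = [response.find(tag) for tag in tags]
    if positions != sorted(positions):
        return -1.0
    # An empty reasoning block is treated as a shortcut and penalized.
    thinking = response[positions[0] + len(THINK_OPEN):positions[1]]
    if not thinking.strip():
        return -1.0
    return 1.0


def answer_reward(response: str, expected: str) -> float:
    """Exact-match check on the answer block (reward values are assumed)."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return -2.0
    return 2.0 if match.group(1).strip() == expected.strip() else -1.5


def total_reward(response: str, expected: str) -> float:
    """Score the answer only when the format check passes."""
    fmt = format_reward(response)
    if fmt < 0:
        return fmt
    return fmt + answer_reward(response, expected)
```

In this sketch a response that skips the reasoning block never reaches the answer check, which is one simple way to discourage shortcuts; a real training setup would likely tune both the rules and the reward magnitudes.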

Tian Xie, Zitian Gao, Qingnan Ren, Haoming Luo, Yuqian Hong, Bryan Dai, Joey Zhou, Kai Qiu, Zhirong Wu, Chong Luo • 2025

Related benchmarks

Task	Dataset	Result
Knowledge Reasoning	MMLU-Pro	--	120
Writing	WritingBench	Score 54.5	74
Code	HumanEval+	Accuracy 64	43
Logic reasoning	ZebraLogic	Score 10.1	42
Coding	HumanEval	HumanEval Mean Score 0.689	32
Large Language Model Evaluation	MMLU, GSM8K, GPQA, HUMANEVAL, TRUTHFULQA, IFEVAL	MMLU 63.8	23
Mathematical Reasoning	Minerva	Avg@2 54.9	16
STEM Reasoning	TheoremQA	Avg@2 52.2	16
Mathematical Reasoning	MATH 500	Avg@2 76.5	16
Logic reasoning	Autologic cn	Score 25.1	16

Showing 10 of 12 rows
