
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling

About

Rule-based reasoning is acknowledged as one of the fundamental problems of reasoning. While recent studies show that large reasoning models (LRMs) have remarkable reasoning capabilities enhanced by reinforcement learning (RL), real applications still face severe challenges due to variations in rule formats, types, and complexity. To mitigate this issue, we introduce RuleReasoner, an effective method for rule-based reasoning via a wide collection of curated tasks and a novel domain-aware dynamic sampling approach in RL. Specifically, RuleReasoner resamples each training batch by updating the domain weights based on historical rewards. This facilitates domain balance and active learning schedules for RL, obviating the need for static, human-engineered mix-training. Evaluations on in-distribution (ID) and out-of-distribution (OOD) benchmarks reveal that RuleReasoner outperforms frontier LRMs by a significant margin ($\Delta$4.1% on eight ID tasks and $\Delta$10.4% on three OOD tasks over OpenAI-o1). Notably, our approach also exhibits higher computational efficiency compared to prior methods.
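The abstract describes resampling each training batch from domain weights that are updated using historical rewards, but it does not give the update rule. The following is a minimal sketch of one way such a sampler could work, not the authors' implementation. The class name `DomainSampler`, the exponential moving average of rewards, the softmax over `1 - average reward`, and the `decay` and `temperature` parameters are all assumptions made for illustration.

```python
import math
import random


class DomainSampler:
    """Hypothetical sampler: lower historical reward -> higher sampling weight."""

    def __init__(self, domains, decay=0.9, temperature=0.5, seed=0):
        self.domains = list(domains)
        self.decay = decay              # assumed EMA factor for the reward history
        self.temperature = temperature  # assumed softmax temperature
        self.avg_reward = {d: 0.0 for d in self.domains}
        self.rng = random.Random(seed)

    def update(self, domain, reward):
        """Fold a new reward (assumed to lie in [0, 1]) into the domain's EMA."""
        old = self.avg_reward[domain]
        self.avg_reward[domain] = self.decay * old + (1 - self.decay) * reward

    def weights(self):
        """Softmax over (1 - average reward), so weaker domains get more mass."""
        scores = [(1.0 - self.avg_reward[d]) / self.temperature for d in self.domains]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        return {d: e / total for d, e in zip(self.domains, exps)}

    def sample_batch(self, pools, batch_size):
        """Pick a domain per batch slot by weight, then a random example from it."""
        w = self.weights()
        picks = self.rng.choices(
            self.domains, weights=[w[d] for d in self.domains], k=batch_size
        )
        return [(d, self.rng.choice(pools[d])) for d in picks]
```

In this sketch, a training loop would call `update` with each rollout's reward and `sample_batch` before every step, so that domains with persistently low reward are drawn more often without a hand-tuned mixing ratio.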

Yang Liu, Jiaqi Li, Zilong Zheng • 2025

Related benchmarks

Task                         Dataset             Metric                Result  Rank
Deductive Reasoning          PrOntoQA            Pass@1                0.964   18
First-Order Logic Reasoning  FOLIO               Pass@1 Success Rate   84.7    18
Inductive Reasoning          CLUTRR              Pass@1                95.5    18
Deductive Reasoning          ProofWriter         Pass@1                97      18
Logical Reasoning            LogiQA              Pass@1 Accuracy       0.835   18
First-Order Logic Reasoning  LogicNLI            Pass@1                70.4    18
Logical Reasoning            Logical Deduction   Pass@1                0.983   18
Multi-Task Reasoning         GPQA Diamond        Pass@1                44.9    12
Math Reasoning               AIME 2025 (OOD)     Pass@1 Rate           23.3    2
Symbolic Reasoning           Coin Flip (OOD)     Pass@1                95.1    2
