ARES: Automated Rubric Synthesis for Scalable LLM Reinforcement Learning
About
Rubric-based rewards offer a promising way to extend reinforcement learning (RL) for large language models beyond tasks with automatically verifiable answers. However, scaling rubric-based RL remains challenging: existing approaches often rely on expert-written rubrics and manually constructed question sets, while fixed task-level rubrics may fail to capture the evaluation requirements of individual questions. We propose ARES (Automated Rubric synthEsis for Scalable RL), a framework for automatically constructing rubric-based RL data at scale. Starting from raw pretraining documents, ARES converts source knowledge into self-contained question-answer pairs and co-generates question-specific weighted rubrics, enabling instance-level reward supervision for open-ended responses. To improve diversity and quality, ARES conditions generation on domain labels and persona information, and applies validation filters for question self-containment, answer faithfulness, and rubric validity. Using ARES, we construct 100K rubric-annotated instances across ten domains. Experiments on seven benchmarks show that rubric-based RL trained with ARES outperforms continual pretraining, supervised fine-tuning, and binary-reward RL, with the largest gains on multi-dimensional open-ended tasks such as healthcare and instruction following.
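To illustrate what a question-specific weighted rubric could look like as a reward signal, here is a minimal sketch. It assumes the reward is the weight-normalized share of satisfied criteria and that each criterion receives a binary judgment (for example from an LLM judge); the abstract does not specify the aggregation rule, so `RubricItem`, `rubric_reward`, and the example criteria are hypothetical names chosen for illustration, not the ARES implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RubricItem:
    """One rubric criterion with its importance weight (hypothetical structure)."""
    criterion: str
    weight: float


def rubric_reward(items: Sequence[RubricItem], satisfied: Sequence[bool]) -> float:
    """Return the weighted fraction of satisfied criteria, a value in [0, 1].

    Assumption: weights are positive and judgments are binary. The actual
    aggregation used by ARES is not stated in the abstract.
    """
    if len(items) != len(satisfied):
        raise ValueError("one judgment is required per rubric item")
    total = sum(item.weight for item in items)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    earned = sum(item.weight for item, ok in zip(items, satisfied) if ok)
    return earned / total


# Illustrative rubric for a single healthcare-style question (made-up criteria).
rubric = [
    RubricItem("States the recommended first-line treatment", 3.0),
    RubricItem("Mentions when to seek urgent care", 2.0),
    RubricItem("Uses plain, non-technical language", 1.0),
]
judgments = [True, False, True]  # hypothetical per-criterion judge outputs
reward = rubric_reward(rubric, judgments)
```

Unlike a binary reward, a score of this kind gives partial credit: the response above earns 4 of 6 weight units instead of being marked simply right or wrong.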
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Math Reasoning | GSM8K | Accuracy (GSM8K): 86.96 | 131 |
| Knowledge Reasoning | MMLU-Pro | -- | 120 |
| Writing | WritingBench | Score: 38.24 | 74 |
| Language Understanding | MMLU-Pro | MMLU-Pro Accuracy: 50.56 | 60 |
| Open-ended writing | WritingBench | Score: 38.24 | 20 |
| Instruction Following | IFEval | Score (%): 54.88 | 18 |
| Code Generation | MBPP+ | AVG Score: 63.16 | 17 |
| Aggregate General Performance | ARES Evaluation Suite | Average Score: 52.69 | 5 |
| Code Generation | HumanEval+ | Score: 34.76 | 5 |
| Healthcare QA | HealthBench | Score: 41.45 | 5 |