EvoRubric: Self-Evolving Rubric-Driven RL for Open-Ended Generation

About

Reinforcement Learning (RL) has significantly advanced Large Language Models (LLMs) in verifiable domains, but aligning models for open-ended generation remains challenging because such tasks lack definitive rewards. Current rubric-based RL methods mitigate this by scoring responses against explicit criteria. However, they rely either on static, human-annotated rubrics, which inevitably lag behind the evolving policy, or on expensive external proprietary models to update the rubrics dynamically. In this paper, we propose EvoRubric, a single-policy co-evolutionary RL framework that removes the reliance on both static criteria and external rubric generators. By unifying response generation and rubric generation under a single parameterized policy, EvoRubric dynamically alternates between two roles: a Reasoner and a Rubric Generator. To prevent reward hacking and ensure that the generated signals are reliable, we introduce a multi-level verification pipeline consisting of a meta-verifier, zero-variance pruning, and a Leave-One-Out peer consensus mechanism. Validated criteria are archived in a memory pool, yielding dense, multi-objective rewards that continuously co-optimize both roles. Extensive experiments across the Medical, Writing, and Science domains show that EvoRubric consistently outperforms traditional alignment methods driven by static rubrics or external LLMs. Notably, the framework is also compatible with human-expert priors. When initialized with expert-annotated rubrics, EvoRubric further uncovers novel, discriminative dimensions and outperforms training on the static expert annotations alone.
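The abstract names zero-variance pruning and dense rubric rewards without defining them on this page. The following is a minimal sketch of one plausible reading, not the paper's implementation. It assumes that each criterion scores every sampled response to the same prompt as pass (1) or fail (0). Criteria whose scores never vary across the group cannot tell the responses apart, so they are dropped, and the surviving criteria are averaged with equal weights. The function names, the binary scoring, the equal weighting, and the zero-reward fallback are all hypothetical.

```python
from statistics import pvariance


def prune_zero_variance(scores: dict[str, list[int]]) -> dict[str, list[int]]:
    """Keep only criteria whose scores differ across the response group.

    `scores` maps a criterion (hypothetical free-text label) to its 0/1
    verdicts on each sampled response for the same prompt. A criterion that
    every response passes, or every response fails, has zero variance and
    cannot rank the responses, so it is dropped.
    """
    return {c: s for c, s in scores.items() if pvariance(s) > 0}


def rubric_rewards(scores: dict[str, list[int]]) -> list[float]:
    """Average the surviving criteria into one dense reward per response."""
    if not scores:
        return []
    n = len(next(iter(scores.values())))  # group size (number of responses)
    kept = prune_zero_variance(scores)
    if not kept:
        # Hypothetical fallback: no criterion separates the responses.
        return [0.0] * n
    # Equal weighting is an assumption; the paper's weighting is not given here.
    return [sum(s[i] for s in kept.values()) / len(kept) for i in range(n)]


# Toy group of four responses scored against three hypothetical criteria.
scores = {
    "cites a clinical guideline": [1, 0, 1, 0],
    "uses a polite tone": [1, 1, 1, 1],  # zero variance -> pruned
    "states the dosage": [1, 1, 0, 0],
}
print(rubric_rewards(scores))  # [1.0, 0.5, 0.5, 0.0]
```

Pruning before averaging keeps uninformative criteria from compressing the reward range: including "uses a polite tone" would shift every response by the same amount and shrink the gaps between them.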

Xin Guan, Xiaomeng Hu, Shen Huang, Zhenyi Wang, Bo Zhang, Zijian Li, Pengjun Xie, Bo Liu, Jiuxin Cao • 2026

Related benchmarks

Task | Dataset | Result | Rank
Open-ended Writing | WritingBench | Score 75.76 | 20
Medical Question Answering | HealthBench Medical | Score 56.36 | 10
Multi-domain Evaluation | Aggregate of HealthBench, LLMMed-Eval, WritingBench, Creative Writing, ResearchQA | Macro-average Score 70.55 | 10
Science Question Answering | ResearchQA Science | Score 77.31 | 10
Creative Writing Generation | Creative Writing | Score 69.88 | 10
Medical Question Answering | LLMMed-Eval Medical | Score 73.46 | 10
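The multi-domain row reports a macro-average over the five individual benchmarks. Assuming the macro-average is the unweighted mean of the five per-dataset scores (the page does not state the weighting), the following short check recomputes it from the other rows of the table and gives 70.55 after rounding.

```python
# Per-dataset scores as listed in the benchmark table.
scores = {
    "HealthBench Medical": 56.36,
    "LLMMed-Eval Medical": 73.46,
    "WritingBench": 75.76,
    "Creative Writing": 69.88,
    "ResearchQA Science": 77.31,
}

# Unweighted (macro) mean: every benchmark counts equally,
# regardless of how many prompts it contains.
macro_avg = sum(scores.values()) / len(scores)
print(f"{macro_avg:.2f}")  # 70.55
```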
