EvoRubric: Self-Evolving Rubric-Driven RL for Open-Ended Generation

About

Reinforcement Learning (RL) has significantly advanced Large Language Models (LLMs) in verifiable domains, but aligning models for open-ended generation remains profoundly challenging due to the lack of definitive rewards. Current rubric-based RL methods mitigate this by employing explicit criteria; however, they rely heavily either on static, human-annotated rubrics that inevitably cause policy lag, or on expensive external proprietary models for dynamic updates. In this paper, we propose EvoRubric, a novel single-policy co-evolutionary RL framework that eliminates the reliance on static criteria and on external rubric generators. By unifying response generation and rubric generation under a single parameterized policy, EvoRubric dynamically alternates between a Reasoner and a Rubric Generator. To prevent reward hacking and ensure the reliability of generated signals, we introduce a multi-level verification pipeline featuring a meta-verifier, zero-variance pruning, and a Leave-One-Out peer consensus mechanism. Validated criteria are dynamically archived into a memory pool, yielding dense, multi-objective rewards to continuously co-optimize both roles. Extensive experiments across Medical, Writing, and Science domains demonstrate that EvoRubric consistently outperforms traditional static and external-LLM-driven alignment methods. Notably, our framework is compatible with human-expert priors: when initialized with expert-annotated rubrics, EvoRubric can further uncover novel, discriminative dimensions, achieving better performance than relying solely on static expert annotations.
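
The abstract names zero-variance pruning and a Leave-One-Out peer consensus mechanism without defining them here. The sketch below shows one plausible reading, not the authors' implementation. It assumes each candidate criterion assigns a score to every response in a sampled group, that a criterion with identical scores across the group carries no ranking signal, and that a criterion passes consensus when its scores correlate positively with the mean score of the remaining criteria. The function names, the Pearson correlation, the threshold, and the toy scores are all hypothetical.

```python
from statistics import mean, pvariance


def prune_zero_variance(scores):
    """Drop criteria that give every response the same score (assumed reading)."""
    return {name: s for name, s in scores.items() if pvariance(s) > 0.0}


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def leave_one_out_consensus(scores, threshold=0.0):
    """Keep a criterion only if it agrees with the mean of all other criteria.

    The threshold and the agreement measure are illustrative assumptions.
    """
    kept = {}
    for name, s in scores.items():
        peers = [v for k, v in scores.items() if k != name]
        if not peers:  # a lone criterion has no peers to disagree with
            kept[name] = s
            continue
        peer_mean = [mean(col) for col in zip(*peers)]
        if pearson(s, peer_mean) > threshold:
            kept[name] = s
    return kept


# Toy scores: one list per criterion, one entry per sampled response.
scores = {
    "accuracy":     [1, 0, 1, 0, 1],
    "completeness": [1, 0, 1, 0, 0],
    "safety":       [1, 0, 1, 1, 1],
    "polite":       [1, 1, 1, 1, 1],  # constant, so it cannot rank responses
    "contrarian":   [0, 1, 0, 1, 0],  # disagrees with every other criterion
}

validated = leave_one_out_consensus(prune_zero_variance(scores))
print(sorted(validated))  # ['accuracy', 'completeness', 'safety']
```

In this toy run the constant criterion is pruned first, and the criterion that ranks responses opposite to its peers is then rejected by the consensus step.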

Xin Guan, Xiaomeng Hu, Shen Huang, Zhenyi Wang, Bo Zhang, Zijian Li, Pengjun Xie, Bo Liu, Jiuxin Cao • 2026

Related benchmarks

Task	Dataset	Result
Open-ended writing	WritingBench	Score 75.76	20
Medical Question Answering	HealthBench Medical	Score 56.36	10
Multi-domain evaluation	Aggregate HealthBench, LLMMed-Eval, WritingBench, Creative Writing, ResearchQA	Macro-average Score 70.55	10
Science Question Answering	ResearchQA Science	Score 77.31	10
Creative Writing Generation	Creative Writing	Score 69.88	10
Medical Question Answering	LLMMed-Eval Medical	Score 73.46	10
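
As a quick consistency check on the table, the sketch below recomputes the aggregate row. It assumes the macro-average is the unweighted mean of the five per-benchmark scores listed above; the page does not state the aggregation rule explicitly, and the dictionary name is illustrative.

```python
from statistics import mean

# Per-benchmark scores copied from the table above.
per_benchmark = {
    "WritingBench": 75.76,
    "HealthBench": 56.36,
    "ResearchQA": 77.31,
    "Creative Writing": 69.88,
    "LLMMed-Eval": 73.46,
}

# Unweighted mean over benchmarks (assumed definition of "macro-average").
macro_average = mean(list(per_benchmark.values()))
print(round(macro_average, 2))  # 70.55
```

Under that assumption the recomputed value matches the reported macro-average of 70.55.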

