Multi-domain Evaluation on HealthBench, LLMMed-Eval, WritingBench, Creative Writing, and ResearchQA

Top macro-average score: 70.55 (EvoRubric)

Updated 1mo ago

Evaluation Results

Method	Date	Macro-average Score
EvoRubric	2026.05	70.55
External Evolving-RL	2026.05	69.65
Static Rubric-RL	2026.05	69.07
Gemini-2.5-pro	2026.05	68.96
EvoRubric	2026.05	68.84
Static Rubric-RL	2026.05	66.31
External Evolving-RL	2026.05	66.16
Base Model	2026.05	65.22
GPT-4o	2026.05	64.32
Base Model	2026.05	62.19