LLM-as-Judge evaluation on HH dataset

59.1 WCWR

RMOD

Updated 4mo ago

Evaluation Results

Method	Date	Links	WCWR	
RMOD	2025.03		59.1	27.73
DISTILL-RMOD	2025.03		57.9	8.48
CD-UNIFORM	2025.03		57.6	28.14
MO-GRPO	2025.03		54.6	336.08
MO-DPO	2025.03		52.8	0.5