LLM-as-Judge evaluation on HH dataset
[Chart: WCWR over time; top entry RMOD at 59.1 WCWR, Mar 11, 2025. Metrics available: WCWR, DKL.]
Updated 3mo ago
Evaluation Results
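| Method       | Configuration             | Date    | WCWR | DKL    |
|--------------|---------------------------|---------|------|--------|
| RMOD         | Judge=gpt-4o, Block Si... | 2025.03 | 59.1 | 27.73  |
| DISTILL-RMOD | Judge=gpt-4o, Number o... | 2025.03 | 57.9 | 8.48   |
| CD-UNIFORM   | Judge=gpt-4o, Block Si... | 2025.03 | 57.6 | 28.14  |
| MO-GRPO      | Judge=gpt-4o, Number o... | 2025.03 | 54.6 | 336.08 |
| MO-DPO       | Judge=gpt-4o, Number o... | 2025.03 | 52.8 | 0.5    |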