
LLM-as-Judge evaluation on HH dataset

59.1 WCWR

RMOD

[Chart: WCWR, Mar 11, 2025]
Updated 3mo ago

Evaluation Results

Method    Links
2025.03
59.127.73
2025.03
57.98.48
2025.03
57.628.14
2025.03
54.6336.08
2025.03
52.80.5