
Best-of-N evaluation on RewardBench v2

Accuracy: 58.69

PC2-based LLM-as-a-Judge

[Chart: Accuracy (axis range approx. 55.38 to 57.96), May 10, 2025]
Updated 1mo ago

Evaluation Results

Date      Accuracy   Method   Links
2025.05   58.69
2025.05   55.51