
Best-of-N evaluation on RMB

Accuracy: 59.69

PC2-based LLM-as-a-Judge

May 10, 2025
Updated 1mo ago

Evaluation Results

Method  Links
2025.05  59.69
2025.05  40.68