Reward Modeling on JudgeBench

93.3 Accuracy

OpenRS

[Chart: accuracy over time, Jul 2, 2025 to Apr 8, 2026]
Updated 8d ago

Evaluation Results

Date      Accuracy  (unlabeled)
2026.02   93.3      86.8
2026.02   91.6      89.4
2026.02   89.4      86.7
2026.02   89.1      86.2
2026.02   86.4      84.9
2025.07   83.4      -
2026.02   82        -
2026.02   80        84.3
2025.07   80        -
2025.07   77.9      -
2025.07   76        -
2026.02   74.3      -
2025.07   73.5      -
2025.07   73.4      -
2025.07   72.9      -
2026.02   71.6      -
2025.07   71.1      -
2026.02   70.2      -
2026.02   70.2      71.7
2026.02   70.2      -
2025.07   70.2      -
-         70.2      -
2026.02   70.1      -
2025.07   70        -
2025.07   69.5      -
-         69.4      -
2025.07   69.2      -
2026.02   69.1      -
2025.07   67.6      -
2026.02   66.5      -
2026.02   66.5      67.3
2026.02   66.5      -
2025.07   66.5      -
2026.02   66        -
2026.02   65.8      -
2026.02   65.8      70.5
2026.02   65.8      -
2025.07   65.8      -
2026.02   65.2      -
2025.07   65.2      -
2025.07   65        -
2026.02   64.8      -
2026.02   64.8      -
2025.07   64.8      -
-         64.5      -
2026.02   64.3      -
2026.02   64.3      64.8
2026.02   64.3      -
2025.07   64.3      -
2026.02   64.2      67.4
2026.02   64.2      -
2025.07   64.2      -
2025.07   64.1      -
2025.07   63.8      -
2025.07   63.8      -
2025.07   63.8      -
-         63.5      -
2026.02   63.5      65
2026.02   63.5      -
2025.07   63.5      -
2025.07   63.5      -
2025.07   63.3      -
2026.02   62.9      -
2026.02   62.9      67.1
2026.02   62.9      -
2025.07   62.9      -
2025.07   62.6      -
2025.07   62.3      -
2025.07   62.1      -
2025.07   62        -
2026.02   61.3      -
2026.02   61.1      -
2026.02   60        -
2026.02   60        -
2025.07   60        -
2025.07   59.9      -
2026.02   59.8      -
2025.07   59.8      -
2026.02   59.7      -
2026.02   59.7      63.4
2026.02   59.7      -
2025.07   59.7      -
2025.07   59.4      -
2025.07   59        -
2025.07   58.4      -
2025.07   57.5      -
2026.04   56.9      -
2026.04   56.8      -
2026.02   56.6      -
2025.07   56.6      -
2026.04   56.1      -
2026.04   56        -
2026.04   55.4      -
2026.04   54.3      -
2026.04   52.3      -
2026.04   52        -
2026.04   51.7      -
2026.04   51.1      -
2025.07   50.9      -
2026.04   50        -
Showing 100 of 105 rows