
Reward Modeling on RewardBench v2

92.1 Accuracy

DeepSeek-V3.2

[Chart: results over time, Jul 2, 2025 to Mar 2, 2026]
Updated 1mo ago

Evaluation Results

Method    Links
Date      Accuracy  (unlabeled)
-         92.1      -
-         91.1      -
2026.02   90.7      89.4
2025.07   86.5      -
2026.02   85.8      86.7
2026.02   85.7      86.8
2026.02   85.6      86.2
2026.02   84.1      84.3
2025.07   84.1      -
2026.02   83.7      84.9
2026.03   79.7      -
2025.07   78.2      -
2026.03   77.5      -
2026.02   76.7      70.5
2025.07   76.7      -
2025.07   76.7      -
2026.02   76.5      71.7
2025.07   76.5      -
2025.07   75.8      -
2025.07   75.5      -
2026.02   75.3      67.3
2025.07   75.3      -
2025.07   74.7      -
2026.03   74        -
2025.07   73.9      -
2026.03   73.4      -
2025.07   73.1      -
2026.02   72.5      67.4
2025.07   72.5      -
2026.03   71.9      -
2026.02   71.8      67.1
2025.07   71.8      -
2025.07   70.7      -
2025.07   70.7      -
2026.02   68.847    -
2025.07   68.3      -
2026.03   67.8      -
2025.07   67.7      -
2026.02   66.5      63.4
2025.07   66.5      -
2025.07   64.9      -
2026.02   64.8      65
2025.07   64.8      -
2026.03   64.7      -
2025.07   64.7      -
2025.07   64.3      -
2025.07   62.9      -
2026.03   61.4      -
2025.07   61.3      -
2025.07   59.7      -
2026.02   59.601    -
2025.07   59.6      -
2025.07   58.9      -
2025.07   58.1      -
2026.02   58.057    -
2025.07   57.4      -
2026.02   57.363    -
2026.02   57        -
2026.02   56.3      64.8
2025.07   56.3      -
2026.03   56        -
2026.03   55.6      -
2025.07   53.4      -
2026.02   53.348    -
2026.03   48.7      -
2026.02   48.255    -
2025.07   45.5      -
2025.07   39        -
2026.02   33.322    -
2026.02   33.261    -
2026.02   25.435    -
2026.02   16.057    -
2026.02   -         75.51
2026.02   -         74.665
2026.02   -         73.793
2026.02   -         68.176
2026.02   -         64.376
2026.02   -         61.25
2026.02   -         59.661
2026.02   -         46.2
2026.02   -         36.2
2026.02   -         30.566
2026.02   -         26.479
2026.02   -         24.98