Reward Modeling on RM-Bench Chat

Accuracy: 78.5

Gemini-2.5-Flash

Chart: Accuracy over time (Feb 2, 2026 to May 27, 2026)

Evaluation Results

Accuracy  Date
78.5      2026.05
78.5      2026.02
75.3      2026.05
75.3      2026.02
74.2      2026.05
74.2      2026.02
73.9      2026.05
73.9      2026.02
73.2      2026.05
73.2      2026.02
71.4      2026.05
71.4      2026.02
69.2      2026.02
69.1      2026.05
68.6      2026.02
67.9      2026.05
67.9      2026.05
67.8      2026.02
67.2      2026.05
67.2      2026.02
67.0      2026.05
67.0      2026.05
66.9      2026.02
65.7      2026.05
65.7      2026.05
64.8      2026.05
64.8      2026.05
64.3      2026.02
64.2      2026.02
64.2      2026.05
64.2      2026.05
64.2
62.5      2026.05
62.5      2026.02
62.2      2026.05
62.2      2026.05
61.2      2026.02
59.9      2026.05
59.9      2026.02
55.4      2026.05
55.4      2026.05
37.8