
Reward Modeling on LitBench

Top result: AC-GenRM, 80.7 accuracy

[Figure: accuracy over time, Jan 12, 2026 to Apr 7, 2026]

Evaluation Results

Date      Accuracy
2026.04   80.7
2026.04   79.6
2026.04   77.6
2026.01   76.31
2026.01   76.17
2026.01   74.92
2026.04   73.1
2026.04   72.8
2026.01   72.02
2026.01   71.81
2026.01   71.57
2026.01   71.41
2026.04   71
2026.04   70.2
2026.01   70
2026.04   70
2026.01   69.34
2026.04   68.8
2026.01   68.59
2026.01   68.35
2026.04   67.5
2026.01   67.46
2026.04   66.6
2026.01   65.93
2026.01   65.83
2026.01   65.36
2026.01   63.19
2026.01   63.02
2026.04   63
2026.01   62.06
2026.01   58.22
2026.01   57.5
2026.04   54.3
2026.01   54.18
2026.01   53.83
2026.04   47.7
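The page does not say how Accuracy is computed. As a rough illustration only, assuming it is pairwise preference accuracy (the percentage of human-labeled story pairs for which the reward model scores the human-preferred story higher), the metric could be sketched as below. The function name `pairwise_accuracy`, the choice to count ties as misses, and the example scores are all hypothetical and are not taken from LitBench or from any listed method.

```python
from typing import Sequence


def pairwise_accuracy(chosen_scores: Sequence[float],
                      rejected_scores: Sequence[float]) -> float:
    """Percentage of pairs where the preferred item scores strictly higher.

    Hypothetical sketch: assumes the leaderboard metric is pairwise
    preference accuracy, and treats ties as incorrect (an assumption).
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("score lists must have equal length")
    if not chosen_scores:
        raise ValueError("need at least one pair")
    # Count pairs where the human-preferred story gets the higher reward.
    correct = sum(1 for c, r in zip(chosen_scores, rejected_scores) if c > r)
    return 100.0 * correct / len(chosen_scores)


# Hypothetical reward scores for four story pairs.
chosen = [0.9, 0.4, 0.7, 0.2]
rejected = [0.1, 0.6, 0.3, 0.2]
print(pairwise_accuracy(chosen, rejected))  # 50.0
```

A generative reward model that outputs a direct choice between two stories, rather than a scalar score per story, would fit the same scheme by counting how often its choice matches the human label.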