
Reward Modeling on RewardBench 2

93.4 L-Acc

GPT-5-chat

Feb 6, 2026
Updated 1mo ago

Evaluation Results

Method  Links
2026.02  93.4  10.3  83.7  -     -
2026.02  93.1  5.1   88.3  -     -
2026.02  92.6  3.4   89.4  -     -
2026.02  92    1.7   90.4  -     -
2026.02  92    14.6  78.5  -     -
2026.02  91.9  18.9  74.5  -     -
2026.02  91.6  9.4   83    -     -
2026.02  90.6  16.2  75.9  -     -
2026.02  90.2  13.9  77.7  -     -
2026.02  90.1  4.8   85.7  -     -
2026.02  89.8  21.7  70.3  -     -
2026.02  89.2  25.9  66.1  -     -
2026.02  88.5  36.7  56    -     -
2026.02  88.1  29    62.5  -     -
2026.02  87.9  20.1  70.2  -     -
2026.02  87.8  19.3  70.9  -     -
2026.02  87.1  18.3  71.2  -     -
2026.02  85.3  27.4  61.9  -     -
2026.02  84.2  25    63.2  -     -
2026.02  82.7  24.6  62.4  -     -
-        -     -     -     76.5  -
2025.07  -     -     -     67.2  -
-        -     -     -     57.4  -
2025.07  -     -     -     66.6  -
-        -     -     -     59.7  -
2025.07  -     -     -     78.1  -
2025.07  -     -     -     69.8  -
2025.07  -     -     -     71    -
2026.02  -     -     -     -     70.1
2026.02  -     -     -     -     69.3
2026.02  -     -     -     -     69.3
2026.02  -     -     -     -     70.1
2026.02  -     -     -     -     69.3