
Reward Model Evaluation on RewardBench 2

Top Factuality score: 88.2 (Scalar RM)

[Chart: results over time, Dec 17, 2024 to May 5, 2026]

Evaluation Results

Date     Factuality  Precise IF  Math  Safety  Focus  Ties  Avg.  (unlabeled)
2026.05  88.2        67.8        83.1  97.3    99.2   83.1  86.4  -
2024.12  87.2        54.4        72.7  91.3    96.8   90.1  82.1  -
2024.12  84.6        48.8        71.6  90.7    97     90.6  80.5  -
2024.12  84.2        47.9        73.6  73.8    93.8   91.7  77.5  -
2024.12  82.7        41.9        74.9  89.5    86.2   83.7  76.5  -
2024.12  82.5        45.6        69.4  90.9    93.3   86.7  78.1  -
2024.12  82.2        57.5        77.7  56.2    78.4   82.2  72.4  -
-        81.3        41.9        69.9  88.4    86.5   88.3  76.1  -
2024.12  78.5        37.2        69.9  95.8    95.4   83.2  76.7  -
-        74.1        41.9        69.9  96.4    90.3   86.2  76.5  -
-        73.7        40.3        70.5  94.2    93.2   82.6  75.8  -
2024.12  73.3        54.4        75    90.3    92.1   67.2  75.4  -
2024.12  71.6        36.3        71    92.9    91.3   88    75.2  -
-        65.7        55.3        81.1  90.9    86.7   83.4  77.2  -
2026.05  48.8        39.1        63.5  60.8    66.1   58.2  56.1  -
2026.05  37.2        30.6        54    43.5    54.4   56.1  46    -
2026.05  37          32          57.1  42.3    59     55.6  47.2  -
2026.05  33.1        29.8        52.2  48.3    51.9   54.8  45    -
2026.05  32.1        32.6        34.4  39.1    35     46.3  36.6  -
2026.05  31.5        30.8        57    47.9    49.6   59.2  46    -
2026.05  27.8        26.5        38.1  25      30.2   9.7   26.2  -
-        -           -           -     -       -      -     -     74.7
-        -           -           -     -       -      -     -     75.3
2026.05  -           -           -     -       -      -     -     74.9
2026.05  -           -           -     -       -      -     -     65
2026.05  -           -           -     -       -      -     -     76
2026.05  -           -           -     -       -      -     -     75.9
2026.05  -           -           -     -       -      -     -     75.7
2026.05  -           -           -     -       -      -     -     75.1
2026.05  -           -           -     -       -      -     -     65
2026.05  -           -           -     -       -      -     -     76.1
2026.05  -           -           -     -       -      -     -     76.1
2026.05  -           -           -     -       -      -     -     76