
Reward Modeling on RewardBench 2 (Factuality, Math, Safety, Focus, Ties)

Precise IF Score: 71

Qwen3.5-35B-A3B w/ Hybrid Reward

[Chart: Precise IF score over time, Sep 3, 2025 to May 28, 2026]
Updated 4d ago

Evaluation Results

Date     Precise IF  Score  Factuality  Math  Safety  Focus  Ties
2026.05  71          85.1   79.5        88.1  92.2    82.7   97.3
2026.05  66.2        84.1   84.6        77.6  96.7    98.4   81.2
2026.05  61.9        79.5   75.5        89.8  88.1    80.5   81.1
2026.05  61.8        80     76.8        84.4  90.5    77.7   88.6
2026.05  57.5        -      -           -     -       84.1   -
2026.05  57.5        77.7   67.4        85.2  90.9    84.1   80.9
2026.05  54.4        82.1   87.2        72.7  91.3    96.8   90.1
2026.05  53.8        78.1   67.7        84.7  90.4    82.8   89.1
2026.05  45          -      -           -     -       84.9   -
2026.05  43.1        -      -           -     -       77     -
2026.05  42.9        -      -           -     -       80.8   -
2026.05  42.5        -      -           -     -       79.6   -
2026.05  42.3        -      -           -     -       76.7   -
2026.05  41.9        -      -           -     -       77.2   -
2026.05  40.3        -      -           -     -       73.3   -
2026.05  40          -      -           -     -       86.5   -
2026.05  39.7        72.3   82.9        65.2  87.3    73.4   85.4
2026.05  38.8        -      -           -     -       87     -
2026.05  36.9        -      -           -     -       79.2   -
2026.05  34.4        -      -           -     -       83.6   -
2026.05  34.4        -      -           -     -       82.2   -
2026.05  34.4        66.3   61.5        70.9  84.8    74.8   71.5
2026.05  33.1        -      -           -     -       84.2   -
2025.09  33          33     26.9        39.4  49.9    29.3   19.4
2025.09  30.9        32.2   26.9        39.3  48.2    28.2   19.6
2025.09  30.6        43.1   40.4        42.8  55.1    69.2   20.7
2026.05  30.6        -      -           -     -       79     -
2025.09  30.1        36.2   33.2        38.9  51      33.8   29.9
2025.09  30          35.8   32.3        43.2  50.9    36.3   22.1
2025.09  29.4        45.4   41.7        43    53.2    68.9   36.1
2026.05  27          -      -           -     -       62.9   -
2025.09  25.8        41.7   41.5        42.6  50.2    65.1   24.8
2025.09  25.6        44     42.2        43.3  62.1    66.3   24.2
2026.05  23.8        -      -           -     -       84.6   -
2026.05  21.9        -      -           -     -       56.6   -
2026.05  20.6        -      -           -     -       76.2   -
2026.05  14.8        -      -           -     -       50.4   -
2026.05  13.8        -      -           -     -       55.4   -
2026.05  13.2        -      -           -     -       63.4   -
2026.05  10          -      -           -     -       60.4   -
2026.05  9.4         -      -           -     -       29.1   -
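
In the fully reported rows, the second value appears to equal the unweighted mean of the other six scores. The Python sketch below checks this for the top entry (71, 85.1, 79.5, 88.1, 92.2, 82.7, 97.3). The subset names and their order (Precise IF first, then Factuality, Math, Safety, Focus, Ties) are assumptions taken from the page title and headline metric, and `overall_score` is a hypothetical helper written for illustration, not part of any benchmark tooling.

```python
from statistics import mean

# Subset scores for the top entry. The mapping of values to subset
# names is an assumption based on the order given in the page title.
top_row = {
    "Precise IF": 71.0,
    "Factuality": 79.5,
    "Math": 88.1,
    "Safety": 92.2,
    "Focus": 82.7,
    "Ties": 97.3,
}


def overall_score(subsets):
    """Unweighted mean of the subset scores, rounded to one decimal place."""
    return round(mean(subsets.values()), 1)


# Compare against the second value reported in the same row (85.1).
print(overall_score(top_row))
```

Rows that report only two values cannot be checked this way, since the mean needs all six subset scores.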