
Reward Modeling on LMArena-like Benchmarks (PPE Pref ZH, In-House QA, In-House Writing), v1.0

Top PPE Pref ZH score: 82.3 (OpenRS)


Evaluation Results

Date     PPE Pref ZH  In-House QA  In-House Writing  Overall
2026.02  82.3         74.4         82.9              79.9
2026.02  79.6         64.0         76.3              73.5
2026.02  78.9         75.7         79.0              77.9
2026.02  78.4         74.5         79.8              77.6
2026.02  78.1         72.7         77.3              76.0
2026.02  76.6         74.7         84.6              78.6
2026.02  76.3         73.1         84.3              77.9
2026.02  75.3         71.4         77.9              74.8
2026.02  72.2         67.3         76.3              71.9
2026.02  71.1         67.7         75.9              71.6
2026.02  69.4         67.8         60.3              66.1
2026.02  68.7         61.8         77.3              68.8
2026.02  65.7         67.2         64.5              66.0
2026.02  62.2         61.7         77.7              67.0
2026.02  60.6         58.9         79.1              66.1
2026.02  58.3         58.3         73.1              63.1
2026.02  57.6         56.1         66.5              60.0
2026.02  43.3         59.1         29.7              50.2
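The page does not say how the scores are computed. The sketch below is a minimal illustration under one assumption: that the PPE Pref ZH column reports pairwise preference accuracy as a percentage. Under that reading, the score is the share of human-labeled response pairs in which the reward model gives the preferred response the higher reward. The function `preference_accuracy`, the `Pair` layout, and the toy reward values are hypothetical and are not taken from the benchmark. Counting ties as misses is also an assumption.

```python
from typing import Sequence, Tuple

# Hypothetical layout (assumption, not from the benchmark):
# (reward for the human-preferred response, reward for the rejected response)
Pair = Tuple[float, float]


def preference_accuracy(pairs: Sequence[Pair]) -> float:
    """Return the percentage of pairs where the preferred response gets a
    strictly higher reward. Ties count as misses (an assumption)."""
    if not pairs:
        raise ValueError("need at least one preference pair")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    # Toy rewards for illustration only; not benchmark data.
    toy_pairs = [(1.2, 0.4), (0.3, 0.9), (2.0, 1.1), (0.5, 0.5)]
    print(preference_accuracy(toy_pairs))  # prints 50.0
```

Only the source of the two rewards per pair changes from one reward model to another. The accuracy calculation stays the same, which is what makes scores comparable across the rows of a leaderboard like this one.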