Reward Modeling Evaluation on RewardBench

R-Bench Score: 88.1

SFT + TTL + DPO + TPO

[Figure: R-Bench score over time, dated May 8, 2026]

Evaluation Results

Method                   Date      R-Bench Score
SFT + TTL + DPO + TPO    2026.05   88.1
(not shown)              2026.05   86.8
(not shown)              2026.05   84.5