Reward Modeling Evaluation on RewardBench

R-Bench Score: 88.1

Method: SFT + TTL + DPO + TPO

Updated 2mo ago

Evaluation Results

Method	Date	R-Bench Score
SFT + TTL + DPO + TPO	2026.05	88.1
SFT + TTL + DPO	2026.05	86.8
SFT (no TTL) + DPO	2026.05	84.5
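
The rows differ by one training stage at a time, so the score gaps can be read as rough per-stage gains. The sketch below only does that subtraction on the three reported scores. The `scores` dictionary and the `gain` helper are illustrative names, not part of any published evaluation code, and reading each gap as the effect of a single stage assumes the runs are otherwise comparable.

```python
# Scores copied from the results table (R-Bench Score, 2026.05 entries).
scores = {
    "SFT + TTL + DPO + TPO": 88.1,
    "SFT + TTL + DPO": 86.8,
    "SFT (no TTL) + DPO": 84.5,
}


def gain(better: str, baseline: str) -> float:
    """Absolute score difference in points, rounded to one decimal place."""
    return round(scores[better] - scores[baseline], 1)


# Adding TPO on top of SFT + TTL + DPO.
tpo_gain = gain("SFT + TTL + DPO + TPO", "SFT + TTL + DPO")
# Adding TTL to the SFT + DPO pipeline.
ttl_gain = gain("SFT + TTL + DPO", "SFT (no TTL) + DPO")
# Full pipeline versus the weakest reported variant.
total_gain = gain("SFT + TTL + DPO + TPO", "SFT (no TTL) + DPO")

print(f"TPO gain:   {tpo_gain:+.1f}")
print(f"TTL gain:   {ttl_gain:+.1f}")
print(f"Total gain: {total_gain:+.1f}")
```

Under these assumptions the TTL step accounts for the larger share of the gap between the weakest and strongest variants.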