Reward Modeling Evaluation on RewardBench
[Chart: R-Bench Score over time. Top result: SFT + TTL + DPO + TPO, 88.1 (May 8, 2026).]
Evaluation Results
| Method | Configuration | Date | R-Bench Score |
|---|---|---|---|
| SFT + TTL + DPO + TPO | Initialization=SFT + T... | 2026.05 | 88.1 |
| SFT + TTL + DPO | Initialization=SFT + T... | 2026.05 | 86.8 |
| SFT (no TTL) + DPO | Initialization=SFT (no... | 2026.05 | 84.5 |
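The three rows read like an ablation in which each pipeline adds one training stage to the previous one. The sketch below computes the score change between consecutive rows, using only the values in the table. It assumes, and the table does not confirm, that all other settings were held fixed across rows. The helper `stage_gains` is a hypothetical name used only for illustration.

```python
# R-Bench Scores copied from the Evaluation Results table above.
# ASSUMPTION: the rows form a cumulative ablation with all other settings held fixed.
scores = {
    "SFT (no TTL) + DPO": 84.5,
    "SFT + TTL + DPO": 86.8,
    "SFT + TTL + DPO + TPO": 88.1,
}

# Pipelines ordered from fewest to most training stages.
ablation_order = [
    "SFT (no TTL) + DPO",
    "SFT + TTL + DPO",
    "SFT + TTL + DPO + TPO",
]


def stage_gains(order, table):
    """Hypothetical helper: return (transition label, score delta) for each consecutive pair."""
    gains = []
    for prev, curr in zip(order, order[1:]):
        # Round to one decimal to match the table's precision and suppress float noise.
        gains.append((f"{prev} -> {curr}", round(table[curr] - table[prev], 1)))
    return gains


if __name__ == "__main__":
    for label, delta in stage_gains(ablation_order, scores):
        print(f"{label}: {delta:+.1f}")
```

Under that assumption, adding TTL accounts for 2.3 points and adding TPO for a further 1.3 points, 3.6 points in total over the SFT (no TTL) + DPO baseline.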