Reward Modeling on Anthropic Helpful-Harmless (HHH)

0.7108 RewardBench Total

Full Set

Updated 2mo ago

Evaluation Results

Method	Date	RewardBench Total
Full Set	2025.08	0.7108
Difficulty-Based Preference Data Selection	2025.08	0.7052
Random	2025.08	0.6798