Reward Modeling on PersonalRewardBench (test)

65.21 Macro Accuracy

P-GenRM-8B

Updated 5mo ago

Evaluation Results

Method	Date	Macro Accuracy
P-GenRM-8B	2026.02	65.21
o3	2026.02	63.33
SynthesizeMe 70B	2026.02	61.51
Fine-tuned BT-70B	2026.02	60.64
Llama3.1-70B	2026.02	58.27
Llama3.1-8B	2026.02	56.24