
Pluralistic Reward Model Learning on ARENA

Best accuracy (ARENA): 60.56 (PAL)

Updated 3mo ago

Evaluation Results

Method	Date	Accuracy (ARENA)
PAL	2026.03	60.56
EpiPersona	2026.03	59.57
EpiPersona	2026.03	57.45
GPO	2026.03	56.69
VPL	2026.03	56.69
PAL	2026.03	56.69
VPL	2026.03	54.93
BT	2026.03	54.06
GPO	2026.03	53.87
BT	2026.03	51.94