Pluralistic Reward Model Learning on ARENA
[Chart: Accuracy (ARENA); leading entry PAL, 60.56 (Mar 30, 2026)]
Evaluation Results
| Method | Backbone | Date | Accuracy (ARENA) |
|---|---|---|---|
| PAL | LLAMA-3.2-3B | 2026.03 | 60.56 |
| EpiPersona | LLAMA-3.2-3B | 2026.03 | 59.57 |
| EpiPersona | LLAMA-3.1-8B | 2026.03 | 57.45 |
| GPO | LLAMA-3.1-8B | 2026.03 | 56.69 |
| VPL | LLAMA-3.2-3B | 2026.03 | 56.69 |
| PAL | LLAMA-3.1-8B | 2026.03 | 56.69 |
| VPL | LLAMA-3.1-8B | 2026.03 | 54.93 |
| BT | LLAMA-3.1-8B | 2026.03 | 54.06 |
| GPO | LLAMA-3.2-3B | 2026.03 | 53.87 |
| BT | LLAMA-3.2-3B | 2026.03 | 51.94 |