Personalized Reward Modeling on Lamp-QA (OOD)
[Chart: Arts Score. Highlighted point: Qwen3-235B-A22B, 60 (Feb 12, 2026). Selectable metrics: Arts Score, Personalization Score, Social Score, Average Score.]
Updated 1mo ago
Evaluation Results
Method            Details                     Date      Arts Score   Personalization Score   Social Score   Average Score
Qwen3-235B-A22B   Model Size=235B-A22B        2026.02   60           65.7                    60             61.9
Qwen3-32B         Model Size=32B              2026.02   54.3         60                      54.3           56.2
LLaMA3.1-70B      Model Size=70B              2026.02   54.3         65.7                    60             60
P-GenRM           Model Size=8B, Inferen...   2026.02   54.3         71.4                    65.7           63.8
Qwen3-8B          Model Size=8B               2026.02   48.6         54.3                    60             54.3
LLaMA3.1-8B       Model Size=8B               2026.02   48.6         54.3                    54.3           52.4
SynthMe-8B        Model Size=8B               2026.02   48.6         65.7                    60             58.1
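
The Average Score values listed above appear to match the unweighted mean of the Arts, Personalization, and Social scores, rounded to one decimal place. The page does not state this, so treat it as an inference from the numbers. A minimal Python sketch that checks the relationship follows; the names `SCORES`, `mean_score`, and `mismatches` are hypothetical and exist only for this illustration.

```python
# Scores copied from the results above, in the order:
# (arts, personalization, social, reported_average)
SCORES = {
    "Qwen3-235B-A22B": (60.0, 65.7, 60.0, 61.9),
    "Qwen3-32B": (54.3, 60.0, 54.3, 56.2),
    "LLaMA3.1-70B": (54.3, 65.7, 60.0, 60.0),
    "P-GenRM": (54.3, 71.4, 65.7, 63.8),
    "Qwen3-8B": (48.6, 54.3, 60.0, 54.3),
    "LLaMA3.1-8B": (48.6, 54.3, 54.3, 52.4),
    "SynthMe-8B": (48.6, 65.7, 60.0, 58.1),
}


def mean_score(arts: float, personalization: float, social: float) -> float:
    """Unweighted mean of the three category scores, rounded to one decimal.

    Assumption: this is how the Average Score column is derived; the page
    itself does not document the formula.
    """
    return round((arts + personalization + social) / 3, 1)


def mismatches(scores: dict, tolerance: float = 0.05) -> list:
    """Return the methods whose reported average differs from the recomputed mean."""
    bad = []
    for method, (arts, pers, social, reported) in scores.items():
        if abs(mean_score(arts, pers, social) - reported) > tolerance:
            bad.append(method)
    return bad


if __name__ == "__main__":
    for method, (arts, pers, social, reported) in SCORES.items():
        print(f"{method}: recomputed={mean_score(arts, pers, social)} reported={reported}")
    print("mismatches:", mismatches(SCORES))
```

If `mismatches(SCORES)` comes back empty, the simple-mean reading holds for every row listed here. It does not rule out a different weighting that happens to coincide on these values.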