Personalized LLM Alignment Evaluation on PersonalRewardBench (test)
[Chart: Mean Score. Top entry: Llama3.1-8B-Instruct-GRPO, 3.354, Feb 12, 2026.]
Updated 4d ago
Evaluation Results
| Method | Tags | Date | Mean Score | Standard Error | 95% CI |
|---|---|---|---|---|---|
| Llama3.1-8B-Instruct-GRPO | Model Series=Llama 3.1... | 2026.02 | 3.354 | 0.0102 | - |
| Llama3.1-8B-Instruct-DPO | Model Series=Llama 3.1... | 2026.02 | 3.316 | 0.0068 | - |
| Qwen2.5-72B-Instruct | Model Series=Qwen 2.5,... | 2026.02 | 3.214 | 0.0089 | - |
| Llama3.1-70B-Instruct | Model Series=Llama 3.1... | 2026.02 | 3.156 | 0.0093 | - |
| Qwen2.5-7B-Instruct | Model Series=Qwen 2.5,... | 2026.02 | 2.97 | 0.0089 | - |
| Llama3.1-8B-Instruct | Model Series=Llama 3.1... | 2026.02 | 2.954 | 0.0074 | - |
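
The 95% CI entries in the results above are reported as "-". As a rough sketch, an approximate interval can be derived from each reported mean and standard error as mean ± 1.96 × SE. This assumes the standard error is that of the mean score and that a normal approximation is reasonable; it is not necessarily how the leaderboard would compute its intervals. The names `RESULTS`, `Z_95`, and `normal_ci` are illustrative and do not come from the page.

```python
# Illustrative sketch: normal-approximation 95% confidence intervals
# computed from the reported mean scores and standard errors.

# Assumption: two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.96

# (method, mean score, standard error), copied from the results above.
RESULTS = [
    ("Llama3.1-8B-Instruct-GRPO", 3.354, 0.0102),
    ("Llama3.1-8B-Instruct-DPO", 3.316, 0.0068),
    ("Qwen2.5-72B-Instruct", 3.214, 0.0089),
    ("Llama3.1-70B-Instruct", 3.156, 0.0093),
    ("Qwen2.5-7B-Instruct", 2.97, 0.0089),
    ("Llama3.1-8B-Instruct", 2.954, 0.0074),
]


def normal_ci(mean, se, z=Z_95):
    """Return the (lower, upper) bounds of mean +/- z * se."""
    half_width = z * se
    return mean - half_width, mean + half_width


# Print one approximate interval per method.
for method, mean, se in RESULTS:
    lower, upper = normal_ci(mean, se)
    print(f"{method}: [{lower:.3f}, {upper:.3f}]")
```

Intervals obtained this way are only as reliable as the normality assumption and the reported standard errors, so treat them as a reading aid, not as published results.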