Opinion Alignment on WoM
[Chart: Mean Accuracy. Top result: SFT+GRPO, 75.1 (Mar 1, 2026). Updated 3mo ago.]
Evaluation Results
Method     Base model       Date      Mean Accuracy
SFT+GRPO   Magistral 24B    2026.03   75.1
SFT+GRPO   Llama 3.1 8B     2026.03   75.06
SFT        Magistral 24B    2026.03   72.56
GRPO       Magistral 24B    2026.03   72.05
SFT+GRPO   Qwen3 8B         2026.03   71.16
SFT        Llama 3.1 8B     2026.03   68.25
SFT        Qwen3 8B         2026.03   61.74
GRPO       Llama 3.1 8B     2026.03   60.63
ORPO       Llama 3.1 8B     2026.03   57.48
GRPO       Qwen3 8B         2026.03   53.19
icl        Qwen3 8B         2026.03   48.67
icl        Magistral 24B    2026.03   44.64
ORPO       Qwen3 8B         2026.03   38.56
ORPO       Magistral 24B    2026.03   36.1
random     Untrained b...   2026.03   33.33
icl        Llama 3.1 8B     2026.03   31.44
majority   Untrained b...   2026.03   27.44