Opinion Alignment on Wahl-O-Mat (WoM) March 2025 (test)
[Chart: Mean Macro-F1 over time. Best result: 53.21 (SFT+GRPO), Mar 1, 2026. Updated 3mo ago.]
Evaluation Results
| Method | Base model | Date | Mean Macro-F1 |
|---|---|---|---|
| SFT+GRPO | Magistral 24B | 2026.03 | 53.21 |
| SFT+GRPO | Llama 3.1 8B | 2026.03 | 52.53 |
| SFT | Magistral 24B | 2026.03 | 51.86 |
| GRPO | Magistral 24B | 2026.03 | 51 |
| SFT+GRPO | Qwen3 8B | 2026.03 | 49.38 |
| SFT | Llama 3.1 8B | 2026.03 | 48.95 |
| ORPO | Llama 3.1 8B | 2026.03 | 43.29 |
| SFT | Qwen3 8B | 2026.03 | 42.91 |
| GRPO | Llama 3.1 8B | 2026.03 | 37.29 |
| random | Untrained b... | 2026.03 | 33.33 |
| GRPO | Qwen3 8B | 2026.03 | 31.42 |
| icl | Llama 3.1 8B | 2026.03 | 28.17 |
| majority | Untrained b... | 2026.03 | 27.44 |
| icl | Qwen3 8B | 2026.03 | 26.19 |
| icl | Magistral 24B | 2026.03 | 26.19 |
| ORPO | Qwen3 8B | 2026.03 | 25.25 |
| ORPO | Magistral 24B | 2026.03 | 24.73 |