Lemma Judging on IMO-Lemma (IL) (test)
[Chart: Exact Match Accuracy over time. Best result: Llama-3.1 (RULES), 93.6, Feb 1, 2026]
Updated 4d ago
Evaluation Results

Method                Optimization  Date     Exact Match Accuracy
Llama-3.1 (RULES)     RULES         2026.02  93.6
Qwen2.5 (RULES)       RULES         2026.02  92.7
Llama-3.1 (GRPO)      GRPO          2026.02  89.8
DeepMath (RULES)      RULES         2026.02  89.5
OLMO2 (GRPO)          GRPO          2026.02  83.6
OLMO2 (RULES)         RULES         2026.02  82.5
Qwen2.5 (GRPO)        GRPO          2026.02  80.4
DeepMath (GRPO)       GRPO          2026.02  65.8
OLMO2 (vanilla)       vanilla       2026.02  64.9
Qwen2.5 (vanilla)     vanilla       2026.02  55.8
Llama-3.1 (vanilla)   vanilla       2026.02  52.3
DeepMath (vanilla)    vanilla       2026.02  31.9