Lemma Judging on NaturalProofs (test)
State of the art: Llama-3.1 (RULES), 93.2 Exact Match Accuracy
[Chart: Exact Match Accuracy over time, Feb 1, 2026]
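Exact match accuracy is conventionally the percentage of examples whose predicted output is identical to the reference. The sketch below is a minimal Python version that rests on a few assumptions. The function name `exact_match_accuracy` is hypothetical. Answers are compared after stripping surrounding whitespace. The example labels are invented. This page does not state the benchmark's actual normalization or answer format.

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Percentage of predictions that exactly match their reference.

    Assumption: strings are compared after stripping surrounding whitespace;
    the benchmark's real normalization is not given on this page.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("need at least one example")
    # Count positions where the stripped prediction equals the stripped reference.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Hypothetical judgments for illustration only, not benchmark data.
preds = ["valid", "invalid", "valid", "valid"]
golds = ["valid", "valid", "valid", "valid "]
print(exact_match_accuracy(preds, golds))  # 75.0
```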
Evaluation Results
| Method | Configuration | Date | Exact Match Accuracy |
| --- | --- | --- | --- |
| Llama-3.1 (RULES) | Model=Llama-3.1, Optim... | 2026.02 | 93.2 |
| Llama-3.1 (GRPO) | Model=Llama-3.1, Optim... | 2026.02 | 88.6 |
| DeepMath (RULES) | Model=DeepMath, Optimi... | 2026.02 | 88.5 |
| Qwen2.5 (RULES) | Model=Qwen2.5, Optimiz... | 2026.02 | 88.2 |
| OLMO2 (GRPO) | Model=OLMO2, Optimizat... | 2026.02 | 85.2 |
| OLMO2 (RULES) | Model=OLMO2, Optimizat... | 2026.02 | 84.9 |
| OLMO2 (vanilla) | Model=OLMO2, Optimizat... | 2026.02 | 77.6 |
| DeepMath (GRPO) | Model=DeepMath, Optimi... | 2026.02 | 77.5 |
| Qwen2.5 (GRPO) | Model=Qwen2.5, Optimiz... | 2026.02 | 76.5 |
| Qwen2.5 (vanilla) | Model=Qwen2.5, Optimiz... | 2026.02 | 63.9 |
| Llama-3.1 (vanilla) | Model=Llama-3.1, Optim... | 2026.02 | 61.3 |
| DeepMath (vanilla) | Model=DeepMath, Optimi... | 2026.02 | 60.8 |
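To compare optimization methods within each base model, the gain of each variant over its baseline can be computed directly from the table. The sketch below copies the Exact Match Accuracy values from the leaderboard. It assumes that the "(vanilla)" row is the un-optimized baseline for each model. It ignores the truncated configuration strings. `SCORES` and `gains_over_vanilla` are illustrative names only.

```python
# Exact Match Accuracy values copied from the leaderboard table.
# Assumption: "vanilla" is the baseline for each base model.
SCORES = {
    "Llama-3.1": {"vanilla": 61.3, "GRPO": 88.6, "RULES": 93.2},
    "DeepMath": {"vanilla": 60.8, "GRPO": 77.5, "RULES": 88.5},
    "Qwen2.5": {"vanilla": 63.9, "GRPO": 76.5, "RULES": 88.2},
    "OLMO2": {"vanilla": 77.6, "GRPO": 85.2, "RULES": 84.9},
}


def gains_over_vanilla(scores):
    """Return {model: {method: points gained over that model's vanilla score}}."""
    return {
        model: {
            # Round to one decimal to match the table's precision.
            method: round(acc - by_method["vanilla"], 1)
            for method, acc in by_method.items()
            if method != "vanilla"
        }
        for model, by_method in scores.items()
    }


for model, gains in gains_over_vanilla(SCORES).items():
    print(model, gains)
# Llama-3.1 {'GRPO': 27.3, 'RULES': 31.9}
# DeepMath {'GRPO': 16.7, 'RULES': 27.7}
# Qwen2.5 {'GRPO': 12.6, 'RULES': 24.3}
# OLMO2 {'GRPO': 7.6, 'RULES': 7.3}
```

On these numbers, RULES gives the larger gain for three of the four base models. OLMO2 is the exception, where GRPO edges ahead by 0.3 points.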