Lemma Judging on NLPS (test)
[Chart: Exact Match Accuracy over time. Best result: 86.5, Llama-3.1 (GRPO), Feb 1, 2026]
Updated 4d ago
Evaluation Results
| Method | Optimization | Date | Exact Match Accuracy |
|---|---|---|---|
| Llama-3.1 (GRPO) | GRPO | 2026.02 | 86.5 |
| Llama-3.1 (RULES) | RULES | 2026.02 | 86.4 |
| DeepMath (RULES) | RULES | 2026.02 | 86.2 |
| Qwen2.5 (RULES) | RULES | 2026.02 | 85.8 |
| OLMO2 (GRPO) | GRPO | 2026.02 | 84.2 |
| Qwen2.5 (GRPO) | GRPO | 2026.02 | 83.7 |
| OLMO2 (RULES) | RULES | 2026.02 | 83.4 |
| DeepMath (GRPO) | GRPO | 2026.02 | 77.2 |
| Llama-3.1 (vanilla) | vanilla | 2026.02 | 75.1 |
| Qwen2.5 (vanilla) | vanilla | 2026.02 | 71.9 |
| OLMO2 (vanilla) | vanilla | 2026.02 | 69.1 |
| DeepMath (vanilla) | vanilla | 2026.02 | 51.5 |