Premise Selection on Isabelle (IS) (test)
Best result: 86.3 Exact Match Accuracy, Llama-3.1 (RULES)
[Chart: Exact Match Accuracy of submitted results by date (Feb 1, 2026)]
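The page does not say how a prediction is scored. The sketch below is a minimal illustration that assumes one gold premise string per example. A prediction counts as correct only when it matches that string exactly, after trimming surrounding whitespace. Exact Match Accuracy is then the percentage of exact hits. The function name `exact_match_accuracy`, the normalization rule, and the premise names are all hypothetical. They are not taken from the benchmark's evaluation code.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions that equal their reference string.

    Assumption: only leading/trailing whitespace is stripped before comparing;
    the leaderboard does not state its actual normalization rules.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot score an empty evaluation set")
    # Count positions where the trimmed prediction matches the trimmed gold premise.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Hypothetical premise names, for illustration only.
preds = ["add_commute", "mult_assoc", "le_trans", "foo"]
golds = ["add_commute", "mult_assoc", "le_trans", "bar"]
print(exact_match_accuracy(preds, golds))  # 75.0
```

If the real benchmark allows several gold premises per goal, the comparison would need to change, for example to a set-membership test. The strict string equality used here is the simplest reading of "exact match".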
Evaluation Results
Method               Optimization  Date     Exact Match Accuracy (%)
Llama-3.1 (RULES)    RULES         2026.02  86.3
Llama-3.1 (GRPO)     GRPO          2026.02  85.9
Qwen2.5 (RULES)      RULES         2026.02  81.3
OLMO2 (RULES)        RULES         2026.02  76.6
DeepMath (RULES)     RULES         2026.02  75.8
Qwen2.5 (GRPO)       GRPO          2026.02  74.2
OLMO2 (GRPO)         GRPO          2026.02  72.2
DeepMath (GRPO)      GRPO          2026.02  67.6
Qwen2.5 (vanilla)    vanilla       2026.02  55.5
Llama-3.1 (vanilla)  vanilla       2026.02  54.7
OLMO2 (vanilla)      vanilla       2026.02  53.9
DeepMath (vanilla)   vanilla       2026.02  51.6
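The sketch below compares the two optimizations against the untuned models. It copies the scores from the Evaluation Results table into a dictionary. It then computes each optimization's absolute gain over the vanilla baseline of the same model, in percentage points. The dictionary layout and the helper name `gains_over_vanilla` are illustrative only and are not part of any benchmark tooling.

```python
# Exact Match Accuracy (%) copied from the Evaluation Results table.
RESULTS = {
    "Llama-3.1": {"vanilla": 54.7, "GRPO": 85.9, "RULES": 86.3},
    "Qwen2.5": {"vanilla": 55.5, "GRPO": 74.2, "RULES": 81.3},
    "OLMO2": {"vanilla": 53.9, "GRPO": 72.2, "RULES": 76.6},
    "DeepMath": {"vanilla": 51.6, "GRPO": 67.6, "RULES": 75.8},
}


def gains_over_vanilla(results):
    """Absolute gain (percentage points) of each optimization over the same model's vanilla score."""
    return {
        model: {
            # Round to one decimal to match the table's precision and hide float noise.
            opt: round(score - scores["vanilla"], 1)
            for opt, score in scores.items()
            if opt != "vanilla"
        }
        for model, scores in results.items()
    }


for model, gains in gains_over_vanilla(RESULTS).items():
    print(f"{model}: GRPO +{gains['GRPO']:.1f}, RULES +{gains['RULES']:.1f}")
```

For every model in the table, the RULES gain exceeds the GRPO gain. The gap is small for Llama-3.1 (31.6 vs. 31.2 points) and largest for DeepMath (24.2 vs. 16.0 points).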