| Dataset | SOTA Method | Metric | Score | Trend |
|---|---|---|---|---|
| DeepTheorem sampled perturbation (test) | Llama-3.1 (RULES) | Exact Match Accuracy | 90.9 | 12 |
| IMO-Lemma (IL) (test) | Llama-3.1 (RULES) | Exact Match Accuracy | 93.6 | 12 |
| NLPS (test) | Llama-3.1 (GRPO) | Exact Match Accuracy | 86.5 | 12 |
| NaturalProofs (test) | Llama-3.1 (RULES) | Exact Match Accuracy | 93.2 | 12 |