| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Automated Theorem Proving | MiniF2F (test) | Success Rate | 99.6 | 93 |
| Theorem Proving | MiniF2F (val) | Success Rate | 63.9 | 59 |
| Formal Theorem Proving | miniF2F Isabelle (val) | Success Rate | 57 | 41 |
| Formal Theorem Proving | miniF2F Isabelle (test) | Success Rate | 51.2 | 39 |
| Formal Theorem Proving | miniF2F rw (test) | Pass@8 | 75 | 24 |
| Formal Theorem Proving | miniF2F rw (val) | Pass@8 | 81.1 | 24 |
| Theorem Proving | miniF2F Lean (test) | Pass@64 | 52 | 24 |
| Automated Theorem Proving | miniF2F | Accuracy | 31.97 | 18 |
| Autoformalization | miniF2F (test) | TC@1 | 96 | 16 |
| Formal Theorem Proving | miniF2F (val) | Pass@1 | 42.2 | 15 |
| Auto-formalization | MiniF2F (test) | Pass@8 | 100 | 13 |
| Formal Theorem Proving | miniF2F | Proof Success Rate | 66.31 | 12 |
| Formal Theorem Proving | miniF2F | Average Token Cost | 228.64 | 12 |
| Informal-to-formal proving | miniF2F (val) | Proven Theorems Rate | 25.8 | 11 |
| Theorem Proving | miniF2F Lean (val) | Cumulative Pass Rate | 60.2 | 10 |
| Lean theorem proving | MINIF2F 244 problems | Pass@8 | 84.02 | 9 |
| Informal-to-Formal Proving | miniF2F (test) | Accuracy | 24.6 | 6 |
| Automated Theorem Proving | miniF2F Easy Mode (test) | Solved Problems (Pass@32) | 215 | 5 |
| Theorem Proving | miniF2F Lean (curriculum) | Pass@64 | 32.1 | 3 |
| Automated Theorem Proving | miniF2F Hard Mode (test) | Total Solved (Pass@32) | 204 | 2 |
| Theorem Autoformalization | F2F mini | Objects Score | 3.14 | 1 |