| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| ProverQA hard (test) | | Error Rate | 0 | 12 | 2d ago |
| ProntoQA (test) | | Error Rate | 2.8 | 12 | 2d ago |
| ProofWriter (test) | SFT+ | ExcRate | 100 | 12 | 2d ago |
| FOLIO 203 (dev) | | Exclusion Rate | 6.4 | 12 | 2d ago |
| ProverQA OOD hard subset 500 records (test) | - | Error Rate | - | 0 | 4d ago |
| ProntoQA OOD 500 records (test) | - | ExcRate | - | 0 | 4d ago |
| ProofWriter 600 records (test) | - | Exc. Rate | - | 0 | 4d ago |