| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| SAT | - | F1 Score | 79 | 16 | 1mo ago |
| K&K | Min-K% | F1 Score | 70 | 16 | 1mo ago |
| AIME 2025 | Self-Critique | F1 Score | 76 | 16 | 1mo ago |
| AIME 2024 | Self-Critique | F1 Score | 76 | 16 | 1mo ago |
| DETCON Logical Reasoning | CDD | Accuracy | 70.6 | 7 | 1mo ago |
| DETCON Code Generation | CDD | Accuracy | 71.5 | 7 | 1mo ago |
| Titanic | - | - | - | 0 | 19d ago |
| synthetic | - | - | - | 0 | 19d ago |
| mushroom | - | - | - | 0 | 19d ago |
| iris | - | - | - | 0 | 19d ago |
| gamma | - | - | - | 0 | 19d ago |