| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Abstract Visual Reasoning | ARC-AGI 2 | Accuracy (Pass@2) | 100 | 33 |
| Reasoning | ARC-AGI 2 (test) | Pass@2 Exact Match Accuracy | 24.9 | 28 |
| Abstract Visual Reasoning | ARC-AGI 1 | Accuracy (Pass@2) | 98 | 27 |
| Abstract reasoning | ARC-AGI 2 | Pass@2 | 29.4 | 16 |
| Abstraction and Reasoning | ARC-AGI 2 (public evaluation) | Pass@2 | 100 | 13 |
| Abstract Reasoning | ARC-AGI v1 (test) | Accuracy | 98 | 12 |
| Abstract Reasoning | ARC-AGI v2 (test) | Accuracy | 100 | 11 |
| Compositional Reasoning | ARC-AGI 2 | Accuracy | 33.6 | 11 |
| Abstraction and Reasoning | ARC-AGI Public Training Set (Easy) (60 tasks) | Total Cost | 0.41 | 10 |
| Reasoning | ARC-AGI 2 | Accuracy | 50 | 9 |
| Abstraction and Reasoning | ARC-AGI | ARC-1 Score | 58.2 | 9 |
| Fingerprint Matching | ARC-AGI 1 | FMR | 51 | 7 |
| Reasoning | ARC-AGI public evaluation set V2 | Accuracy | 97.9 | 6 |
| Symbolic Reasoning | ARC-AGI 1 (test) | Pass@2 | 47.5 | 6 |
| ARC-AGI | ARC-AGI (test) | Accuracy (ARC-AGI Test) | 67 | 5 |
| Abstract Reasoning | ARC-AGI-3 25 Public Games v2 | RHAE | 0 | 4 |
| Abstract Reasoning | ARC-AGI | Frugality Index (Fp) | 3.54 | 4 |
| Abstract and compositional reasoning | ARC-AGI 2 (test) | Accuracy (ARC-AGI 2 Test) | 51 | 4 |
| Puzzle Solving | ARC-AGI 3 | Levels Won | 2 | 3 |
| Symbolic Reasoning | ARC-AGI 2 (test) | Pass@1 | 9.9 | 3 |
| Exploration | ARC-AGI-3 | TU93 Level | 4 | 2 |
| Abstract Reasoning | ARC-AGI (concept evaluation) | Accuracy | 86.8 | 2 |