Abstraction and Reasoning on ARC-AGI Public Training Set (Easy) (60 tasks)
[Chart: Total Cost and Accuracy (%). Data point shown: Two-step agent, Total Cost 0.41, Dec 3, 2025]
Updated 4d ago
Evaluation Results
Method | Settings | Date | Total Cost | Accuracy (%)
Two-step agent | Base model=GPT-3.5 | 2025.12 | 0.41 | -
ADAS best agent (reproduced) | Base model=GPT-3.5 | 2025.12 | 2.11 | -
Two-step agent | Base model=GPT-4o | 2025.12 | 2.85 | -
ENCOMPASS (+ global best-of-N, N = 8) | Base model=GPT-3.5, N=... | 2025.12 | 3.29 | -
ENCOMPASS (+ global best-of-N, N = 36) | Base model=GPT-3.5, N=... | 2025.12 | 14.81 | -
ENCOMPASS (+ BFS) | Base model=GPT-3.5, br... | 2025.12 | 15.81 | -
ENCOMPASS (+ global best-of-N, N = 8) | Base model=GPT-4o, N=8... | 2025.12 | 22.76 | -
ADAS best agent (reproduced) | Base model=GPT-4o | 2025.12 | 27.85 | -
ENCOMPASS (+ BFS) | Base model=GPT-4o, bra... | 2025.12 | 88.69 | -
ENCOMPASS (+ global best-of-N, N = 36) | Base model=GPT-4o, N=3... | 2025.12 | 95.98 | -