Reasoning on ARC-AGI public evaluation set V2
[Chart: Accuracy over time. Top result: 97.9 (Confluence Lab, Apr 9, 2026). Selectable metrics: Accuracy, Cost ($) per Task, Savings Factor. Updated 9d ago.]
Evaluation Results
Method          Details                  Date     Accuracy  Cost ($) per Task  Savings Factor
Confluence Lab  Strategy=Code-executio...  2026.04  97.9      11.77              -
SQUEEZE EVOLVE  Strategy=Full pipeline...  2026.04  97.5      7.74               3.7
SQUEEZE EVOLVE  Strategy=Single recomb...  2026.04  97.5      5.93               4.9
Imbue           Strategy=Code-executio...  2026.04  95.1      8.71               -
SQUEEZE EVOLVE  Strategy=Single recomb...  2026.04  94.2      5.62               5.1
RSA             Strategy=Full pipeline...  2026.04  93.3      28.85              1
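The page does not define Savings Factor. The listed values are consistent with dividing the RSA entry's cost per task (the row with a savings factor of 1) by each method's cost per task, rounded to one decimal. That reading is an assumption, not something the page states, and the helper name `savings_factor` is hypothetical. A minimal sketch under that assumption:

```python
# Assumption: savings factor = baseline cost per task / method cost per task,
# with the RSA entry (savings factor 1) taken as the baseline.
BASELINE_COST = 28.85  # RSA cost ($) per task, as listed in the results


def savings_factor(cost_per_task: float, baseline: float = BASELINE_COST) -> float:
    """Return how many times cheaper a method is than the baseline (1 decimal)."""
    return round(baseline / cost_per_task, 1)


# Costs ($) per task for the three SQUEEZE EVOLVE entries, in listed order.
squeeze_evolve_costs = [7.74, 5.93, 5.62]
factors = [savings_factor(c) for c in squeeze_evolve_costs]
print(factors)
```

Under this assumption the computed values match the listed factors for the SQUEEZE EVOLVE entries. The Confluence Lab and Imbue entries show "-" in that column, so the ratio may not be reported for them.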