Abstract Reasoning on ARC 1
Top result: Bespoke (Grok-4), 79.6 Pass@2 (Nov 21, 2025)
Updated 1d ago
Evaluation Results

Method              Regime              Date     Pass@2
Bespoke (Grok-4)    Chain-of-though...  2025.11  79.6
Grok-4-thinking     Chain-of-though...  2025.11  66.7
DIS                 Small-sample tr...  2025.11  41.3
TRM                 Small-sample tr...  2025.11  40.4
DIS-medium          Small-sample tr...  2025.11  40
Gemini 2.5 Pro 32K  Chain-of-though...  2025.11  37
o3-mini-high        Chain-of-though...  2025.11  34.5
Claude 3.7 16K      Chain-of-though...  2025.11  28.6
TRM-medium          Small-sample tr...  2025.11  27.1
DIS-compact         Small-sample tr...  2025.11  24
Deepseek R1         Chain-of-though...  2025.11  15.8
TRM-compact         Small-sample tr...  2025.11  12