Compositional Reasoning on ARC-AGI 2
[Chart: Accuracy by date. Top result: RCE, 33.6 (Feb 17, 2026)]
Evaluation Results
Method   Setting                     Date      Accuracy
RCE      Model=Qwen-14B              2026.02   33.6
RCE      Model=Llama-3-8B            2026.02   29.8
RCE      Model=Mistral-7B            2026.02   28
DisCO    Model=Mistral-7B            2026.02   19.7
Base     Model=Qwen-14B              2026.02   19.3
GRPO     Model=Mistral-7B            2026.02   18.2
ToT      Model=Mistral-7B            2026.02   17.3
SC       Model=Mistral-7B, Samp...   2026.02   16.8
CoT      Model=Mistral-7B            2026.02   15.1
Base     Model=Llama-3-8B            2026.02   14.1
Base     Model=Mistral-7B            2026.02   12.4