Compositional Reasoning on HLE
Chart: Accuracy over time (top entry: RCE, 23.1, Feb 17, 2026)
Evaluation Results
Method  Setting                    Date     Accuracy
RCE     Model=Qwen-14B             2026.02  23.1
RCE     Model=Llama-3-8B           2026.02  20.2
RCE     Model=Mistral-7B           2026.02  18.7
Base    Model=Qwen-14B             2026.02  14.3
DisCO   Model=Mistral-7B           2026.02  13.8
GRPO    Model=Mistral-7B           2026.02  12.6
ToT     Model=Mistral-7B           2026.02  11.9
SC      Model=Mistral-7B, Samp...  2026.02  11.4
CoT     Model=Mistral-7B           2026.02  10.1
Base    Model=Llama-3-8B           2026.02  9.6
Base    Model=Mistral-7B           2026.02  8.2
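For the three models that have both a Base and an RCE entry in the table, the gain from RCE can be read off directly. The short Python sketch below computes it using only the accuracy values listed in the table; the names `results` and `gains_over_base` are hypothetical, and "relative" here simply means the absolute gain divided by the same model's Base accuracy.

```python
# Accuracy values copied from the Evaluation Results table (Base and RCE rows only).
results = {
    "Qwen-14B": {"Base": 14.3, "RCE": 23.1},
    "Llama-3-8B": {"Base": 9.6, "RCE": 20.2},
    "Mistral-7B": {"Base": 8.2, "RCE": 18.7},
}


def gains_over_base(table, method="RCE"):
    """Hypothetical helper: absolute and relative (%) gain of `method` over Base per model."""
    out = {}
    for model, scores in table.items():
        base = scores["Base"]
        delta = scores[method] - base
        # Round to one decimal place to match the precision of the table.
        out[model] = (round(delta, 1), round(delta / base * 100, 1))
    return out


for model, (abs_gain, rel_gain) in gains_over_base(results).items():
    print(f"{model}: +{abs_gain} accuracy points ({rel_gain}% relative to Base)")
```

On these numbers RCE adds 8.8 points for Qwen-14B, 10.6 for Llama-3-8B, and 10.5 for Mistral-7B, so the two smaller models gain slightly more in absolute terms and more than double their Base accuracy.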