Compositional Reasoning on GPQA
Best Accuracy: 48.9 (RCE)
[Chart: Accuracy by date (Feb 17, 2026)]
Updated 4d ago
Evaluation Results
Method  Setting                    Date     Accuracy
RCE     Model=Qwen-14B             2026.02  48.9
RCE     Model=Llama-3-8B           2026.02  43.1
RCE     Model=Mistral-7B           2026.02  41.4
Base    Model=Qwen-14B             2026.02  36.7
DisCO   Model=Mistral-7B           2026.02  34.2
GRPO    Model=Mistral-7B           2026.02  32.4
ToT     Model=Mistral-7B           2026.02  31
SC      Model=Mistral-7B, Samp...  2026.02  30.3
CoT     Model=Mistral-7B           2026.02  28.5
Base    Model=Llama-3-8B           2026.02  27.3
Base    Model=Mistral-7B           2026.02  24.1
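
The gap between each RCE row and the Base row for the same model can be read off the results above. The following is a minimal sketch of that arithmetic, assuming the Base rows are the reference point for RCE on the same model, which the page does not state. The dictionary layout and the function name `gain_over_base` are hypothetical, introduced only for illustration. The accuracy values are copied from the listed results.

```python
# Accuracy values copied from the leaderboard rows (method, model) -> accuracy.
# The data layout and helper name are illustrative assumptions, not part of the page.
accuracy = {
    ("RCE", "Qwen-14B"): 48.9,
    ("RCE", "Llama-3-8B"): 43.1,
    ("RCE", "Mistral-7B"): 41.4,
    ("Base", "Qwen-14B"): 36.7,
    ("Base", "Llama-3-8B"): 27.3,
    ("Base", "Mistral-7B"): 24.1,
}


def gain_over_base(scores, method="RCE"):
    """Return accuracy difference (method minus Base) for each model that has both rows."""
    gains = {}
    for (name, model), acc in scores.items():
        base_key = ("Base", model)
        if name == method and base_key in scores:
            # Round to one decimal to match the precision shown on the page.
            gains[model] = round(acc - scores[base_key], 1)
    return gains


gains = gain_over_base(accuracy)
for model, gain in gains.items():
    print(f"{model}: +{gain:.1f} accuracy points over Base")
```

Only methods that have a Base row for the same model are compared here; the remaining Mistral-7B rows could be added to the dictionary and compared by passing a different `method` argument.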