Multi-hop Commonsense Reasoning on StrategyQA (50 sampled questions, val)
[Figure: Accuracy over time; CoS leads at 90 (Feb 2, 2026)]
Evaluation Results
| Method | Settings | Date | Accuracy |
|---|---|---|---|
| CoS | mode=hybrid | 2026.02 | 90 |
| Direct | temperature=0 | 2026.02 | 87.5 |
| Chain-of-Thought | temperature=0 | 2026.02 | 87 |
| Structured CoT | temperature=0 | 2026.02 | 85 |
| Self-Consistency | temperature=0.7, k=5 | 2026.02 | 85 |
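The Self-Consistency row (temperature=0.7, k=5) refers to sampling several answers at a non-zero temperature and keeping the majority vote. The leaderboard does not publish its evaluation code, so the following is only a minimal sketch under stated assumptions: StrategyQA's yes/no answers are represented as booleans, `sample_answer` is a hypothetical stand-in for one sampled model call, and accuracy is reported as the percentage of questions answered correctly.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical signature: (question, temperature) -> sampled yes/no answer.
Sampler = Callable[[str, float], bool]


def self_consistency_answer(
    sample_answer: Sampler, question: str, k: int = 5, temperature: float = 0.7
) -> bool:
    """Draw k sampled answers and return the most frequent one."""
    votes = [sample_answer(question, temperature) for _ in range(k)]
    # With boolean answers and an odd k (e.g. k=5), a tie cannot occur.
    return Counter(votes).most_common(1)[0][0]


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Percentage of predictions that match the gold yes/no labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == g for p, g in zip(predictions, labels))
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Stub sampler that replays fixed answers in place of a real model.
    canned = iter([True, False, True, True, False])
    print(self_consistency_answer(lambda q, t: next(canned), "example question"))  # True
    print(accuracy([True] * 45 + [False] * 5, [True] * 50))  # 90.0
```

On a 50-question sample, each question is worth 2 accuracy points, so 45 correct answers correspond to 90.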