StrategyQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | StrategyQA | Accuracy: 90.4 | 125 |
| Question Answering | StrategyQA | Accuracy: 94.4 | 114 |
| Commonsense Reasoning | StrategyQA (test) | Accuracy: 83.49 | 81 |
| Logical Reasoning | StrategyQA | Accuracy: 89 | 58 |
| Question Answering | StrategyQA | EM: 80.1 | 35 |
| Multi-hop Reasoning | StrategyQA | Accuracy: 95.6 | 32 |
| Reasoning | StrategyQA (test) | Factuality Acc: 100 | 28 |
| Question Answering | StrategyQA (test) | Task Accuracy: 83 | 28 |
| Multi-hop Question Answering | StrategyQA (test) | Accuracy: 77.12 | 26 |
| Calibration | StrategyQA | ECE: 0.285 | 24 |
| Question Answering | STRATEGYQA | Accuracy: 61.8 | 24 |
| Knowledge-intensive QA | StrategyQA | ACC: 66.7 | 24 |
| Question Answering | StrategyQA | EM: 89.34 | 21 |
| Multi-hop QA | StrategyQA (SQA) | Cover-EM: 76.95 | 20 |
| Strategy-based Question Answering | StrategyQA | Verifiability: 69.11 | 16 |
| Multiple Choice Classification | StrategyQA | Accuracy: 83.4 | 16 |
| Question Answering | StrategyQA | Accuracy: 84 | 14 |
| Strategic Question Answering | StrategyQA | Reusability Score: 51.57 | 12 |
| Question Answering | StrategyQA | Prefilling Speedup Ratio: 3.39 | 12 |
| Follow-up Questioning Consistency | StrategyQA (unseen) | Average Success Count (M.): 42.65 | 12 |
| Reasoning | StrategyQA | Accuracy: 83.5 | 10 |
| Retrieval-Augmented Generation Evaluation | StrategyQA 100-query benchmark | Mean Score: 69.8 | 10 |
| Question Answering | StrategyQA | Precision: 65.9 | 9 |
| Question Answering | StrategyQA | ECE: 0.217 | 8 |
| Question Answering | StrategyQA | Factuality: 59.6 | 8 |
Showing 25 of 36 rows