
StrategyQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | StrategyQA | Accuracy 95.7 | 174 |
| Question Answering | StrategyQA | Accuracy 94.4 | 114 |
| Commonsense Reasoning | StrategyQA (test) | Accuracy 83.49 | 81 |
| Logical Reasoning | StrategyQA | Accuracy 89 | 58 |
| Reasoning | StrategyQA | Accuracy 83.5 | 40 |
| Multi-hop Reasoning | StrategyQA | Accuracy 95.6 | 36 |
| Question Answering | StrategyQA | Exact Match (EM) 85.59 | 35 |
| Question Answering | StrategyQA | EM 80.1 | 35 |
| Reasoning | StrategyQA (test) | Factuality Acc 100 | 28 |
| Question Answering | StrategyQA (test) | Task Accuracy 83 | 28 |
| Question Answering | StrategyQA | Accuracy 90.4 | 26 |
| Multi-hop Question Answering | StrategyQA (test) | Accuracy 77.12 | 26 |
| Calibration | StrategyQA | ECE 0.285 | 24 |
| Question Answering | STRATEGYQA | Accuracy 61.8 | 24 |
| Knowledge-intensive QA | StrategyQA | ACC 66.7 | 24 |
| Question Answering | StrategyQA | EM 89.34 | 21 |
| Multi-hop QA | StrategyQA (SQA) | Cover-EM 76.95 | 20 |
| Question Answering | StrategyQA | Exact Match (EM) 82 | 16 |
| Strategy-based Question Answering | StrategyQA | Verifiability 69.11 | 16 |
| Multiple Choice Classification | StrategyQA | Accuracy 83.4 | 16 |
| Reasoning quality evaluation | STRATEGYQA | Somers' D 0.2735 | 15 |
| Reasoning Question Answering | StrategyQA | Accuracy 80 | 14 |
| Strategic Question Answering | StrategyQA | Reusability Score 51.57 | 12 |
| Question Answering | StrategyQA | Prefilling Speedup Ratio 3.39 | 12 |
| Follow-up Questioning Consistency | StrategyQA (unseen) | Average Success Count (M.) 42.65 | 12 |
Showing 25 of 46 rows