
StrategyQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Commonsense Reasoning | StrategyQA | Accuracy 95.7 | 208 |
| Performance Estimation | StrategyQA | MAE 0 | 197 |
| Question Answering | StrategyQA | Accuracy 94.4 | 123 |
| Commonsense Reasoning | StrategyQA (test) | Accuracy 83.49 | 119 |
| Question Answering | StrategyQA (test) | Task Accuracy 96.7 | 74 |
| Logical Reasoning | StrategyQA | Accuracy 89 | 58 |
| Reasoning | StrategyQA | Accuracy 83.5 | 52 |
| Multi-hop Reasoning | StrategyQA | Accuracy 95.6 | 50 |
| Question Answering | StrategyQA | Exact Match (EM) 85.59 | 35 |
| Question Answering | StrategyQA | EM 80.1 | 35 |
| Reasoning | StrategyQA (test) | Factuality Acc 100 | 28 |
| Reasoning Question Answering | StrategyQA | Accuracy 80 | 26 |
| Question Answering | StrategyQA | Accuracy 90.4 | 26 |
| Multi-hop Question Answering | StrategyQA (test) | Accuracy 77.12 | 26 |
| Commonsense Reasoning | StrategyQA | Accuracy (%) 76.13 | 24 |
| Calibration | StrategyQA | ECE 0.285 | 24 |
| Question Answering | STRATEGYQA | Accuracy 61.8 | 24 |
| Knowledge-intensive QA | StrategyQA | ACC 66.7 | 24 |
| Question Answering | StrategyQA | EM 89.34 | 21 |
| Commonsense Reasoning | StrategyQA | Accuracy 76.84 | 20 |
| Multi-hop QA | StrategyQA (SQA) | Cover-EM 76.95 | 20 |
| Question Answering | StrategyQA | Exact Match (EM) 82 | 16 |
| Strategy-based Question Answering | StrategyQA | Verifiability 69.11 | 16 |
| Multiple Choice Classification | StrategyQA | Accuracy 83.4 | 16 |
| Reasoning quality evaluation | STRATEGYQA | Somers' D 0.2735 | 15 |
Showing 25 of 62 rows