Commonsense Multi-hop QA on StrategyQA
Chart: Full Length Coverage over time. Top result: Minimal-core extraction, 8.1 (May 14, 2026). Selectable metrics: Full Length Coverage, Core Length Coverage, CR, RM, Top-3 Mass, Retention.
Updated 19d ago
Evaluation Results
| Method | Model | Date | Full Length Coverage | Core Length Coverage | CR | RM | Top-3 Mass | Retention |
|---|---|---|---|---|---|---|---|---|
| Minimal-core extraction | GPT-5 | 2026.05 | 8.1 | 3.2 | 40 | 60 | 80 | 95 |
| Minimal-core extraction | DeepSeek-R1-Dist... | 2026.05 | 7.8 | 3.4 | 44 | 56 | 76 | 93 |
| Minimal-core extraction | Qwen3-32B | 2026.05 | 7.5 | 3.5 | 47 | 53 | 73 | 91 |
| Minimal-core extraction | DeepSeek-R1-Dist... | 2026.05 | 7.4 | 3.7 | 50 | 50 | 69 | 88 |