Question Answering on HotpotQA (Exact Match and Token Count)
[Chart: Exact Match Accuracy. Top entry: Clean Baseline, 89 (Dec 16, 2025). Available metrics: Exact Match Accuracy, Token Count.]
Evaluation Results
| Method | Configuration | Date | Exact Match Accuracy | Token Count |
| --- | --- | --- | --- | --- |
| Clean Baseline | Backbone=gpt-4.1, Cond... | 2025.12 | 89 | 15,200 |
| Meta-Paralysis | Backbone=gpt-4.1, Cond... | 2025.12 | 89 | 28,500 |
| GSI-Paralysis | Backbone=gpt-4.1, Cond... | 2025.12 | 88 | 32,400 |
| Clean Baseline | Backbone=gpt-4.1, Cond... | 2025.12 | 87 | 4,500 |
| Meta-Paralysis | Backbone=gpt-4.1, Cond... | 2025.12 | 86 | 8,200 |
| GSI-Paralysis | Backbone=gpt-4.1, Cond... | 2025.12 | 85 | 13,100 |
| Clean Baseline | Backbone=gpt-4.1, Cond... | 2025.12 | 84 | 2,100 |
| Meta-Paralysis | Backbone=gpt-4.1, Cond... | 2025.12 | 84 | 3,900 |
| GSI-Paralysis | Backbone=gpt-4.1, Cond... | 2025.12 | 83 | 4,300 |
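
The rows above pair each accuracy figure with a token count, so one quick way to read them is as accuracy change and token overhead relative to a baseline. The Python sketch below does that arithmetic under one stated assumption: each "Clean Baseline" row opens a group, and the rows listed after it (up to the next "Clean Baseline") share its condition. The listing truncates the configuration text ("Cond..."), so this grouping is inferred from row order and is not confirmed by the page. The function name `overhead_vs_baseline` and the `ROWS` list are illustrative only.

```python
# Rows transcribed from the results listing, in the order they appear:
# (method, exact_match_accuracy, token_count)
ROWS = [
    ("Clean Baseline", 89, 15200),
    ("Meta-Paralysis", 89, 28500),
    ("GSI-Paralysis", 88, 32400),
    ("Clean Baseline", 87, 4500),
    ("Meta-Paralysis", 86, 8200),
    ("GSI-Paralysis", 85, 13100),
    ("Clean Baseline", 84, 2100),
    ("Meta-Paralysis", 84, 3900),
    ("GSI-Paralysis", 83, 4300),
]


def overhead_vs_baseline(rows):
    """Compare each non-baseline row with the nearest "Clean Baseline" row above it.

    ASSUMPTION: a "Clean Baseline" row starts a new group and the rows that
    follow it belong to the same condition. The source truncates the
    condition label, so this pairing is a guess based on listing order.
    """
    results = []
    baseline = None
    for method, em, tokens in rows:
        if method == "Clean Baseline":
            baseline = (em, tokens)
            continue
        if baseline is None:
            raise ValueError(f"no baseline row precedes {method!r}")
        base_em, base_tokens = baseline
        results.append({
            "method": method,
            "em_delta": em - base_em,               # accuracy points vs. baseline
            "token_ratio": tokens / base_tokens,    # token count as a multiple of baseline
        })
    return results


if __name__ == "__main__":
    for r in overhead_vs_baseline(ROWS):
        print(f"{r['method']:<15} EM {r['em_delta']:+d}  tokens x{r['token_ratio']:.2f}")
```

If the grouping assumption does not hold for the full, untruncated configurations, the pairing logic would need to key on the actual condition label instead of row order.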