Question Answering on HotpotQA (Exact Match and Token Count)
Figure: Exact Match Accuracy over time (plotted point: Clean Baseline, 89, Dec 16, 2025).
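HotpotQA answers are commonly scored with SQuAD-style exact match. The prediction and the gold answer are both normalized, then compared as strings. The sketch below assumes that convention and reports accuracy as a percentage. The page does not say how its scores were computed, and the function names here are illustrative, not taken from the leaderboard.

```python
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation, drop articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """True when prediction and gold are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(gold)


def exact_match_accuracy(predictions: list[str], golds: list[str]) -> float:
    """Percentage of predictions that exactly match their paired gold answer."""
    if len(predictions) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    hits = sum(exact_match(p, g) for p, g in zip(predictions, golds))
    return 100.0 * hits / len(golds)


print(exact_match_accuracy(["The Eiffel Tower!", "Paris"], ["eiffel tower", "London"]))  # 50.0
```

Normalization makes the metric ignore casing, punctuation and leading articles, so "The Eiffel Tower!" counts as a match for "eiffel tower". Any other wording difference still counts as a miss.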
Evaluation Results

Method          Configuration              Date     Exact Match Accuracy  Token Count
Clean Baseline  Backbone=gpt-4.1, Cond...  2025.12  89                    15,200
Meta-Paralysis  Backbone=gpt-4.1, Cond...  2025.12  89                    28,500
GSI-Paralysis   Backbone=gpt-4.1, Cond...  2025.12  88                    32,400
Clean Baseline  Backbone=gpt-4.1, Cond...  2025.12  87                    4,500
Meta-Paralysis  Backbone=gpt-4.1, Cond...  2025.12  86                    8,200
GSI-Paralysis   Backbone=gpt-4.1, Cond...  2025.12  85                    13,100
Clean Baseline  Backbone=gpt-4.1, Cond...  2025.12  84                    2,100
Meta-Paralysis  Backbone=gpt-4.1, Cond...  2025.12  84                    3,900
GSI-Paralysis   Backbone=gpt-4.1, Cond...  2025.12  83                    4,300
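The token cost of each method relative to the clean baseline can be read directly from the results table. The configuration column is truncated on the page, so the sketch below makes an assumption about grouping. It treats each consecutive block of three rows, starting with Clean Baseline, as one shared configuration. The `Row` type and the `overhead_vs_baseline` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    method: str
    exact_match: int
    tokens: int


# Rows copied from the results table, in page order.
ROWS = [
    Row("Clean Baseline", 89, 15_200),
    Row("Meta-Paralysis", 89, 28_500),
    Row("GSI-Paralysis", 88, 32_400),
    Row("Clean Baseline", 87, 4_500),
    Row("Meta-Paralysis", 86, 8_200),
    Row("GSI-Paralysis", 85, 13_100),
    Row("Clean Baseline", 84, 2_100),
    Row("Meta-Paralysis", 84, 3_900),
    Row("GSI-Paralysis", 83, 4_300),
]


def overhead_vs_baseline(rows: list[Row], group_size: int = 3) -> list[tuple[int, str, float, int]]:
    """ASSUMPTION: every block of `group_size` rows starts with its own Clean Baseline.

    Returns (group number, method, token ratio vs. baseline, EM change vs. baseline).
    """
    results = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        base = group[0]
        if base.method != "Clean Baseline":
            raise ValueError(f"group starting at row {start} has no leading baseline")
        for row in group[1:]:
            ratio = row.tokens / base.tokens
            results.append((start // group_size + 1, row.method, ratio, row.exact_match - base.exact_match))
    return results


for group, method, ratio, delta in overhead_vs_baseline(ROWS):
    print(f"group {group}: {method}: {ratio:.2f}x tokens, EM {delta:+d}")
```

Under that grouping, Meta-Paralysis uses roughly 1.8 to 1.9 times the baseline's tokens. GSI-Paralysis uses roughly 2.0 to 2.9 times. In every group, exact match stays within 2 points of the baseline.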