Question Answering on HotpotQA n=500 (downstream)
[Chart: EM by date. AAR: 23 EM, Feb 13, 2026. Available metrics: EM, F1.]
Updated 1mo ago
Evaluation Results
Method          Settings                   Date     EM    F1
AAR             top-k=5, Reader=Claude...  2026.02  23    39.1
Dense baseline  top-k=5, Reader=Claude...  2026.02  16.6  31.1