Multi-hop Question Answering on HotpotQA fullwiki (val)
Best Exact Match (EM): 58.3 (SDP)
[Chart: Exact Match (EM) and F1 Score by date; date axis: May 12, 2026]
Updated 20d ago
Evaluation Results
| Method | LLM | Date | Exact Match (EM) | F1 Score |
|---|---|---|---|---|
| SDP | GPT-4o | 2026.05 | 58.3 | 67.2 |
| PRISM | GPT-4o | 2026.05 | 54.2 | 67 |
| IRCoT | GPT-3 | 2026.05 | 49.3 | 60.7 |
| SETR-CoT & IRI | GPT-4o | 2026.05 | 39.2 | 40.5 |
| RankZephyr | GPT-4o | 2026.05 | 34.7 | 35 |
| RankGPT | GPT-4o | 2026.05 | 34.6 | 35.3 |
| RankZephyr + CoT | GPT-4o | 2026.05 | 34 | 34.4 |