Multi-hop Question Answering on MuSiQue fullwiki (val)
[Chart: Exact Match (EM); top entry SDP at 41.4, May 12, 2026. Metrics available: Exact Match (EM), F1 Score.]
Updated 20d ago
Evaluation Results

Method             LLM      Date      Exact Match (EM)   F1 Score
SDP                GPT-4o   2026.05   41.4               51.9
IRCoT              GPT-3    2026.05   34.2               43.8
PRISM              GPT-4o   2026.05   31.2               41.8
SETR-CoT & IRI     GPT-4o   2026.05   12.3               16.9
RankGPT            GPT-4o   2026.05    9.5               13.5
RankZephyr + CoT   GPT-4o   2026.05    9.4               13.3
RankZephyr         GPT-4o   2026.05    8.6               12.8
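
The results above report Exact Match (EM) and F1 Score without defining them. The sketch below shows one common way these two metrics are computed for extractive question answering, assuming SQuAD-style answer normalization (lowercasing, removing punctuation and English articles). The function names normalize_answer, exact_match, and f1_score are illustrative, and the official MuSiQue scorer may normalize or aggregate differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization, not confirmed by the leaderboard.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under the usual convention, a leaderboard figure would be the mean of these per-question scores expressed as a percentage; whether this page follows that convention is an assumption. EM gives no partial credit, which is consistent with every method above scoring lower on EM than on F1.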