Multi-Hop QA on HotpotQA (test val)
Best F1 Score: 59.7
[Chart: F1 Score by date; plotted entry: EXTAGENTS, May 27, 2025]
Updated 1mo ago
Evaluation Results
Method       | Backbone          | Date    | F1 Score
EXTAGENTS    | gpt-4o-2024-0...  | 2025.05 | 59.7
EXTAGENTS    | gpt-4o-2024-0...  | 2025.05 | 55.3
EXTAGENTS    | gpt-4o-mini-2...  | 2025.05 | 53.4
DRAG         | gpt-4o-mini-2...  | 2025.05 | 48.2
IterDRAG     | gpt-4o-mini-2...  | 2025.05 | 41.3
EXTAGENTS    | Llama-3.1-8B-...  | 2025.05 | 41.2
IterDRAG     | Llama-3.1-8B-...  | 2025.05 | 36.8
DRAG         | Llama-3.1-8B-...  | 2025.05 | 34.9
Direct Input | Llama-3.1-8B-...  | 2025.05 | 25.4
Direct Input | gpt-4o-mini-2...  | 2025.05 | 20.4
Direct Input | DeepSeek-R1-D...  | 2025.05 | 15.9