Multi-Hop QA on En.QA
[Chart: F1 over time. Top result: EXTAGENTS, 38.2 F1, May 27, 2025.]
Evaluation Results
Method            Backbone             Date      F1
EXTAGENTS         gpt-4o-mini-2...     2025.05   38.2
LLM×MapReduce     gpt-4o-mini-2...     2025.05   37.4
EXTAGENTS         Llama-3.1-8B-...     2025.05   29.1
LLM×MapReduce     Llama-3.1-8B-...     2025.05   25.4
Direct Input      Llama-3.1-8B-...     2025.05   23.7
Direct Input      gpt-4o-mini-2...     2025.05   18.2
Chain of Agents   Llama-3.1-8B-...     2025.05   16.8
Direct Input      DeepSeek-R1-D...     2025.05   9.7
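
The page does not say how the F1 column is computed. As a minimal sketch, assuming it is the SQuAD-style token-overlap F1 commonly used for QA benchmarks and reported on a 0-100 scale, a per-example score could be computed as below. The function names `normalize` and `token_f1` are hypothetical and are not taken from the benchmark's own code.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer (0.0 to 1.0)."""
    pred = normalize(prediction)
    ref = normalize(reference)
    # If either side is empty, score 1.0 only when both are empty.
    if not pred or not ref:
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Assumed reporting convention: scale to 0-100, as in the table above.
    print(round(100 * token_f1("the cat sat", "cat sat down"), 1))
```

Under this assumption, a leaderboard figure would be the mean of such per-example scores over the test set, multiplied by 100.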