Multi-Hop QA on Zh.QA
Best result: EXTAGENTS, 48.2 F1 Score (May 27, 2025)
[Chart: F1 Score over time]
Evaluation Results

| Method          | Backbone         | Date    | F1 Score |
|-----------------|------------------|---------|----------|
| EXTAGENTS       | gpt-4o-mini-2... | 2025.05 | 48.2     |
| LLM×MapReduce   | gpt-4o-mini-2... | 2025.05 | 43.6     |
| EXTAGENTS       | Llama-3.1-8B-... | 2025.05 | 34.7     |
| LLM×MapReduce   | Llama-3.1-8B-... | 2025.05 | 34.5     |
| Direct Input    | Llama-3.1-8B-... | 2025.05 | 31.5     |
| Chain of Agents | Llama-3.1-8B-... | 2025.05 | 24.6     |
| Direct Input    | gpt-4o-mini-2... | 2025.05 | 20.4     |
| Direct Input    | DeepSeek-R1-D... | 2025.05 | 14.3     |