Multi-Hop Question Answering on En.QA
[Chart: Helmet Correctness Score over time. Highlighted point: Direct Input, 1.41, May 27, 2025.]
Evaluation Results
Method             Details                    Date     Helmet Correctness Score
Direct Input       Base Model=gpt-4o-mini...  2025.05  1.41
ExtAgents (Ours)   Base Model=gpt-4o-mini...  2025.05  1.2
LLM×MapReduce      Base Model=gpt-4o-mini...  2025.05  1.12
ExtAgents (Ours)   Base Model=Llama-3.1-8...  2025.05  1.09
Direct Input       Base Model=Llama-3.1-8...  2025.05  0.93
LLM×MapReduce      Base Model=Llama-3.1-8...  2025.05  0.78
Direct Input       Base Model=DeepSeek-R1...  2025.05  0.69
Chain of Agents    Base Model=Llama-3.1-8...  2025.05  0.51