Multi-Hop Question Answering on HotpotQA (Helmet correctness score)
[Chart: Helmet Score by date. Plotted point: EXTAGENTS, 1.86, May 27, 2025.]
Evaluation Results

Method              Base Model        Date      Helmet Score
EXTAGENTS           gpt-4o-2024...    2025.05   1.86
Direct Input        gpt-4o-mini...    2025.05   1.83
EXTAGENTS (N = 1)   gpt-4o-2024...    2025.05   1.73
ExtAgents (Ours)    gpt-4o-mini...    2025.05   1.71
IterDRAG            gpt-4o-mini...    2025.05   1.7
Direct Input        DeepSeek-R1...    2025.05   1.56
DRAG                gpt-4o-mini...    2025.05   1.53
ExtAgents (Ours)    Llama-3.1-8...    2025.05   1.38
DRAG                Llama-3.1-8...    2025.05   1.2
IterDRAG            Llama-3.1-8...    2025.05   1.14
Direct Input        Llama-3.1-8...    2025.05   0.96