Multi-Hop Question Answering on HotpotQA (Pass@1)
[Chart: Pass@1 by method over time; top score 60.8 (AFLOW), Apr 13, 2026]
Evaluation Results

Method              | Backbone   | Date    | Pass@1
AFLOW               | GPT-5-Chat | 2026.04 | 60.8
Mem2Evolve          | GPT-5-Chat | 2026.04 | 60.8
Alita               | GPT-5-Chat | 2026.04 | 58.8
SwarmAgentic        | GPT-5-Chat | 2026.04 | 56.0
DSPy                | GPT-5-Chat | 2026.04 | 55.6
EvoAgent            | GPT-5-Chat | 2026.04 | 54.4
AutoAgents          | GPT-5-Chat | 2026.04 | 54.2
DyLAN               | GPT-5-Chat | 2026.04 | 52.0
GPT-5-Chat (Direct) | GPT-5-Chat | 2026.04 | 50.4
GPT-5-Chat (CoT)    | GPT-5-Chat | 2026.04 | 47.4
GPT-5-Chat (ReAct)  | GPT-5-Chat | 2026.04 | 41.4
AgentVerse          | GPT-5-Chat | 2026.04 | 38.6