Multi-Hop Question Answering on 2WikiMultihopQA (Pass@1)
[Chart: Pass@1 over time. Top result: Mem2Evolve, 82 (Apr 13, 2026).]
Updated 5d ago
Evaluation Results
Method               Backbone    Date     Pass@1
Mem2Evolve           GPT-5-Chat  2026.04  82
GPT-5-Chat (Direct)  GPT-5-Chat  2026.04  81.8
SwarmAgentic         GPT-5-Chat  2026.04  80
Alita                GPT-5-Chat  2026.04  77.4
DSPy                 GPT-5-Chat  2026.04  76.4
EvoAgent             GPT-5-Chat  2026.04  75
AgentVerse           GPT-5-Chat  2026.04  74.6
GPT-5-Chat (CoT)     GPT-5-Chat  2026.04  74.4
AutoAgents           GPT-5-Chat  2026.04  73.8
AFLOW                GPT-5-Chat  2026.04  72.4
DyLAN                GPT-5-Chat  2026.04  65
GPT-5-Chat (ReAct)   GPT-5-Chat  2026.04  48.4