Multi-Hop Question Answering on HotpotQA (Score)
[Chart: HotpotQA Score over time; highest result π-Play, 38.9 (Apr 15, 2026)]
Evaluation Results
Method      Base Model        Date      HotpotQA Score
π-Play      Qwen3-8B          2026.04   38.9
π-Play      Qwen3-4B-In...    2026.04   38.5
Dr.Zero     Qwen3-4B-In...    2026.04   38.4
Dr.Zero     Qwen3-8B          2026.04   36.6
Search-R1   Qwen3-4B-In...    2026.04   34.9
SQLM*       Qwen3-4B-In...    2026.04   34.1
SQLM*       Qwen3-8B          2026.04   33.1
SQLM*       Qwen3-4B          2026.04   33.0
π-Play      Qwen3-4B          2026.04   32.3
Search-R1   Qwen3-8B          2026.04   32.0
Dr.Zero     Qwen3-4B          2026.04   31.0
Search-R1   Qwen3-4B          2026.04   30.7
ReAct       Qwen3-4B-In...    2026.04   22.8
ReAct       Qwen3-8B          2026.04   22.3
ReAct       Qwen3-4B          2026.04   11.2