Multi-Hop Question Answering on Bamboogle (Score)
Chart: Score over time (top entry: Search-R1, 50.0; Apr 15, 2026)
Evaluation Results
Method     Base model       Date     Score
---------  ---------------  -------  -----
Search-R1  Qwen3-4B-In...   2026.04  50.0
ToolForge  Qwen3-8B         2026.04  48.0
π-Play     Qwen3-4B-In...   2026.04  44.0
Dr.Zero    Qwen3-8B         2026.04  40.0
π-Play     Qwen3-8B         2026.04  40.0
Search-R1  Qwen3-4B         2026.04  37.6
Search-R1  Qwen3-8B         2026.04  37.6
Dr.Zero    Qwen3-4B-In...   2026.04  36.8
π-Play     Qwen3-4B         2026.04  35.2
SQLM*      Qwen3-4B-In...   2026.04  35.2
SQLM*      Qwen3-8B         2026.04  32.0
Dr.Zero    Qwen3-4B         2026.04  31.2
SQLM*      Qwen3-4B         2026.04  29.6
ReAct      Qwen3-8B         2026.04  28.8
ReAct      Qwen3-4B-In...   2026.04  27.2
ReAct      Qwen3-4B         2026.04  8.8