Multi-Hop Question Answering on MuSiQue (Score)
[Chart: Score (y-axis) by date (x-axis, Apr 15, 2026); highlighted point: ToolForge, 37.8]
Updated 2d ago
Evaluation Results
Method     Base Model      Date     Score
ToolForge  Qwen3-8B        2026.04  37.8
Dr.Zero    Qwen3-4B-In...  2026.04  14.4
π-Play     Qwen3-4B-In...  2026.04  13.4
Dr.Zero    Qwen3-8B        2026.04  13.1
Search-R1  Qwen3-4B-In...  2026.04  12.9
π-Play     Qwen3-8B        2026.04  12.4
π-Play     Qwen3-4B        2026.04  11.2
SQLM*      Qwen3-4B        2026.04  10.8
Search-R1  Qwen3-4B        2026.04  10.5
SQLM*      Qwen3-4B-In...  2026.04  10.2
SQLM*      Qwen3-8B        2026.04  9.9
Search-R1  Qwen3-8B        2026.04  9.8
Dr.Zero    Qwen3-4B        2026.04  7.8
ReAct      Qwen3-8B        2026.04  7.4
ReAct      Qwen3-4B-In...  2026.04  6.3
ReAct      Qwen3-4B        2026.04  1.6