Agentic Search on 2WikiMultiHopQA
Chart: String-F1 over time. Leading entry: Vanilla-Qwen3-32B, 69.9 String-F1 (Apr 6, 2026).
Updated 11d ago
Evaluation Results
| Method | Configuration | Date | String-F1 |
|---|---|---|---|
| Vanilla-Qwen3-32B | ASP=false | 2026.04 | 69.9 |
| OPD-Qwen3-1.7B | ASP=true | 2026.04 | 62.9 |
| SFT-Llama-3.2-3B | ASP=true | 2026.04 | 58.8 |
| SFT-Qwen3-1.7B | ASP=true | 2026.04 | 58.5 |
| Vanilla-Qwen3-8B | ASP=false | 2026.04 | 58.1 |
| Mixed-Qwen3-1.7B | ASP=true | 2026.04 | 57.8 |
| Vanilla-Qwen3-4B | ASP=false | 2026.04 | 54.7 |
| SFT-Llama-3.2-1B | ASP=true | 2026.04 | 48.2 |
| SFT-Qwen3-0.6B | ASP=true | 2026.04 | 47.4 |
| Distilled-Llama3.2-3B | ASP=false | 2026.04 | 46.3 |
| Distilled-Qwen3-1.7B | ASP=false | 2026.04 | 44.1 |
| Vanilla-Qwen3-1.7B | ASP=false | 2026.04 | 39.8 |
| Distilled-Llama3.2-1B | ASP=false | 2026.04 | 30.9 |
| Vanilla-Qwen3-0.6B | ASP=false | 2026.04 | 26 |