Agentic Search on HotpotQA
Best String-F1: 0.603 (Vanilla-Qwen3-32B)
[Chart: String-F1 over time; Apr 6, 2026]
Updated 11d ago
Evaluation Results
| Method | ASP | Date | String-F1 |
|---|---|---|---|
| Vanilla-Qwen3-32B | false | 2026.04 | 0.603 |
| Vanilla-Qwen3-8B | false | 2026.04 | 0.582 |
| Mixed-Qwen3-1.7B | true | 2026.04 | 0.582 |
| SFT-Qwen3-1.7B | true | 2026.04 | 0.576 |
| OPD-Qwen3-1.7B | true | 2026.04 | 0.562 |
| Vanilla-Qwen3-4B | false | 2026.04 | 0.534 |
| SFT-Llama-3.2-3B | true | 2026.04 | 0.53 |
| Distilled-Qwen3-1.7B | false | 2026.04 | 0.479 |
| Distilled-Llama3.2-3B | false | 2026.04 | 0.471 |
| SFT-Qwen3-0.6B | true | 2026.04 | 0.47 |
| SFT-Llama-3.2-1B | true | 2026.04 | 0.445 |
| Vanilla-Qwen3-1.7B | false | 2026.04 | 0.423 |
| Distilled-Llama3.2-1B | false | 2026.04 | 0.382 |
| Vanilla-Qwen3-0.6B | false | 2026.04 | 0.194 |