Agentic Search on Bamboogle (String-F1)
[Chart: String-F1 Score by date. Top result: Vanilla-Qwen3-32B, 73.1 (Apr 6, 2026).]
Evaluation Results
| Method | ASP | Date | String-F1 Score |
| --- | --- | --- | --- |
| Vanilla-Qwen3-32B | false | 2026.04 | 73.1 |
| Vanilla-Qwen3-8B | false | 2026.04 | 71.5 |
| SFT-Qwen3-1.7B | true | 2026.04 | 70.6 |
| Mixed-Qwen3-1.7B | true | 2026.04 | 69.4 |
| Vanilla-Qwen3-4B | false | 2026.04 | 69.1 |
| SFT-Llama-3.2-3B | true | 2026.04 | 68.4 |
| SFT-Llama-3.2-1B | true | 2026.04 | 64.6 |
| SFT-Qwen3-0.6B | true | 2026.04 | 62.9 |
| OPD-Qwen3-1.7B | true | 2026.04 | 61.4 |
| Distilled-Llama3.2-3B | false | 2026.04 | 53.6 |
| Distilled-Qwen3-1.7B | false | 2026.04 | 53.2 |
| Vanilla-Qwen3-1.7B | false | 2026.04 | 50.6 |
| Distilled-Llama3.2-1B | false | 2026.04 | 37.9 |
| Vanilla-Qwen3-0.6B | false | 2026.04 | 34.4 |
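The page does not define how String-F1 is computed. As a rough illustration only, the sketch below assumes it resembles the common SQuAD-style token-overlap F1 between a predicted answer and a reference answer. The function names `normalize` and `string_f1` and the normalization rules (lowercasing, stripping punctuation and English articles) are assumptions, not taken from the benchmark.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumption: SQuAD-style normalization; the benchmark may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def string_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 in [0, 1] between a prediction and one reference."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(string_f1("The Eiffel Tower", "eiffel tower"))
    print(string_f1("Paris, France", "Paris"))
    print(string_f1("London", "Paris"))
```

Under this assumption, a leaderboard value such as 73.1 would correspond to the mean per-question F1 multiplied by 100. That scaling is also an assumption.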