Web-Agent QA on SimpleQA
[Chart: F1 (Avg) over time. Top result: ReAct + Tree-GRPO, 67.8 F1 (Avg), Sep 25, 2025. Updated 1mo ago.]
Evaluation Results
Method                    Details                     Date      F1 (Avg)
ReAct + Tree-GRPO         Model Scale=14b, Model...   2025.09   67.8
ReAct + GRPO              Model Scale=14b, Model...   2025.09   65.4
ReAct + Tree-GRPO         Model Scale=7b, Model...    2025.09   62.4
ReAct + GRPO              Model Scale=7b, Model...    2025.09   61.5
ReAct                     Model Scale=14b, Model...   2025.09   43.3
ReAct                     Model Scale=7b, Model...    2025.09   25.1
DeepSeek-R1-Distill-32b   Model Scale=32b, Model...   2025.09   12.6
Qwen2.5-32b-Instruct      Model Scale=32b, Model...   2025.09   7.7