Web-agent QA on BrowseComp
[Chart: F1 (Avg); top entry ReAct + Tree-GRPO at 2.7, Sep 25, 2025]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | F1 (Avg) |
|---|---|---|---|
| ReAct + Tree-GRPO | Model Scale=7b, Model... | 2025.09 | 2.7 |
| ReAct + Tree-GRPO | Model Scale=14b, Model... | 2025.09 | 2.6 |
| DeepSeek-R1-Distill-32b | Model Scale=32b, Model... | 2025.09 | 2.4 |
| ReAct + GRPO | Model Scale=14b, Model... | 2025.09 | 2.4 |
| ReAct + GRPO | Model Scale=7b, Model... | 2025.09 | 2.3 |
| Qwen2.5-32b-Instruct | Model Scale=32b, Model... | 2025.09 | 2.2 |
| ReAct | Model Scale=7b, Model... | 2025.09 | 1.3 |
| ReAct | Model Scale=14b, Model... | 2025.09 | 1.2 |