Web Agent QA on WebWalkerQA
[Chart: F1 (Easy) by method. Highlighted point: ReAct + GRPO, 11.4 (Sep 25, 2025). Metrics shown: F1 (Easy), F1 (Medium), F1 (Hard), F1 (Average).]
Updated 1mo ago
Evaluation Results
Method                   Details                    Date     F1 (Easy)  F1 (Medium)  F1 (Hard)  F1 (Average)
ReAct + GRPO             Model Scale=14b, Model...  2025.09  11.4       14.8         10.3       12.4
ReAct + Tree-GRPO        Model Scale=14b, Model...  2025.09  11.1       15.5         10.8       12.8
ReAct                    Model Scale=14b, Model...  2025.09  9.5        11.3         7.4        9.5
DeepSeek-R1-Distill-32b  Model Scale=32b, Model...  2025.09  9.4        13.3         9.4        11
ReAct + Tree-GRPO        Model Scale=7b, Model...   2025.09  9.3        11.8         11.9       11.2
ReAct + GRPO             Model Scale=7b, Model...   2025.09  8.9        11.4         11.6       10.9
ReAct                    Model Scale=7b, Model...   2025.09  8          9.2          5.6        7.6
Qwen2.5-32b-Instruct     Model Scale=32b, Model...  2025.09  6.2        9.4          5.8        7.4