Web Navigation on WebWalker 100 tasks (test)
[Chart: Success Rate (Easy) by date. Plotted point: Vanilla GRPO, 0.125, May 10, 2026. Selectable metrics: Success Rate (Easy), Success Rate (Medium), Success Rate (Hard), Success Rate (Overall).]
Updated 22d ago
Evaluation Results
| Method | Configuration | Date | Success Rate (Easy) | Success Rate (Medium) | Success Rate (Hard) | Success Rate (Overall) |
|---|---|---|---|---|---|---|
| Vanilla GRPO | Backbone=Qwen3-4b | 2026.05 | 0.125 | 0.2206 | 0.25 | 22 |
| Skill-R1 | Mode=GRPO, Backbone=Qw... | 2026.05 | 0.125 | 0.2941 | 0.2083 | 26 |
| GPT-4o-mini | Skills=no skills | 2026.05 | 0 | 0.0147 | 0.0417 | 2 |
| Skill-R1 | Mode=Inference, Backbo... | 2026.05 | 0 | 0.2059 | 0.2083 | 19 |