Web Navigation on WebShop (test)
[Chart: Score and Succ. over time. Best entry: SELAUR, Score 0.8935, Feb 24, 2026.]
Evaluation Results
Method     Base Model     Date     Score   Succ.
SELAUR     Qwen2.5-7B     2026.02  0.8935  79.68
GiGPO      Qwen2.5-7B     2026.02  0.8867  79.29
SELAUR     Qwen2.5-1.5B   2026.02  0.8812  76.56
RLOO       Qwen2.5-7B     2026.02  0.8768  75.78
PPO        Qwen2.5-7B     2026.02  0.8689  71.87
GRPO       Qwen2.5-1.5B   2026.02  0.8359  67.18
GRPO       Qwen2.5-7B     2026.02  0.8184  75.78
RLOO       Qwen2.5-1.5B   2026.02  0.8022  63.94
GiGPO      Qwen2.5-1.5B   2026.02  0.8017  67.57
PPO        Qwen2.5-1.5B   2026.02  0.7126  49.21
prompting  Qwen2.5-1.5B   2026.02  0.218   4.29
prompting  Qwen2.5-7B     2026.02  0.0288  0.39