Visual Web Navigation on VWA-910
[Chart: Success Rate (%) by date; top method PANDO at 58.3 (May 24, 2026)]
Updated 8d ago
Evaluation Results
| Method | Date | Success Rate (%) | 95% CI (SR) | Paired Difference vs PANDO (pp) | P-Value |
|---|---|---|---|---|---|
| PANDO (Backbone=Claude Opus 4...) | 2026.05 | 58.3 | 56.7 | - | - |
| SGV (Backbone=Gemini-2.5-Fl...) | 2026.05 | 54 | 52.4 | 4.3 | 0.001 |
| WALT (Backbone=Claude-4-Sonn...) | 2026.05 | 45.2 | 43.6 | 13.1 | 10 |
| GPT-5.2 (M) + SoM (Qwen) (Bootstrap resamples=10...) | 2026.05 | 38.4 | 36.8 | 19.9 | 10 |
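The "Paired Difference vs PANDO (pp)" column appears to equal each method's gap in Success Rate (%) to PANDO, which is what the mean of per-task paired differences reduces to when both methods are scored on the same tasks. The following is a minimal sketch that checks this against the reported numbers. The dictionaries and the helper `gap_to_reference` are hypothetical names introduced for illustration and are not part of the benchmark's tooling.

```python
# Success rates (%) and paired differences (pp) copied from the results table.
success_rate = {
    "PANDO": 58.3,
    "SGV": 54.0,
    "WALT": 45.2,
    "GPT-5.2 (M) + SoM (Qwen)": 38.4,
}
reported_diff = {
    "SGV": 4.3,
    "WALT": 13.1,
    "GPT-5.2 (M) + SoM (Qwen)": 19.9,
}


def gap_to_reference(rates, reference="PANDO"):
    """Return each method's success-rate gap to the reference, in percentage points."""
    ref = rates[reference]
    # Round to one decimal place to match the precision shown in the table.
    return {name: round(ref - sr, 1) for name, sr in rates.items() if name != reference}


gaps = gap_to_reference(success_rate)
for name, gap in gaps.items():
    print(f"{name}: computed {gap} pp, reported {reported_diff[name]} pp")
```

This only confirms that the point estimates are mutually consistent. It does not reproduce the confidence intervals or p-values, which would require the per-task outcomes.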