Web Navigation Task Completion on Mind2Web Cross-task
[Chart: Success Rate by date. Top result: PolySkill (+Update), 64.6, Oct 17, 2025.]
Evaluation Results
| Method | Configuration | Date | Success Rate |
| --- | --- | --- | --- |
| PolySkill (+Update) | Backbone=Claude-3.7-So... | 2025.10 | 64.6 |
| PolySkill (+Update) | Backbone=GPT-4.1, Trai... | 2025.10 | 63.2 |
| PolySkill (Same sub-dom.) | Backbone=Claude-3.7-So... | 2025.10 | 62.3 |
| ASI (+Update) | Backbone=Claude-3.7-So... | 2025.10 | 62.1 |
| PolySkill (Same domain) | Backbone=Claude-3.7-So... | 2025.10 | 62 |
| PolySkill (All Skills) | Backbone=Claude-3.7-So... | 2025.10 | 61.3 |
| ASI (Same sub-domain) | Backbone=Claude-3.7-So... | 2025.10 | 61.2 |
| ASI (Same domain) | Backbone=Claude-3.7-So... | 2025.10 | 60.9 |
| ASI (All skills) | Backbone=Claude-3.7-So... | 2025.10 | 60.3 |
| ASI (+Update) | Backbone=GPT-4.1, Trai... | 2025.10 | 59.4 |
| Baseline | Backbone=Claude-3.7-So... | 2025.10 | 59.1 |
| PolySkill (Same sub-dom.) | Backbone=GPT-4.1, Trai... | 2025.10 | 58.6 |
| PolySkill (Same domain) | Backbone=GPT-4.1, Trai... | 2025.10 | 58.3 |
| ASI (Same sub-domain) | Backbone=GPT-4.1, Trai... | 2025.10 | 56.3 |
| PolySkill (All Skills) | Backbone=GPT-4.1, Trai... | 2025.10 | 55.4 |
| ASI (Same domain) | Backbone=GPT-4.1, Trai... | 2025.10 | 55.2 |
| Baseline | Backbone=GPT-4.1, Trai... | 2025.10 | 53.8 |
| ASI (All skills) | Backbone=GPT-4.1, Trai... | 2025.10 | 52.3 |