GUI Navigation on Mind2Web Online (Average)
[Chart: Success Rate and Improvement over SFT over time, Feb 14, 2026 to Apr 26, 2026. Leading entry: Ovis2.5S-GRPO, Success Rate 64.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Success Rate | Improvement over SFT |
|---|---|---|---|---|
| Ovis2.5S-GRPO | Training Method=GRPO (... | 2026.02 | 64 | 11.33 |
| Ovis2.5S-RF++ | Training Method=REINFO... | 2026.02 | 64 | 11.33 |
| Ovis2.5S-RLOO | Training Method=RLOO (... | 2026.02 | 62.67 | 10 |
| Claude 4 Sonnet CU | Model Type=Proprietary | 2026.02 | 62.33 | - |
| Claude 3.7 Sonnet CU | Model Type=Proprietary | 2026.02 | 61 | - |
| Ovis2.5SFT | Training Method=SFT | 2026.02 | 52.67 | - |
| GPT-4o | Model Type=Proprietary | 2026.02 | 37 | - |
| PageGuide | Backbone=google/gemini... | 2026.04 | 35.17 | - |
| UI-TARS | Model Scale=7B, Versio... | 2026.02 | 33.33 | - |
| PageGuide | Backbone=google/gemini... | 2026.04 | 30.42 | - |
| SeeAct | Backbone=GPT-4 | 2026.04 | 30 | - |
| Qwen3-VL | Model Scale=32B | 2026.02 | 27.67 | - |
| Qwen3-VL | Model Scale=8B | 2026.02 | 24.33 | - |
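The Improvement over SFT values listed for the three Ovis2.5S RL variants appear to equal each method's Success Rate minus the Ovis2.5SFT Success Rate (52.67). The page does not state this definition, so treat it as an assumption. The sketch below checks it using figures copied from the results; the names `SFT_BASELINE` and `improvement_over_sft` are illustrative and do not come from the source.

```python
# Success rates copied from the evaluation results.
# Assumption: "Improvement over SFT" = Success Rate - Ovis2.5SFT Success Rate.
SFT_BASELINE = 52.67  # Ovis2.5SFT

rl_variants = {
    "Ovis2.5S-GRPO": 64.00,
    "Ovis2.5S-RF++": 64.00,
    "Ovis2.5S-RLOO": 62.67,
}


def improvement_over_sft(success_rate, baseline=SFT_BASELINE):
    """Absolute gain over the SFT baseline, rounded to two decimals."""
    return round(success_rate - baseline, 2)


# Compute the gain for each RL-trained variant.
improvements = {
    name: improvement_over_sft(rate) for name, rate in rl_variants.items()
}

for name, gain in improvements.items():
    print(f"{name}: +{gain:.2f}")
```

Under this assumption the computed gains match the listed 11.33, 11.33, and 10, which suggests the column reports an absolute difference rather than a relative change.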