Step Execution on Multimodal-Mind2Web
[Chart: Step Success Rate over time. Highlighted point: UI-TARS-7B w/ GPT-4.1, Step Success Rate 66, Mar 17, 2026]
Updated 1mo ago
Evaluation Results
Method                      Params     Date     Step Success Rate
UI-TARS-7B w/ GPT-4.1       7B w/ -    2026.03  66
UI-TARS-7B w/ TraceR1       7B w/ 8B   2026.03  65.3
UI-TARS-32B                 32B        2026.03  64.7
UI-TARS-7B                  7B         2026.03  63.1
UI-TARS-2B                  2B         2026.03  53.1
Claude-computer-use         -          2026.03  52.5
OmniParser-v2.0 w/ GPT-4o   -          2026.03  41.3
GPT-4o                      -          2026.03  4.3