Single-step Action Prediction on WebLinx (test-iid)
[Chart: Cumulative Runtime, Average Runtime (s), and Coverage; date shown: May 28, 2026]
Updated 5d ago
Evaluation Results
Columns: Method | Links | Cumulative Runtime | Average Runtime (s) | Coverage

Evaluation approach=Proxy | 2026.05 | 28.5 | 0.6
Qwen3.5-122B | Evaluation approach=En... | 2026.05 | 117 | 42.6
MiniMax-M2.5 | Evaluation approach=En... | 2026.05 | 136.2 | 49.5