Web Agent Navigation on WebArena
Chart (Success Rate / Performance Delta over time): top result AdaRubric-DA, Success Rate 27.8 (Mar 22, 2026).
Evaluation Results

Method            Configuration              Date     Success Rate (%)  Performance Delta (points vs. Base)
AdaRubric-DA      Backbone=Qwen2.5-7B, F...  2026.03  27.8              15.5
AdaRubric-GM      Backbone=Qwen2.5-7B, F...  2026.03  25.1              12.8
AdaRubric-WM      Backbone=Qwen2.5-7B, F...  2026.03  24.3              12.0
Prometheus        Backbone=Qwen2.5-7B, F...  2026.03  21.0              8.7
G-Eval            Backbone=Qwen2.5-7B, F...  2026.03  20.1              7.8
Random pairs      Backbone=Qwen2.5-7B, F...  2026.03  17.4              5.1
SFT-Success only  Backbone=Qwen2.5-7B, F...  2026.03  16.7              4.4
Base              Backbone=Qwen2.5-7B, F...  2026.03  12.3              -
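The Performance Delta values in the Evaluation Results table match each method's Success Rate minus the Base score of 12.3 for every row. That relationship is inferred from the numbers, not a definition stated on this page. A minimal Python sketch that recomputes the column under that assumption (the names `SUCCESS_RATE`, `REPORTED_DELTA`, and `performance_delta` are illustrative, not part of any published tool):

```python
# Success rates (%) as listed in the Evaluation Results table.
SUCCESS_RATE = {
    "AdaRubric-DA": 27.8,
    "AdaRubric-GM": 25.1,
    "AdaRubric-WM": 24.3,
    "Prometheus": 21.0,
    "G-Eval": 20.1,
    "Random pairs": 17.4,
    "SFT-Success only": 16.7,
    "Base": 12.3,
}

# Performance Delta values as listed in the same table.
REPORTED_DELTA = {
    "AdaRubric-DA": 15.5,
    "AdaRubric-GM": 12.8,
    "AdaRubric-WM": 12.0,
    "Prometheus": 8.7,
    "G-Eval": 7.8,
    "Random pairs": 5.1,
    "SFT-Success only": 4.4,
}


def performance_delta(scores, baseline="Base"):
    """Assumed definition: gain in percentage points over the baseline row,
    rounded to one decimal place to absorb floating-point noise."""
    base = scores[baseline]
    return {m: round(s - base, 1) for m, s in scores.items() if m != baseline}


if __name__ == "__main__":
    deltas = performance_delta(SUCCESS_RATE)
    for method, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
        # Flag any row where the recomputed delta disagrees with the table.
        match = "ok" if abs(delta - REPORTED_DELTA[method]) < 1e-9 else "MISMATCH"
        print(f"{method:<18} {delta:+.1f}  {match}")
```

Rounding to one decimal is needed because subtractions such as 27.8 - 12.3 do not yield exactly 15.5 in binary floating point.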