Web-based tool-use on Mind2Web
[Chart: Task Performance, leading method LAR at 39.84 (May 18, 2026). Selectable metrics: Task Performance, Relative Change in Action Tokens.]
Updated 15d ago
Evaluation Results
Method       Backbone          Date     Task Performance  Relative Change in Action Tokens
LAR          Qwen3-8B          2026.05  39.84             -2.9
Vanilla      Qwen3-8B          2026.05  36.73             -
ConciseHint  Qwen3-8B          2026.05  35.33             11.9
COT          Qwen3-8B          2026.05  34.15             -
TokenSkip    Qwen3-8B          2026.05  31.13             -16.6
ACON         Qwen3-8B          2026.05  30.77             -17.7
LAR          Llama-3.1 8B-...  2026.05  28.3              -20.8
Vanilla      Llama-3.1 8B-...  2026.05  24.4              -
ConciseHint  Llama-3.1 8B-...  2026.05  17.33             -13
ACON         Llama-3.1 8B-...  2026.05  15.63             -16.7
TokenSkip    Llama-3.1 8B-...  2026.05  14.27             -1.9
COT          Llama-3.1 8B-...  2026.05  13.85             -