Web Agent on WorkArena L1
[Chart: Success Rate over time. Top entry: WEASEL (10K steps), Success Rate 38.8, May 19, 2026.]
Evaluation Results

Method | Configuration | Date | Success Rate
WEASEL (10K steps) | Model=Qwen3-8B, # Data... | 2026.05 | 38.8
Qwen3-8B | Model=Qwen3-8B | 2026.05 | 35.2
Pruning + LLM-Judge (10K steps) | Model=Qwen3-8B, # Data... | 2026.05 | 35.2
Pruning + Sampling (10K steps) | Model=Qwen3-8B, # Data... | 2026.05 | 33.9
Full (52K steps) | Model=Qwen3-8B, # Data... | 2026.05 | 33.3
Pruning (52K steps) | Model=Qwen3-8B, # Data... | 2026.05 | 15.5
Pruning (52K steps) | Model=Qwen2.5-7B-Instr... | 2026.05 | 12.4
WEASEL (10K steps) | Model=Qwen2.5-7B-Instr... | 2026.05 | 12.4
Full (52K steps) | Model=Qwen2.5-7B-Instr... | 2026.05 | 12.1
Pruning + Sampling (10K steps) | Model=Qwen2.5-7B-Instr... | 2026.05 | 9.8
Pruning + LLM-Judge (10K steps) | Model=Qwen2.5-7B-Instr... | 2026.05 | 8.5
Qwen2.5-7B-Instruct | Model=Qwen2.5-7B-Instruct | 2026.05 | 4.8
Pruning + LLM-Judge (10K steps) | Model=Gemma3-4B-IT, #... | 2026.05 | 4.5
WEASEL (10K steps) | Model=Gemma3-4B-IT, #... | 2026.05 | 4.5
Gemma3-4B-IT | Model=Gemma3-4B-IT | 2026.05 | 3.6
Full (52K steps) | Model=Gemma3-4B-IT, #... | 2026.05 | 3.3
Pruning (52K steps) | Model=Gemma3-4B-IT, #... | 2026.05 | 2.7
Pruning + Sampling (10K steps) | Model=Gemma3-4B-IT, #... | 2026.05 | 2.4