Web Agent on WorkArena L2
Figure: Success Rate over time (plotted May 19, 2026); top entry WEASEL (10K steps) at 4.7.
Evaluation Results
Method                          | Configuration             | Date    | Success Rate
--------------------------------|---------------------------|---------|-------------
WEASEL (10K steps)              | Model=Qwen2.5-7B-Instr... | 2026.05 | 4.7
WEASEL (10K steps)              | Model=Qwen3-8B, # Data... | 2026.05 | 4.3
Pruning + LLM-Judge (10K steps) | Model=Qwen3-8B, # Data... | 2026.05 | 3.8
Pruning + Sampling (10K steps)  | Model=Qwen3-8B, # Data... | 2026.05 | 3.4
Pruning + Sampling (10K steps)  | Model=Qwen2.5-7B-Instr... | 2026.05 | 3.0
Pruning + LLM-Judge (10K steps) | Model=Qwen2.5-7B-Instr... | 2026.05 | 3.0
WEASEL (10K steps)              | Model=Gemma3-4B-IT, #...  | 2026.05 | 3.0
Pruning (52K steps)             | Model=Qwen3-8B, # Data... | 2026.05 | 2.6
Pruning + Sampling (10K steps)  | Model=Gemma3-4B-IT, #...  | 2026.05 | 2.1
Pruning + LLM-Judge (10K steps) | Model=Gemma3-4B-IT, #...  | 2026.05 | 2.1
Full (52K steps)                | Model=Qwen3-8B, # Data... | 2026.05 | 2.1
Qwen3-8B                        | Model=Qwen3-8B            | 2026.05 | 1.7
Full (52K steps)                | Model=Qwen2.5-7B-Instr... | 2026.05 | 0.4
Pruning (52K steps)             | Model=Qwen2.5-7B-Instr... | 2026.05 | 0.4
Qwen2.5-7B-Instruct             | Model=Qwen2.5-7B-Instruct | 2026.05 | 0.0
Gemma3-4B-IT                    | Model=Gemma3-4B-IT        | 2026.05 | 0.0
Full (52K steps)                | Model=Gemma3-4B-IT, #...  | 2026.05 | 0.0
Pruning (52K steps)             | Model=Gemma3-4B-IT, #...  | 2026.05 | 0.0
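As a reading aid, the results above can be regrouped by backbone to compare each method against its untuned base model. This is a minimal sketch under stated assumptions: the scores are transcribed from the Evaluation Results table, rows truncated as "Model=Qwen2.5-7B-Instr..." are assumed to use the Qwen2.5-7B-Instruct backbone that appears in full in its baseline row, and the `RESULTS` dictionary and `summarize` helper are hypothetical names introduced only for illustration.

```python
# Success rates transcribed from the WorkArena L2 table, grouped by backbone.
# Assumption: "Model=Qwen2.5-7B-Instr..." rows belong to Qwen2.5-7B-Instruct.
# "Base model" is the untuned backbone row (e.g. "Qwen3-8B / Model=Qwen3-8B").
RESULTS = {
    "Qwen2.5-7B-Instruct": {
        "Base model": 0.0,
        "Full (52K steps)": 0.4,
        "Pruning (52K steps)": 0.4,
        "Pruning + Sampling (10K steps)": 3.0,
        "Pruning + LLM-Judge (10K steps)": 3.0,
        "WEASEL (10K steps)": 4.7,
    },
    "Qwen3-8B": {
        "Base model": 1.7,
        "Full (52K steps)": 2.1,
        "Pruning (52K steps)": 2.6,
        "Pruning + Sampling (10K steps)": 3.4,
        "Pruning + LLM-Judge (10K steps)": 3.8,
        "WEASEL (10K steps)": 4.3,
    },
    "Gemma3-4B-IT": {
        "Base model": 0.0,
        "Full (52K steps)": 0.0,
        "Pruning (52K steps)": 0.0,
        "Pruning + Sampling (10K steps)": 2.1,
        "Pruning + LLM-Judge (10K steps)": 2.1,
        "WEASEL (10K steps)": 3.0,
    },
}


def summarize(results):
    """Return (backbone, best_method, best_score, gain_over_base) per backbone."""
    rows = []
    for backbone, scores in results.items():
        base = scores["Base model"]
        # Highest-scoring entry for this backbone (ties resolve to the first seen).
        best_method = max(scores, key=scores.get)
        best = scores[best_method]
        # Round to one decimal to match the table's precision.
        rows.append((backbone, best_method, best, round(best - base, 1)))
    return rows


if __name__ == "__main__":
    for backbone, method, score, gain in summarize(RESULTS):
        print(f"{backbone}: {method} = {score} (+{gain} over base)")
```

Under these assumptions, WEASEL (10K steps) is the top entry for all three backbones, and the largest gain over the base model is on Qwen2.5-7B-Instruct (from 0.0 to 4.7).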