Web Agent on WebArena Lite
[Chart: SR. Top entry: WEASEL (10K steps), 21.2 SR; date shown: May 19, 2026]
Updated 13d ago
Evaluation Results
| Method | Details | Date | SR |
| --- | --- | --- | --- |
| WEASEL (10K steps) | Model=Qwen3-8B, # Data... | 2026.05 | 21.2 |
| Pruning + LLM-Judge (10K steps) | Model=Qwen3-8B, # Data... | 2026.05 | 19.4 |
| Full (52K steps) | Model=Qwen3-8B, # Data... | 2026.05 | 17.7 |
| Pruning (52K steps) | Model=Qwen3-8B, # Data... | 2026.05 | 17.6 |
| Pruning + Sampling (10K steps) | Model=Qwen3-8B, # Data... | 2026.05 | 16.5 |
| Qwen3-8B | Model=Qwen3-8B | 2026.05 | 16.4 |
| WEASEL (10K steps) | Model=Qwen2.5-7B-Instr... | 2026.05 | 14.5 |
| WEASEL (10K steps) | Model=Gemma3-4B-IT, #... | 2026.05 | 11.5 |
| Full (52K steps) | Model=Qwen2.5-7B-Instr... | 2026.05 | 10.9 |
| Pruning (52K steps) | Model=Qwen2.5-7B-Instr... | 2026.05 | 9.7 |
| Pruning + Sampling (10K steps) | Model=Qwen2.5-7B-Instr... | 2026.05 | 9.1 |
| Full (52K steps) | Model=Gemma3-4B-IT, #... | 2026.05 | 9.1 |
| Pruning (52K steps) | Model=Gemma3-4B-IT, #... | 2026.05 | 9.1 |
| Pruning + LLM-Judge (10K steps) | Model=Qwen2.5-7B-Instr... | 2026.05 | 8.5 |
| Pruning + Sampling (10K steps) | Model=Gemma3-4B-IT, #... | 2026.05 | 8.5 |
| Gemma3-4B-IT | Model=Gemma3-4B-IT | 2026.05 | 6.7 |
| Pruning + LLM-Judge (10K steps) | Model=Gemma3-4B-IT, #... | 2026.05 | 6.7 |
| Qwen2.5-7B-Instruct | Model=Qwen2.5-7B-Instruct | 2026.05 | 5.5 |