Agent Performance on VitaBench OTA
[Chart: Avg@4 and Pass@4 by method. Highlighted result: NoisyAgent, 9.75 Avg@4. Data point dated May 26, 2026.]
Evaluation Results
| Method | Backbone | Date | Avg@4 | Pass@4 |
|---|---|---|---|---|
| NoisyAgent | Qwen3-32B | 2026.05 | 9.75 | 15 |
| DAPO | Qwen3-32B | 2026.05 | 9.25 | 15 |
| GSPO | Qwen3-32B | 2026.05 | 9 | 14 |
| GRPO | Qwen3-32B | 2026.05 | 8.75 | 14 |
| Qwen3-32B | Qwen3-32B | 2026.05 | 7 | 12 |
| NoisyAgent | Qwen3-8B | 2026.05 | 5 | 9 |
| GSPO | Qwen3-8B | 2026.05 | 4.5 | 8 |
| GRPO | Qwen3-8B | 2026.05 | 4.25 | 7 |
| DAPO | Qwen3-8B | 2026.05 | 4 | 7 |
| Qwen3-8B | Qwen3-8B | 2026.05 | 1.75 | 4 |
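The page does not define the Avg@4 and Pass@4 columns. The sketch below assumes a common convention: each task is attempted k = 4 times, Avg@k is the mean success rate over all runs, and Pass@k is the share of tasks solved in at least one run, both in percent. The function names and the toy data are hypothetical and are not taken from the benchmark's own evaluation code.

```python
from typing import Sequence


def avg_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Mean per-task success rate over k runs, in percent (assumed definition)."""
    per_task = [sum(runs) / len(runs) for runs in results]
    return 100.0 * sum(per_task) / len(per_task)


def pass_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Share of tasks solved in at least one of k runs, in percent (assumed definition)."""
    solved = [any(runs) for runs in results]
    return 100.0 * sum(solved) / len(solved)


# Hypothetical toy data: 4 tasks, 4 runs each (True = run succeeded).
toy = [
    [True, True, False, False],
    [False, False, False, True],
    [False, False, False, False],
    [False, False, False, False],
]

print(avg_at_k(toy))   # 18.75
print(pass_at_k(toy))  # 50.0
```

Under this reading, Pass@4 can never be lower than Avg@4 for the same set of runs, which matches every row in the results above.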