Interactive Agent on SciWorld (Pass@1, Pass@128)
[Chart: Pass@1 and Pass@128 series. Highlighted point: ACE, Pass@1 = 29.45, Mar 17, 2026.]
Updated 1mo ago
Evaluation Results

Method         Backbone       Date      Pass@1   Pass@128
ACE            Qwen2.5-7B     2026.03   29.45    59.67
LEAFE          Qwen2.5-7B     2026.03   27.88    62
GRPO-RLVR      Qwen2.5-7B     2026.03   27.17    57.33
EARLY-EXP      Qwen2.5-7B     2026.03   26.17    54.67
ACE            Llama3.1-8B    2026.03   25.28    57.33
GRPO-RLVR      Llama3.1-8B    2026.03   24.25    56
EARLY-EXP      Llama3.1-8B    2026.03   24.04    56
LEAFE          Llama3.1-8B    2026.03   22.7     59.33
BASE (NO FT)   Llama3.1-8B    2026.03   7.17     48.67
BASE (NO FT)   Qwen2.5-7B     2026.03   7        47.33
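
The page does not say how the Pass@1 and Pass@128 figures are computed. As a hedged illustration only, the sketch below shows the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for a single task with n sampled attempts of which c succeed. Whether this leaderboard uses that estimator, and how per-task values are aggregated into the percentages above, are assumptions; the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task from n attempts with c successes.

    Assumption: this is the standard unbiased estimator,
    1 - C(n - c, k) / C(n, k); the leaderboard's own method is not stated.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed,
    # so any draw of k attempts must contain a success.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts for illustration, not taken from the table.
    print(pass_at_k(n=128, c=4, k=1))
    print(pass_at_k(n=128, c=4, k=128))
```

With k = 1 the estimator reduces to the plain success rate c / n, which is why Pass@1 is typically much lower than Pass@128 for the same method.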