Scientific Agent Task on ScienceAgentBench (test)
[Chart: Success Rate (SR) over time. Top result: SciNav, 18.6 (Mar 11, 2026). Other selectable metrics: Verification Rate (VER), Cost.]
Evaluation Results
Method            Base Model       Date      Success Rate (SR)   Verification Rate (VER)   Cost
SciNav            GPT-4o (202...   2026.03   18.6                69.9                      0.342
SciNav            GPT-4o (202...   2026.03   16.1                66                        0.512
Self-Debug        GPT-4o (202...   2026.03   15                  67                        0.03
Self-Debug        GPT-4o (202...   2026.03   14.7                71.2                      0.057
OpenHands         GPT-4o (202...   2026.03   13.1                62.8                      1.093
Direct Prompting  GPT-4o (202...   2026.03   7.5                 42.2                      0.011