Scientific Code Generation on ScienceAgentBench (test)
[Chart: SR over time. Highlighted result: OpenHands CodeAct, SR 27.5 (Mar 3, 2026). Metrics available: SR, CBS, VER.]
Evaluation Results

SR = success rate, CBS = CodeBERTScore, VER = valid execution rate.

Method             Details                    Date     SR    CBS   VER
OpenHands CodeAct  Base Model=GPT-4o, Kno...  2026.03  27.5  86.3  73.5
LCP                Base Model=GPT-4o, Kno...  2026.03  27.5  86.4  87.3
LCP                Base Model=GPT-4o, Kno...  2026.03  26.5  85.1  90.2
Self-Debug         Base Model=GPT-4o, Kno...  2026.03  23.5  85.6  71.6
Self-Debug         Base Model=GPT-4o, Kno...  2026.03  22.6  84.4  83.3
OpenHands CodeAct  Base Model=GPT-4o, Kno...  2026.03  19.6  83.1  78.4
Direct             Base Model=GPT-4o, Kno...  2026.03  11.8  82.6  52.9
Direct             Base Model=GPT-4o, Kno...  2026.03  10.8  83.8  41.2