Hardware Execution on Subset B
[Chart: Scode by date. Leading entry: BioProAgent, Scode 0.653 (Mar 1, 2026). Metrics available: Scode, Cp, Accparam.]
Updated 1mo ago
Evaluation Results
| Method | Backbone | Date | Scode | Cp | Accparam |
|---|---|---|---|---|---|
| BioProAgent | Gemini-3-Flash | 2026.03 | 0.653 | 0.956 | 0.61 |
| Direct | GPT-4o | 2026.03 | 0.59 | 0.995 | 0.295 |
| Direct | Gemini-3-Flash | 2026.03 | 0.576 | 0.996 | 0.287 |
| AutoGPT | Gemini-3-Flash | 2026.03 | 0.54 | 0.911 | 0.468 |
| Direct | DeepSeek-V3 | 2026.03 | 0.495 | 0.995 | 0.205 |
| Reflexion | Gemini-3-Flash | 2026.03 | 0.278 | 0.534 | 0.403 |
| ReAct | Gemini-3-Flash | 2026.03 | 0.038 | 0.21 | 0.103 |