Long-Horizon Stability on BioProBench Subset C (test)
[Chart: Success Rate over time. Top result: BioProAgent, 100 (Mar 1, 2026). Available metrics: Success Rate, Parameter Accuracy, Cp Score. Updated 1mo ago.]
Evaluation Results
| Method | Backbone | Date | Success Rate | Parameter Accuracy | Cp Score |
|---|---|---|---|---|---|
| BioProAgent | Gemini-3-Flash | 2026.03 | 100 | 71.8 | 95 |
| ReAct | Gemini-3-Flash | 2026.03 | 88.9 | 11.4 | 21.7 |
| AutoGPT | Gemini-3-Flash | 2026.03 | 66.7 | 40.9 | 64.4 |
| Reflexion | Gemini-3-Flash | 2026.03 | 33.3 | 0 | 0 |