Scientific Reasoning on Scientific Reasoning Subset A
[Chart: ROUGE-L by date. Plotted point: BioProAgent, 14.7, Mar 1, 2026. Selectable metrics: ROUGE-L, Ssem, Cs, Time (s).]
Updated 1mo ago
Evaluation Results
Method       Backbone         Date     ROUGE-L  Ssem  Cs    Time (s)
BioProAgent  Gemini-3-Flash   2026.03  14.7     34.4  59.1  71.8
Direct       Gemini-3-Flash   2026.03  13       24.7  32.2  12.1
Direct       DeepSeek-V3      2026.03  12.3     26    28.5  52.1
Reflexion    Gemini-3-Flash   2026.03  11.8     28.2  43.9  148.4
ReAct        Gemini-3-Flash   2026.03  11.6     26.8  45.5  44.5
AutoGPT      Gemini-3-Flash   2026.03  11.6     25.8  42.9  119.6
Direct       GPT-4o           2026.03  10.7     20.2  18.9  13.8
Biomni       (Specialized)    2026.03  8.1      25.2  34.2  87.1