Scientific Agent Task Completion on ScienceAgentBench
[Chart: Success Rate (SR). Top entry: Mimosa, 43.1. Date shown: Mar 30, 2026. Selectable metrics: Success Rate (SR), CodeBERT Score (CBS), Cost per Task ($).]
Updated 18d ago
Evaluation Results
| Method | Configuration | Date | Success Rate (SR) | CodeBERT Score (CBS) | Cost per Task ($) |
|---|---|---|---|---|---|
| Mimosa | Mode=Iterative Learnin... | 2026.03 | 43.1 | 92.1 | 1.7 |
| Mimosa | Mode=Single Agent (Smo... | 2026.03 | 38.2 | 89.8 | 0.05 |
| Mimosa | Mode=Multi-Agent One-s... | 2026.03 | 32.4 | 79.4 | 0.38 |
| Mimosa | Mode=Multi-Agent One-s... | 2026.03 | 31.3 | 77.3 | 2.2 |
| Mimosa | Mode=Iterative Learnin... | 2026.03 | 30.3 | 88.5 | 7.8 |
| Mimosa | Mode=Iterative Learnin... | 2026.03 | 21.6 | 72.1 | 3.5 |
| Mimosa | Mode=Multi-Agent One-s... | 2026.03 | 18.6 | 64 | 1 |
| Mimosa | Mode=Single Agent (Smo... | 2026.03 | 13.5 | 68.1 | 1.3 |
| Mimosa | Mode=Single Agent (Smo... | 2026.03 | 7.8 | 57 | 1.46 |
| Mimosa | Mode=Single Agent (Smo... | 2026.03 | 3.8 | 60.7 | 0.56 |