Research Supervision on Scientist-Bench
[Chart: Number of Stages by method (alternative metric: Cost per Run). Labeled point: MLR-Copilot, 2 stages. Date shown: Mar 25, 2026.]
Updated 2mo ago
Evaluation Results
| Method | Configuration | Date | Number of Stages | Cost per Run |
| --- | --- | --- | --- | --- |
| MLR-Copilot | GPU=Yes, LLM=GPT-4 | 2026.03 | 2 | - |
| AI Scientist v1 | GPU=Yes, LLM=Claude So... | 2026.03 | 3 | 15 |
| Agent Lab | LLM=GPT-4o, GPU=Config. | 2026.03 | 3 | 2.33 |
| Agent Lab | LLM=o1-preview, GPU=Co... | 2026.03 | 3 | 13.1 |
| AI-Researcher | GPU=Yes, LLM=Gemini 2.... | 2026.03 | 3 | - |
| AI-Supervisor (efficient) | GPU=No, LLM=Qwen-72B | 2026.03 | 5 | 8 |
| AI-Supervisor (frontier) | GPU=No, LLM=GPT-4o / C... | 2026.03 | 5 | 50 |
| AI-Supervisor (local) | GPU=Consumer, LLM=LLaM... | 2026.03 | 5 | 0 |