Scientific Discovery on Average MBO, NHO, SPO, TMC
[Chart: Avg APD and Avg AUOC by method, Feb 6, 2026]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Avg APD | Avg AUOC |
|---|---|---|---|---|
| ReAct | Backbone LLM=Qwen3-32B... | 2026.02 | 14.6 | 54.8 |
| Vanilla | Backbone LLM=Qwen3-32B... | 2026.02 | 14.8 | 50.5 |
| The AI Scientist v1 | Backbone LLM=Gemini-2.... | 2026.02 | 14.9 | 59.6 |
| ReAct | Backbone LLM=Gemini-2.... | 2026.02 | 16.6 | 58.4 |
| The AI Scientist v1 | Backbone LLM=Qwen3-32B... | 2026.02 | 20.5 | 56.2 |
| The AI Scientist v2 | Backbone LLM=Gemini-2.... | 2026.02 | 22.6 | 61.1 |
| Vanilla | Backbone LLM=Gemini-2.... | 2026.02 | 23.1 | 57.5 |
| The AI Scientist v2 | Backbone LLM=Qwen3-32B... | 2026.02 | 24.1 | 55.9 |
| AI Researcher | Backbone LLM=Qwen3-32B... | 2026.02 | 24.9 | 56.6 |
| AI Researcher | Backbone LLM=Gemini-2.... | 2026.02 | 26 | 49 |
| PiFlow | Backbone LLM=Gemini-2.... | 2026.02 | 32 | 58.3 |
| PiFlow | Backbone LLM=Qwen3-32B... | 2026.02 | 44.8 | 62.9 |
| PIEVO | Backbone LLM=Qwen3-32B... | 2026.02 | 49.7 | 79.2 |
| PIEVO | Backbone LLM=Gemini-2.... | 2026.02 | 54.7 | 84 |