Scientific Discovery on Average MBO, NHO, SPO, TMC
[Chart: Avg APD and Avg AUOC by method, Feb 6, 2026]
Evaluation Results
Method              | Configuration                | Links   | Avg APD | Avg AUOC
ReAct               | Backbone LLM=Qwen3-32B...    | 2026.02 | 14.6    | 54.8
Vanilla             | Backbone LLM=Qwen3-32B...    | 2026.02 | 14.8    | 50.5
The AI Scientist v1 | Backbone LLM=Gemini-2....    | 2026.02 | 14.9    | 59.6
ReAct               | Backbone LLM=Gemini-2....    | 2026.02 | 16.6    | 58.4
The AI Scientist v1 | Backbone LLM=Qwen3-32B...    | 2026.02 | 20.5    | 56.2
The AI Scientist v2 | Backbone LLM=Gemini-2....    | 2026.02 | 22.6    | 61.1
Vanilla             | Backbone LLM=Gemini-2....    | 2026.02 | 23.1    | 57.5
The AI Scientist v2 | Backbone LLM=Qwen3-32B...    | 2026.02 | 24.1    | 55.9
AI Researcher       | Backbone LLM=Qwen3-32B...    | 2026.02 | 24.9    | 56.6
AI Researcher       | Backbone LLM=Gemini-2....    | 2026.02 | 26.0    | 49.0
PiFlow              | Backbone LLM=Gemini-2....    | 2026.02 | 32.0    | 58.3
PiFlow              | Backbone LLM=Qwen3-32B...    | 2026.02 | 44.8    | 62.9
PIEVO               | Backbone LLM=Qwen3-32B...    | 2026.02 | 49.7    | 79.2
PIEVO               | Backbone LLM=Gemini-2....    | 2026.02 | 54.7    | 84.0
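The table lists each method once per backbone LLM. To compare methods independently of the backbone, one option is to average each method's two reported scores, as in the Python sketch below. This is an illustrative helper and not part of the benchmark. The rows are transcribed by hand from the table above, and backbone names keep the truncated form shown on the page. The function name `per_method_means` is hypothetical. The page does not define APD or AUOC or say which direction is better, so the sketch only averages the scores and does not rank the methods.

```python
from collections import defaultdict
from statistics import mean

# Rows transcribed by hand from the Evaluation Results table:
# (method, backbone, avg_apd, avg_auoc). Backbone names are truncated on the
# source page and are kept in that truncated form here.
ROWS = [
    ("ReAct", "Qwen3-32B...", 14.6, 54.8),
    ("Vanilla", "Qwen3-32B...", 14.8, 50.5),
    ("The AI Scientist v1", "Gemini-2....", 14.9, 59.6),
    ("ReAct", "Gemini-2....", 16.6, 58.4),
    ("The AI Scientist v1", "Qwen3-32B...", 20.5, 56.2),
    ("The AI Scientist v2", "Gemini-2....", 22.6, 61.1),
    ("Vanilla", "Gemini-2....", 23.1, 57.5),
    ("The AI Scientist v2", "Qwen3-32B...", 24.1, 55.9),
    ("AI Researcher", "Qwen3-32B...", 24.9, 56.6),
    ("AI Researcher", "Gemini-2....", 26.0, 49.0),
    ("PiFlow", "Gemini-2....", 32.0, 58.3),
    ("PiFlow", "Qwen3-32B...", 44.8, 62.9),
    ("PIEVO", "Qwen3-32B...", 49.7, 79.2),
    ("PIEVO", "Gemini-2....", 54.7, 84.0),
]


def per_method_means(rows):
    """Hypothetical helper: average each method's Avg APD and Avg AUOC
    over all backbones it was evaluated with."""
    grouped = defaultdict(list)
    for method, _backbone, apd, auoc in rows:
        grouped[method].append((apd, auoc))
    return {
        method: (mean(a for a, _ in scores), mean(u for _, u in scores))
        for method, scores in grouped.items()
    }


if __name__ == "__main__":
    # Print methods in ascending order of averaged Avg APD, matching the page's ordering convention.
    means = per_method_means(ROWS)
    for method, (apd, auoc) in sorted(means.items(), key=lambda kv: kv[1][0]):
        print(f"{method:<22} Avg APD {apd:6.2f}  Avg AUOC {auoc:6.2f}")
```

For example, PIEVO averages to (49.7 + 54.7) / 2 = 52.2 Avg APD and (79.2 + 84.0) / 2 = 81.6 Avg AUOC across its two backbones.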