Scientific Ideation on 30 open research questions from PhD candidates (test)
[Chart: Average Novelty. Top entry: Our Method, 3.43 (Apr 17, 2026). Selectable metrics: Average Novelty, Average Feasibility, Average Effectiveness.]
Evaluation Results
| Method | Details | Date | Average Novelty | Average Feasibility | Average Effectiveness |
| --- | --- | --- | --- | --- | --- |
| Our Method | Inference strategy=BoN... | 2026.04 | 3.43 | 3.13 | 3.38 |
| SFT | Training protocol=SFT | 2026.04 | 3.11 | 2.82 | 2.94 |
| Base Model | Model architecture=14B... | 2026.04 | 2.83 | 3.29 | 2.93 |
| LDC | Training protocol=LDC | 2026.04 | 2.62 | 2.86 | 2.67 |
| GPT Researcher | | 2026.04 | 2.47 | 3.17 | 2.69 |
| Research Agent | | 2026.04 | 2.46 | 2.55 | 2.41 |
| AI Scientist V2 | | 2026.04 | 2.12 | 3.02 | 2.7 |