Research Idea Generation on Ten benchmark topics (100 generated research ideas)
[Chart: Average Wins by method (alternate metric: Top 10 Count). Best result: EvoSci, 4.27 Average Wins, May 20, 2026.]
Evaluation Results
| Method | Agent Model | Date | Average Wins | Top 10 Count |
|---|---|---|---|---|
| EvoSci | GPT-4o | 2026.05 | 4.27 | 54 |
| EvoSci | Qwen3-max | 2026.05 | 4.25 | 50 |
| EvoSci | DeepSeek-v3 | 2026.05 | 4.19 | 47 |
| CoI-Agent | DeepSeek-v3 | 2026.05 | 4.08 | 37 |
| VirSci | GPT-4o | 2026.05 | 4.07 | 52 |
| CoI-Agent | Qwen3-max | 2026.05 | 4 | 39 |
| VirSci | Qwen3-max | 2026.05 | 3.94 | 34 |
| VirSci | DeepSeek-v3 | 2026.05 | 3.9 | 35 |
| AI Scientist | GPT-4o | 2026.05 | 3.88 | 13 |
| CoI-Agent | GPT-4o | 2026.05 | 3.58 | 36 |
| AI Scientist | Qwen3-max | 2026.05 | 2.92 | 7 |
| AI Scientist | DeepSeek-v3 | 2026.05 | 2.83 | 8 |
| SciPIP | GPT-4o | 2026.05 | 2.7 | 7 |
| SciPIP | DeepSeek-v3 | 2026.05 | 2.5 | 3 |
| SciPIP | Qwen3-max | 2026.05 | 2.39 | 1 |