Research Idea Evaluation on 5-seed blinded subset (test)
[Chart: Novelty by date. Si baseline: 6.24 on May 14, 2026.]
Updated 19d ago
Evaluation Results
| Method | Date | Novelty | Significance | Feasibility | Clarity | Effectiveness | Excellence | Soundness | Originality | Reproducibility | Overall Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Si baseline | 2026.05 | 6.24 | 7.04 | 6 | 7.08 | 6.72 | 6.68 | 6.24 | 6.24 | 5.96 | 6.48 |
| ResearchAgent | 2026.05 | 5.92 | 6.76 | 4.24 | 4.68 | 5.8 | 5.6 | 4.24 | 5.92 | 4.04 | 4.84 |
| GoR-SFT | 2026.05 | 5.88 | 6.72 | 6.96 | 7.48 | 6.2 | 6.24 | 7 | 5.92 | 6.92 | 6.56 |
| CoI-Agent | 2026.05 | 5.72 | 6.84 | 5.64 | 6.24 | 6.4 | 6 | 5.72 | 5.68 | 6 | 6 |