Research Idea Evaluation on ScholarIdeas (Evaluation)
[Chart: Reference Inv. over time. Highlighted result: GPT-5.1 Instant, 23.33, Oct 17, 2025]
Updated 3mo ago
Evaluation Results
| Method | Configuration | Date | Reference Inv. |
|---|---|---|---|
| GPT-5.1 Instant | | 2025.10 | 23.33 |
| Llama-3.3-70B | | 2025.10 | 19.07 |
| GPT-4.1 | | 2025.10 | 15.22 |
| Claude-4-Sonnet | | 2025.10 | 13.9 |
| GPT-4o-search-preview | search-enabled=true | 2025.10 | 1.66 |
| OpenAI Deep Research | | 2025.10 | 1.07 |
| DR Tulu | | 2025.10 | 0 |
| ScholarEval Llama | Backbone=Llama-3.3-70B | 2025.10 | 0 |
| ScholarEval GPT-4.1 | Backbone=GPT-4.1 | 2025.10 | 0 |
| ScholarEval GPT-5.1 | Backbone=GPT-5.1 Instant | 2025.10 | 0 |
| ScholarEval Claude | Backbone=Claude-4-Sonnet | 2025.10 | 0 |