Research Soundness Evaluation on 34 report pairs (sample)
[Chart: scores by metric (Evidence Support, Depth, Actionability). Highlighted point: ScholarEval, Evidence Support, 70.6, Oct 17, 2025.]
Updated 3mo ago
Evaluation Results
Method                Date     Evidence Support  Depth  Actionability
ScholarEval           2025.10  70.6              79.4   82.4
OpenAI Deep Research  2025.10  14.7              11.8   11.8
Tie                   2025.10  14.7              8.8    5.9
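The page does not say how the figures are computed. As a rough sanity check, the sketch below assumes each figure is the percentage of the 34 report pairs in which that method was preferred (or the pair was tied) on a given metric, and converts the percentages back to whole-pair counts. The interpretation and the names `PAIRS`, `results`, and `implied_counts` are assumptions for illustration, not something the page states.

```python
# Assumption: each figure is a percentage of the 34 report pairs.
PAIRS = 34

# Percentages copied from the results listing.
results = {
    "Evidence Support": {"ScholarEval": 70.6, "OpenAI Deep Research": 14.7, "Tie": 14.7},
    "Depth": {"ScholarEval": 79.4, "OpenAI Deep Research": 11.8, "Tie": 8.8},
    "Actionability": {"ScholarEval": 82.4, "OpenAI Deep Research": 11.8, "Tie": 5.9},
}


def implied_counts(shares, pairs=PAIRS):
    """Convert percentage shares to the nearest whole number of pairs."""
    return {name: round(pct * pairs / 100) for name, pct in shares.items()}


counts = {metric: implied_counts(shares) for metric, shares in results.items()}

for metric, per_method in counts.items():
    # Print the implied counts and their total for each metric.
    print(metric, per_method, "total:", sum(per_method.values()))
```

Under that assumption, the implied counts in each column add up to 34, which is consistent with the figures being shares of the sampled pairs.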