Long-form research on ResearchQA
Score chart: best score 79.2 (OpenAI Deep Research), May 11, 2026. Updated 22d ago.
Evaluation Results
| Method | Attributes | Date | Score |
|---|---|---|---|
| OpenAI Deep Research | Category=Closed Deep R... | 2026.05 | 79.2 |
| GPT-5 + Search | Category=Closed Deep R... | 2026.05 | 78.2 |
| Perplexity Deep Research | Category=Closed Deep R... | 2026.05 | 75.3 |
| Ai2 ScholarQA – Claude Sonnet | Category=Fixed Pipelin... | 2026.05 | 75 |
| Gemini 3.1 Pro + Search | Category=Closed Deep R... | 2026.05 | 74.5 |
| RubricEM-8B (RL, 1400 steps) | Backbone=8B, Training=... | 2026.05 | 74.5 |
| DR Tulu-8B (RL, 1900 steps) | Category=Open Deep Res... | 2026.05 | 74.3 |
| WebThinker-32B-DPO | Category=Fixed Pipelin... | 2026.05 | 74.2 |
| WebThinker QwQ-32B | Category=Fixed Pipelin... | 2026.05 | 72.8 |
| RubricEM-8B (SFT) | Backbone=8B, Training=SFT | 2026.05 | 71.8 |
| Perplexity-Sonar (High) | Category=Closed Deep R... | 2026.05 | 69.1 |
| Gemini Deep Research | Category=Closed Deep R... | 2026.05 | 68.5 |
| DR Tulu-8B (SFT) | Category=Open Deep Res... | 2026.05 | 68.5 |
| Tongyi DeepResearch-30B-A3B | Category=Open Deep Res... | 2026.05 | 66.7 |
| WebExplorer-8B | Category=Open Deep Res... | 2026.05 | 64.8 |
| Claude-Sonnet Search | Category=Closed Deep R... | 2026.05 | 64.3 |
| Qwen3-8B + Our Search | Backbone=Qwen3-8B, Sea... | 2026.05 | 58.4 |
| Search-R1-7B | Category=Open Deep Res... | 2026.05 | 27.9 |