Deep Research on SQA v2
[Chart: Score by date. Top result: DR Tulu-8B (RL), 88.3, Nov 24, 2025]
Updated 16d ago
Evaluation Results
Method | Model Category | Date | Score
DR Tulu-8B (RL) | Open De... | 2025.11 | 88.3
Ai2 ScholarQA - Claude Sonnet | Fixed P... | 2025.11 | 87.7
OpenAI Deep Research | Closed... | 2025.11 | 79.6
GPT-5 + Search | Closed... | 2025.11 | 74.8
DR Tulu-8B (SFT) | Open De... | 2025.11 | 72.3
Gemini 3 Pro + Search | Closed... | 2025.11 | 69.8
Perplexity Deep Research | Closed... | 2025.11 | 67.3
GPT-5 + Our Search | Closed... | 2025.11 | 61.1
Qwen3-8B + Our Search | Open De... | 2025.11 | 57.2
WebThinker-32B-DPO (report) | Fixed P... | 2025.11 | 46.7
Tongyi DeepResearch-30B-A3B | Open De... | 2025.11 | 46.5
WebThinker QwQ-32B (report) | Fixed P... | 2025.11 | 45.2
WebExplorer-8B | Open De... | 2025.11 | 42.5
QwQ-32B | Naive RAG | 2025.11 | 41.9
Qwen3-8B | Naive RAG | 2025.11 | 40.4
WebThinker-32B-DPO | Open De... | 2025.11 | 32.9
ASearcher-Web-7B | Open De... | 2025.11 | 26.9
Search-R1-7B | Open De... | 2025.11 | 22.2