Deep Research on HealthBench
[Chart: Score by date. Highlighted point: GPT-5 + Search, score 59.5, Nov 24, 2025. Updated 16d ago.]
Evaluation Results
| Method | Model Category | Date | Score |
|---|---|---|---|
| GPT-5 + Search | Closed... | 2025.11 | 59.5 |
| OpenAI Deep Research | Closed... | 2025.11 | 53.8 |
| DR Tulu-8B (RL) | Open De... | 2025.11 | 52.8 |
| Tongyi DeepResearch-30B-A3B | Open De... | 2025.11 | 46.2 |
| WebThinker-32B-DPO (report) | Fixed P... | 2025.11 | 39.4 |
| DR Tulu-8B (SFT) | Open De... | 2025.11 | 38.1 |
| Gemini 3 Pro + Search | Closed... | 2025.11 | 38 |
| WebThinker QwQ-32B (report) | Fixed P... | 2025.11 | 36.5 |
| WebExplorer-8B | Open De... | 2025.11 | 33.7 |
| Ai2 ScholarQA - Claude Sonnet | Fixed P... | 2025.11 | 32 |
| GPT-5 + Our Search | Closed... | 2025.11 | 31.1 |
| QwQ-32B | Naive RAG | 2025.11 | 24.5 |
| Qwen3-8B | Naive RAG | 2025.11 | 16.5 |
| WebThinker-32B-DPO | Open De... | 2025.11 | 11.1 |
| Qwen3-8B + Our Search | Open De... | 2025.11 | 5.9 |
| Search-R1-7B | Open De... | 2025.11 | -0.1 |
| ASearcher-Web-7B | Open De... | 2025.11 | -13 |