Web Knowledge Retrieval on HotpotQA
Best reported result: 89.2 accuracy (GPT-4o)
[Chart: Accuracy over time, point dated Mar 26, 2026]
Updated 22d ago
Evaluation Results
Method                   Protocol                    Date     Accuracy
GPT-4o                   Evaluation Protocol=Cl...   2026.03  89.2
Claude 3.5 Sonnet        Evaluation Protocol=Cl...   2026.03  88.2
Gemini 2.5 Pro           Evaluation Protocol=Cl...   2026.03  87.9
EcoThink                 Evaluation Protocol=Ad...   2026.03  87.6
FrugalGPT (Cascade)      Evaluation Protocol=St...   2026.03  84.5
Qwen-3-8B-Instruct       Evaluation Protocol=St...   2026.03  81.2
Llama-3.1-8B-Instruct    Evaluation Protocol=St...   2026.03  78.5