Reasoning over conflicting evidence on SEAL-0
Best accuracy: 45.9 (ROMA), Feb 2, 2026. Updated 4d ago.
Evaluation Results
Method                          Source type    Date     Accuracy
ROMA                            Open-sourc...  2026.02  45.9
Kimi-Researcher                 Open-sourc...  2026.02  36
Perplexity Deep Research        Closed-sou...  2026.02  31.5
Grok-4                          Closed-sou...  2026.02  20.7
Gemini 2.5 Pro                  Closed-sou...  2026.02  19.8
o3-pro                          Closed-sou...  2026.02  18.9
o3                              Closed-sou...  2026.02  15.3
GLM-4.6                         Open-sourc...  2026.02  14.5
Gemini 2.5 Flash                Closed-sou...  2026.02  13.5
Perplexity Sonar Reasoning Pro  Closed-sou...  2026.02  13.5
Open Deep Search                Open-sourc...  2026.02  9.9
Grok-3                          Closed-sou...  2026.02  5.4
Qwen3-235B-A22B                 Open-sourc...  2026.02  5.4
DeepSeek-R1                     Open-sourc...  2026.02  4.5