Deep Research on FRAMES
[Chart: Accuracy over time. Top entry: TOOLSELF, 56 accuracy, Feb 8, 2026. Updated 4d ago.]
Evaluation Results
Method          Base Model   Date      Accuracy
TOOLSELF        Qwen3-14B    2026.02   56
OAgents         Qwen3-14B    2026.02   48
ReSum           Qwen3-14B    2026.02   46.5
Co-Sight        Qwen3-14B    2026.02   46
TOOLSELF        Qwen3-8B     2026.02   44
WebSailor       Qwen3-14B    2026.02   42
ReSum           Qwen3-8B     2026.02   39.5
Vanilla Agent   Qwen3-14B    2026.02   38
OWL             Qwen3-14B    2026.02   37
OAgents         Qwen3-8B     2026.02   33
Co-Sight        Qwen3-8B     2026.02   32
WebSailor       Qwen3-8B     2026.02   28
OWL             Qwen3-8B     2026.02   23
Vanilla Agent   Qwen3-8B     2026.02   21