Multi-hop Factual Reasoning on FRAMES
[Chart: Accuracy on FRAMES. Best result: ROMA, 82.3 (Feb 2, 2026).]
Updated 4d ago
Evaluation Results
Method                          Settings                   Date     Accuracy
ROMA                            Search=true, Backbone=...  2026.02  82.3
Kimi-Researcher                 Search=true, Source_Ty...  2026.02  78.8
Open Deep Search (DeepSeek-R1)  Search=true, Source_Ty...  2026.02  75.3
GLM-4.6                         Search=true, Source_Ty...  2026.02  71.2
GPT-4o Search Preview           Search=true, Source_Ty...  2026.02  65.6
GPT-4o                          Search=false, Source_T...  2026.02  50.5
Perplexity Sonar Reasoning Pro  Search=true, Source_Ty...  2026.02  44.4
Perplexity                      Search=true, Source_Ty...  2026.02  42.4
Llama-3.1-70B                   Search=false, Source_T...  2026.02  34.3
DeepSeek-R1                     Search=false, Source_T...  2026.02  30.1