Multi-hop QA on Bamboogle (EM, F1)
[Chart: EM and F1 over time. Highlighted point: Search-o1, EM 56, Jan 9, 2025]
Updated 4d ago
Evaluation Results
| Method | Method details | Links | EM | F1 |
| Search-o1 | Reasoning Protocol=Ret... | 2025.01 | 56 | 67.8 |
| RAG-QwQ-32B | Reasoning Protocol=Ret... | 2025.01 | 55.2 | 67.4 |
| Llama3.3-70B | Reasoning Protocol=Dir... | 2025.01 | 54.4 | 67.8 |
| RAgent-Qwen2.5-32B | Reasoning Protocol=Ret... | 2025.01 | 54.4 | 66.4 |
| RAG-Qwen2.5-32B | Reasoning Protocol=Ret... | 2025.01 | 52 | 66 |
| RAgent-QwQ-32B | Reasoning Protocol=Ret... | 2025.01 | 52 | 64.7 |
| Qwen2.5-32B | Reasoning Protocol=Dir... | 2025.01 | 49.6 | 63.2 |
| Qwen2.5-72B | Reasoning Protocol=Dir... | 2025.01 | 47.2 | 61.7 |
| QwQ-32B | Reasoning Protocol=Dir... | 2025.01 | 38.4 | 53.7 |
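
For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how EM (exact match) and token-level F1 are commonly computed for short-answer QA. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the page does not state which scoring script produced these numbers, so the function names and normalization rules below are illustrative assumptions, not the benchmark's official evaluator.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the leaderboard may differ.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions: list[str], golds: list[str]) -> tuple[float, float]:
    """Average EM and F1 over a dataset, scaled to percentages."""
    n = len(predictions)
    em = sum(exact_match(p, g) for p, g in zip(predictions, golds)) / n
    f1 = sum(token_f1(p, g) for p, g in zip(predictions, golds)) / n
    return 100 * em, 100 * f1
```

Under this definition F1 is never lower than EM for the same prediction, which is consistent with every row above reporting a higher F1 than EM.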