Multi-Hop QA on Musique (EM)
[Chart: Exact Match (EM) over time. Top entry: ReSeek, 18.5 EM (Oct 1, 2025).]
Updated 22d ago
Evaluation Results
| Method | Configuration | Date | Exact Match (EM) |
|---|---|---|---|
| ReSeek | Backbone=Qwen2.5-7b-In... | 2025.10 | 18.5 |
| ZeroSearch | Backbone=Qwen2.5-7b-In... | 2025.10 | 18.4 |
| Search-R1 | Backbone=Qwen2.5-7b-In... | 2025.10 | 14.6 |
| Search-R1 | Backbone=Qwen2.5-3b-In... | 2025.10 | 10.3 |
| ReSeek | Backbone=Qwen2.5-3b-In... | 2025.10 | 10.3 |
| ZeroSearch | Backbone=Qwen2.5-3b-In... | 2025.10 | 9.8 |
| R1 | Backbone=Qwen2.5-7b-In... | 2025.10 | 7.2 |
| R1 | Backbone=Qwen2.5-3b-In... | 2025.10 | 6 |
| RAG | Backbone=Qwen2.5-7b-In... | 2025.10 | 5.8 |
| Search-o1 | Backbone=Qwen2.5-7b-In... | 2025.10 | 5.8 |
| Search-o1 | Backbone=Qwen2.5-3b-In... | 2025.10 | 5.4 |
| RAG | Backbone=Qwen2.5-3b-In... | 2025.10 | 4.7 |
| Direct Inference | Backbone=Qwen2.5-7b-In... | 2025.10 | 3.1 |
| CoT | Backbone=Qwen2.5-7b-In... | 2025.10 | 2.2 |
| Direct Inference | Backbone=Qwen2.5-3b-In... | 2025.10 | 2 |
| CoT | Backbone=Qwen2.5-3b-In... | 2025.10 | 0.2 |
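
For readers unfamiliar with the metric behind the scores above, here is a minimal sketch of how an Exact Match percentage is commonly computed. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace). This page does not state which scoring script the listed methods used, so the normalization rules and the function names `normalize_answer`, `exact_match`, and `em_score` are illustrative assumptions, not the benchmark's official implementation.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the leaderboard's actual script may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage (0 to 100) of questions answered with an exact match."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under this definition an answer counts only if it matches a gold string exactly after normalization, with no partial credit. That strictness is one reason EM values on multi-hop questions tend to be low in absolute terms.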