Multi-Hop QA on FictionalHot
Best reported result: ReSeek, 6.1 Exact Match (EM)
[Figure: Exact Match (EM) of reported results by date (Oct 1, 2025)]
Evaluation Results
Method            Backbone            Date     Exact Match (EM)
----------------  ------------------  -------  ----------------
ReSeek            Qwen2.5-7b-In...    2025.10  6.1
ReSeek            Qwen2.5-3b-In...    2025.10  5.9
Search-R1         Qwen2.5-3b-In...    2025.10  3.7
Search-R1         Qwen2.5-7b-In...    2025.10  3.4
ZeroSearch        Qwen2.5-7b-In...    2025.10  3.1
ZeroSearch        Qwen2.5-3b-In...    2025.10  3.0
Search-o1         Qwen2.5-7b-In...    2025.10  2.0
RAG               Qwen2.5-7b-In...    2025.10  1.2
Search-o1         Qwen2.5-3b-In...    2025.10  1.0
RAG               Qwen2.5-3b-In...    2025.10  0.8
R1                Qwen2.5-7b-In...    2025.10  0.3
R1                Qwen2.5-3b-In...    2025.10  0.3
Direct Inference  Qwen2.5-7b-In...    2025.10  0.1
CoT               Qwen2.5-7b-In...    2025.10  0.1
Direct Inference  Qwen2.5-3b-In...    2025.10  0.1
CoT               Qwen2.5-3b-In...    2025.10  0.1
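The page does not state how Exact Match is computed for FictionalHot. As a rough illustration only, the sketch below follows the common SQuAD-style convention: lowercase the text, strip punctuation and the English articles "a", "an", "the", collapse whitespace, then count a question as correct if the normalized prediction equals any normalized gold answer. The normalization rules, the function names (`normalize_answer`, `exact_match`, `em_score`), and the assumption that scores are reported as percentages are all assumptions, not the benchmark's confirmed procedure.

```python
import re
import string

# Translation table that deletes ASCII punctuation (assumed SQuAD-style rule).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation, drop English articles, collapse whitespace.

    Hypothetical helper: mirrors the SQuAD convention, not confirmed for FictionalHot.
    """
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage (0-100) of questions answered with an exact match.

    Assumes one prediction per question and a list of acceptable gold answers each.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


if __name__ == "__main__":
    preds = ["The Eiffel Tower", "Paris", "1889."]
    refs = [["Eiffel Tower"], ["London"], ["1889"]]
    # Two of the three predictions match after normalization.
    print(em_score(preds, refs))
```

Under this convention, a score such as 6.1 would mean roughly 6.1% of questions were answered with a string that matches a gold answer exactly after normalization, which makes EM a strict metric: a partially correct or paraphrased answer scores zero.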