Question Answering on NQ, TriviaQA, PopQA, HotpotQA, 2wiki, Musique, Bamboogle, FictionalHot
[Chart: Exact Match (EM) of listed results over time; best result ReSeek, 37.7 EM, Oct 1, 2025]
Evaluation Results
Method            Backbone            Date     Exact Match (EM)
ReSeek            Qwen2.5-7b-In...    2025.10  37.7
ZeroSearch        Qwen2.5-7b-In...    2025.10  34.6
Search-R1         Qwen2.5-7b-In...    2025.10  34.2
ReSeek            Qwen2.5-3b-In...    2025.10  31.2
Search-R1         Qwen2.5-3b-In...    2025.10  28.8
ZeroSearch        Qwen2.5-3b-In...    2025.10  28.1
RAG               Qwen2.5-7b-In...    2025.10  26.7
R1                Qwen2.5-7b-In...    2025.10  23.8
RAG               Qwen2.5-3b-In...    2025.10  23.7
R1                Qwen2.5-3b-In...    2025.10  19.6
Search-o1         Qwen2.5-3b-In...    2025.10  18.7
Search-o1         Qwen2.5-7b-In...    2025.10  18.3
Direct Inference  Qwen2.5-7b-In...    2025.10  15.8
Direct Inference  Qwen2.5-3b-In...    2025.10  11.8
CoT               Qwen2.5-7b-In...    2025.10  9.3
CoT               Qwen2.5-3b-In...    2025.10  1.3
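The leaderboard does not say how Exact Match is computed. The sketch below shows one common convention for open-domain QA sets such as NQ and TriviaQA: SQuAD-style answer normalization (lowercase, strip punctuation, drop "a/an/the", collapse whitespace), then a string match against any of the gold answers, reported as a percentage. The helper names are hypothetical, and both the normalization and the unweighted averaging across the eight datasets are assumptions, not the leaderboard's documented scorer.

```python
import re
import string
from statistics import mean

# Hypothetical helpers. Assumption: SQuAD-style normalization, which the
# leaderboard itself does not document.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, strip ASCII punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> int:
    """1 if the normalized prediction equals any normalized gold answer, else 0."""
    pred = normalize_answer(prediction)
    return int(any(pred == normalize_answer(g) for g in gold_answers))


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Dataset-level EM as a percentage (0-100)."""
    if not predictions or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal in length")
    return 100.0 * mean(exact_match(p, refs) for p, refs in zip(predictions, references))


def macro_average(per_dataset_em: dict[str, float]) -> float:
    """Assumption: a single headline number is the unweighted mean of per-dataset EM."""
    return mean(per_dataset_em.values())


if __name__ == "__main__":
    preds = ["The Eiffel Tower", "Paris, France", "1969"]
    golds = [["Eiffel Tower"], ["Paris"], ["1969", "July 1969"]]
    print(round(em_score(preds, golds), 1))  # 66.7
```

Matching against every gold alias matters because NQ and TriviaQA list several acceptable answer strings per question. A looser scorer (for example, substring match) would give different numbers, so scores computed this way are not guaranteed to reproduce the table.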