Single-Hop Question Answering on PopQA 2018 Wikipedia dump (dev)
Best Accuracy: 47.2 (MR-Search, Mar 11, 2026)
Evaluation Results
| Method | Backbone | Date | Accuracy |
|---|---|---|---|
| MR-Search | Qwen2.5-7b | 2026.03 | 47.2 |
| MR-Search | Qwen2.5-3b | 2026.03 | 46 |
| Search-R1 | Qwen2.5-3b | 2026.03 | 45.6 |
| StepResearch | Qwen2.5-3b | 2026.03 | 45.6 |
| Search-R1 | Qwen2.5-7b | 2026.03 | 44.9 |
| PPRM | Qwen2.5-7b | 2026.03 | 43.7 |
| StepResearch | Qwen2.5-7b | 2026.03 | 43.1 |
| ReSearch | Qwen2.5-3b | 2026.03 | 43 |
| PPRM | Qwen2.5-3b | 2026.03 | 41.1 |
| ReSearch | Qwen2.5-7b | 2026.03 | 39.1 |
| Search-o1 | Qwen2.5-3b | 2026.03 | 26.2 |
| Direct Inference | Qwen2.5-7b | 2026.03 | 14 |
| Search-o1 | Qwen2.5-7b | 2026.03 | 13.1 |
| Direct Inference | Qwen2.5-3b | 2026.03 | 10.8 |