Single-Hop Question Answering on NQ (Natural Questions) 2018 Wikipedia dump (dev)
[Chart: Accuracy. Top entry: MR-Search, 50.2, Mar 11, 2026.]
Evaluation Results
Method            Backbone     Date     Accuracy
----------------  -----------  -------  --------
MR-Search         Qwen2.5-7b   2026.03  50.2
MR-Search         Qwen2.5-3b   2026.03  47.7
StepResearch      Qwen2.5-7b   2026.03  47.3
Search-R1         Qwen2.5-3b   2026.03  46.2
Search-R1         Qwen2.5-7b   2026.03  45.9
PPRM              Qwen2.5-7b   2026.03  45.8
StepResearch      Qwen2.5-3b   2026.03  44.6
ReSearch          Qwen2.5-3b   2026.03  42.7
PPRM              Qwen2.5-3b   2026.03  42.3
ReSearch          Qwen2.5-7b   2026.03  36.6
Search-o1         Qwen2.5-3b   2026.03  23.8
Search-o1         Qwen2.5-7b   2026.03  15.1
Direct Inference  Qwen2.5-7b   2026.03  13.4
Direct Inference  Qwen2.5-3b   2026.03  10.6