General Question Answering on NQ (Natural Questions) in-domain (val/test)
[Chart: Exact Match over time. Top entry: Search-R2, 50.9 Exact Match, Feb 3, 2026]
Updated 4d ago
Evaluation Results
Method              Backbone      Date     Exact Match
Search-R2           Qwen2.5-32B   2026.02  50.9
Search-R2           Qwen3-8B      2026.02  47.7
Search-R1           Qwen2.5-32B   2026.02  47.6
Search-R1           Qwen3-8B      2026.02  44
Search-R2           Qwen2.5-7B    2026.02  39.9
Search-R1           Qwen2.5-7B    2026.02  39.5
Rejection Sampling  Qwen2.5-7B    2026.02  36
RAG                 Qwen2.5-7B    2026.02  34.9
SFT                 Qwen2.5-7B    2026.02  31.8
R1-base             Qwen2.5-7B    2026.02  29.7
R1-instruct         Qwen2.5-7B    2026.02  27
IRCoT               Qwen2.5-7B    2026.02  22.4
Search-o1           Qwen2.5-7B    2026.02  15.1
Direct Inference    Qwen2.5-7B    2026.02  13.4
CoT                 Qwen2.5-7B    2026.02  4.8
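
This page does not say how the Exact Match score is computed. As a rough illustration only, the sketch below assumes the common SQuAD-style convention: lowercase both strings, strip punctuation and English articles, collapse whitespace, then count a prediction as correct if it equals any gold answer. The function names (`normalize_answer`, `exact_match`, `exact_match_score`) are hypothetical, and the normalization behind the numbers above may differ.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the leaderboard may use another rule.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def exact_match_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage (0 to 100) of predictions that match a gold answer."""
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, golds) for p, golds in zip(predictions, references))
    return 100.0 * hits / len(predictions)
```

Under this convention, a score such as 50.9 would mean that roughly half of the evaluated questions received an answer string identical to a gold answer after normalization.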