General Question Answering on TriviaQA out-of-domain (val test)
Best reported result: Search-R2, 70.9 EM (Feb 3, 2026). Updated 4d ago.
Evaluation Results
Method              Backbone      Date     EM
Search-R2           Qwen2.5-32B   2026.02  70.9
Search-R1           Qwen2.5-32B   2026.02  68
Search-R2           Qwen3-8B      2026.02  67.6
Search-R2           Qwen2.5-7B    2026.02  65.9
Search-R1           Qwen3-8B      2026.02  63.1
Rejection Sampling  Qwen2.5-7B    2026.02  59.2
RAG                 Qwen2.5-7B    2026.02  58.5
Search-R1           Qwen2.5-7B    2026.02  56
R1-base             Qwen2.5-7B    2026.02  53.9
R1-instruct         Qwen2.5-7B    2026.02  53.7
IRCoT               Qwen2.5-7B    2026.02  47.8
Search-o1           Qwen2.5-7B    2026.02  44.3
Direct Inference    Qwen2.5-7B    2026.02  40.8
SFT                 Qwen2.5-7B    2026.02  35.4
CoT                 Qwen2.5-7B    2026.02  18.5