Knowledge Acquisition on Wikipedia (test) (In-distribution)
[Chart: Accuracy (strict) and Accuracy (lenient) over time. Top entry: Oracle RAG, 91 Accuracy (strict), Jan 27, 2026.]
Evaluation Results
Method      Configuration              Date     Accuracy (strict)  Accuracy (lenient)
Oracle RAG  Backbone=Qwen2.5-7B-In...  2026.01  91                 100
SDFT        Backbone=Qwen2.5-7B-In...  2026.01  89                 100
SFT         Backbone=Qwen2.5-7B-In...  2026.01  80                 95
CPT         Backbone=Qwen2.5-7B-In...  2026.01  9                  37
Base        Backbone=Qwen2.5-7B-In...  2026.01  0                  0