Open-ended Generation on NQ-Swap
[Chart: EM over time. Highlighted point: Baseline, 44.91 EM. Date shown: Feb 10, 2026. Metric tabs: EM, F1, Truthfulness.]
Updated 4d ago
Evaluation Results
Method     Configuration              Date     EM     F1      Truthfulness
Baseline   Model=Qwen-2.5-14B         2026.02  44.91  0.5415  58.6
Diver      Model=Qwen-2.5-14B         2026.02  42     0.5128  57.1
CoCoASIG   Model=Qwen-2.5-14B, Mo...  2026.02  41.1   0.5215  59.7
DoLa       Model=Qwen-2.5-14B         2026.02  32.45  0.4337  47.35