Text Generation on TruthfulQA
[Chart: BLEU-4 over time. Top BLEU-4: 15.5 (Factual SFT), Jan 6, 2026. Metrics shown: BLEU-4, ROUGE-L.]
Updated 4d ago
Evaluation Results
Method                      Backbone     Date     BLEU-4  ROUGE-L
Factual SFT                 Qwen2.5-14B  2026.01  15.5    38.3
SFT                         Qwen2.5-14B  2026.01  14.2    36.3
Factual SFT + Standard DPO  Qwen2.5-14B  2026.01  12.4    34.4
Base Model                  Qwen2.5-14B  2026.01  10.6    31.5
Standard DPO                Qwen2.5-14B  2026.01  10.5    31.8
Factual SFT + F-DPO         Qwen2.5-14B  2026.01  10.2    31.8
F-DPO                       Qwen2.5-14B  2026.01  9.9     30.6