Grounded Text Generation on HaluEval
[Chart: F1 Score by date (BLEU Score view also available), Jan 29, 2026. Top result: Token-Guard, 72.66 F1.]
Evaluation Results
Method              | Backbone | Date    | F1 Score | BLEU Score
Token-Guard         | 13B      | 2026.01 | 72.66    | 68.81
Token-Guard         | 3B       | 2026.01 | 68.21    | 63.83
Guided Decoding     | 3B       | 2026.01 | 63.47    | 59.11
Guided Decoding     | 13B      | 2026.01 | 63.42    | 58.72
Tree-of-Thought     | 13B      | 2026.01 | 62.2     | 58.07
Tree-of-Thought     | 3B       | 2026.01 | 59.22    | 55.11
Predictive Decoding | 3B       | 2026.01 | 57.55    | 53.97
Chain-of-Thoughts   | 3B       | 2026.01 | 50.88    | 48.44
Chain-of-Thoughts   | 13B      | 2026.01 | 49.43    | 44.01
BaseModel           | 3B       | 2026.01 | 42.95    | 39.95
BaseModel           | 13B      | 2026.01 | 35.3     | 31.4
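One way to read these results is as gains over the unassisted base model on the same backbone. The sketch below copies the scores from the evaluation results into a dictionary and computes each method's F1 and BLEU improvement over BaseModel, ranked by F1 gain. The names RESULTS and gains_over_base are illustrative assumptions, not part of the benchmark or any published tooling.

```python
# F1 / BLEU scores copied from the Evaluation Results table.
# Keys are (method, backbone); values are (F1, BLEU).
RESULTS = {
    ("Token-Guard", "13B"): (72.66, 68.81),
    ("Token-Guard", "3B"): (68.21, 63.83),
    ("Guided Decoding", "3B"): (63.47, 59.11),
    ("Guided Decoding", "13B"): (63.42, 58.72),
    ("Tree-of-Thought", "13B"): (62.2, 58.07),
    ("Tree-of-Thought", "3B"): (59.22, 55.11),
    ("Predictive Decoding", "3B"): (57.55, 53.97),
    ("Chain-of-Thoughts", "3B"): (50.88, 48.44),
    ("Chain-of-Thoughts", "13B"): (49.43, 44.01),
    ("BaseModel", "3B"): (42.95, 39.95),
    ("BaseModel", "13B"): (35.3, 31.4),
}


def gains_over_base(results, backbone):
    """Return (method, F1 gain, BLEU gain) relative to BaseModel on the
    same backbone, sorted by F1 gain in descending order."""
    base_f1, base_bleu = results[("BaseModel", backbone)]
    rows = []
    for (method, bb), (f1, bleu) in results.items():
        # Only compare methods evaluated on the same backbone size.
        if bb != backbone or method == "BaseModel":
            continue
        rows.append((method, round(f1 - base_f1, 2), round(bleu - base_bleu, 2)))
    return sorted(rows, key=lambda r: r[1], reverse=True)


for backbone in ("3B", "13B"):
    print(f"Backbone={backbone}")
    for method, d_f1, d_bleu in gains_over_base(RESULTS, backbone):
        print(f"  {method:<20} F1 {d_f1:+.2f}  BLEU {d_bleu:+.2f}")
```

Comparing against the same-backbone baseline is useful here because, per the table, the 13B BaseModel scores below the 3B one (35.3 vs. 42.95 F1), so the 13B gains start from a lower baseline. Predictive Decoding has no 13B entry, so it appears only in the 3B ranking.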