Grounded Text Generation on DROP history
[Chart: F1 and BLEU over time. Top result: Token-Guard, F1 51.17, Jan 29, 2026.]
Evaluation Results
Method               Backbone  Date     F1     BLEU
Token-Guard          13B       2026.01  51.17  47.64
BaseModel            13B       2026.01  46.32  42.74
Tree-of-Thought      13B       2026.01  43.97  38.25
Token-Guard          3B        2026.01  43.09  40.57
BaseModel            3B        2026.01  42.21  39.31
Guided Decoding      13B       2026.01  40.41  36.71
Tree-of-Thought      3B        2026.01  39.84  36.14
Predictive Decoding  3B        2026.01  34.19  31.07
Chain-of-Thoughts    13B       2026.01  33.9   30.06
Guided Decoding      3B        2026.01  32.6   28.83
Chain-of-Thoughts    3B        2026.01  31.6   26.04