Question Answering on HaluEval
[Chart: EM and F1 Score over time. Best EM: 68 (Token-Guard), Jan 29, 2026.]
Updated 4d ago
Evaluation Results
| Method | Backbone | Date | EM | F1 Score |
|---|---|---|---|---|
| Token-Guard | Meta-Llama-3.... | 2026.01 | 68 | 78.54 |
| Guided Decoding | Qwen3-8B | 2026.01 | 62 | 69.88 |
| Token-Guard | Qwen3-8B | 2026.01 | 51 | 74.15 |
| BaseModel | Qwen3-8B | 2026.01 | 43 | 59.83 |
| Guided Decoding | Meta-Llama-3.... | 2026.01 | 42 | 57.41 |
| Chain-of-Thoughts | Meta-Llama-3.... | 2026.01 | 40 | 55.32 |
| Tree-of-Thought | Qwen3-8B | 2026.01 | 39 | 57.33 |
| Tree-of-Thought | Meta-Llama-3.... | 2026.01 | 38 | 56.02 |
| Chain-of-Thoughts | Qwen3-8B | 2026.01 | 34 | 53.21 |
| BaseModel | Meta-Llama-3.... | 2026.01 | 32 | 42.16 |
| Predictive Decoding | Meta-Llama-3.... | 2026.01 | 22 | 38 |
| Predictive Decoding | Qwen3-8B | 2026.01 | 0 | 21.29 |