Reasoning on Natural
Top entry: HALLUGUARD, 70.96 accuracy (Jan 26, 2026)
Evaluation Results
Method           Backbone       Date       Accuracy
HALLUGUARD       Llama3.1-8B    2026.01    70.96
Energy           Llama3.1-8B    2026.01    68.59
MIND             Llama3.1-8B    2026.01    68.32
P(true)          Llama3.1-8B    2026.01    68.16
Semantic Ent.    Llama3.1-8B    2026.01    68.1
LN Entropy       Llama3.1-8B    2026.01    68.04
FActScore        Llama3.1-8B    2026.01    67.74
Perplexity       Llama3.1-8B    2026.01    67.51
Inside           Llama3.1-8B    2026.01    67.42
RACE             Llama3.1-8B    2026.01    66.9
SelfCheck GPT    Llama3.1-8B    2026.01    65.68
IO Prompt        Llama3.1-8B    2026.01    55.24