Hallucination Detection on WebQuestions
Best result: 87.67 AUROC (DRIFT)
[Chart: AUROC by date; Jan 20, 2026]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | AUROC |
| DRIFT | Model=Qwen-2.5-7B-Inst... | 2026.01 | 87.67 |
| DRIFT | Model=Qwen-2.5-7B-Inst... | 2026.01 | 84.01 |
| DRIFT | Model=Gemma-3-4b-it, I... | 2026.01 | 83.11 |
| DRIFT | Model=LLaMA 2 Chat 7B,... | 2026.01 | 82.76 |
| DRIFT | Model=LLaMA 2 Chat 7B,... | 2026.01 | 80.79 |
| Semantic Entropy | Model=Qwen-2.5-7B-Inst... | 2026.01 | 80.71 |
| HaloScope | Model=Qwen-2.5-7B-Inst... | 2026.01 | 80.43 |
| Semantic Entropy | Model=LLaMA 2 Chat 7B | 2026.01 | 80.12 |
| DRIFT | Model=Gemma-3-4b-it, I... | 2026.01 | 79.63 |
| HaloScope | Model=Qwen-2.5-7B-Inst... | 2026.01 | 78.56 |
| HaloScope | Model=LLaMA 2 Chat 7B,... | 2026.01 | 77.03 |
| Semantic Entropy | Model=Gemma-3-4b-it | 2026.01 | 76.76 |
| HaloScope | Model=Gemma-3-4b-it, I... | 2026.01 | 76.7 |
| HaloScope | Model=Gemma-3-4b-it, I... | 2026.01 | 75.72 |
| HaloScope | Model=LLaMA 2 Chat 7B,... | 2026.01 | 72.62 |