Hallucination Detection on CNN/DM
[Figure: AUROC over time. Top result: FEPoID, 76.14 AUROC. Date shown: May 25, 2026.]
Evaluation Results
Method     Setting                    Date     AUROC
FEPoID     Model=Mistral-7B-Instr...  2026.05  76.14
Curvature  Model=Mistral-7B-Instr...  2026.05  73.19
ID         Model=Mistral-7B-Instr...  2026.05  71.85
RGN        Model=Mistral-7B-Instr...  2026.05  70.31
Val Loss   Model=Mistral-7B-Instr...  2026.05  69.38
RankMe     Model=Mistral-7B-Instr...  2026.05  68.69
SNR        Model=Mistral-7B-Instr...  2026.05  68.11
FEPoID     Model=LlaMA-3.1-8B-Ins...  2026.05  59.95
Curvature  Model=LlaMA-3.1-8B-Ins...  2026.05  59.22
ID         Model=LlaMA-3.1-8B-Ins...  2026.05  59.18
Val Loss   Model=LlaMA-3.1-8B-Ins...  2026.05  58.59
RGN        Model=LlaMA-3.1-8B-Ins...  2026.05  58.21
RankMe     Model=LlaMA-3.1-8B-Ins...  2026.05  57.74
SNR        Model=LlaMA-3.1-8B-Ins...  2026.05  54.74