Hallucination Detection on LLaMa 1 (test)
Top AUROC: 0.894 (EigenTrack), Jan 24, 2026
[Chart: AUROC over time]
Evaluation Results
Method        Model Scale  Date     AUROC
EigenTrack    7B           2026.01  0.894
LapEigvals    7B           2026.01  0.871
EigenTrack    3B           2026.01  0.861
HaloScope     7B           2026.01  0.861
EigenTrack    1B           2026.01  0.842
INSIDE        3B           2026.01  0.831
HaloScope     3B           2026.01  0.827
HaloScope     1B           2026.01  0.82
LapEigvals    3B           2026.01  0.819
INSIDE        7B           2026.01  0.81
SelfCheckGPT  7B           2026.01  0.809
SelfCheckGPT  3B           2026.01  0.804
LapEigvals    1B           2026.01  0.785
INSIDE        1B           2026.01  0.753
SelfCheckGPT  1B           2026.01  0.739