Hallucination Detection on WikiBio GPT-3.5-Turbo-Instruct (test)
[Chart: AUC-PR (Nonfactual); top entry SelfCheckGPT at 92.5, Feb 25, 2024. Available metrics: AUC-PR (Nonfactual), AUC-PR (Factual), Balanced Accuracy. Updated 4d ago.]
Evaluation Results
| Method        | Setting             | Date    | AUC-PR (Nonfactual) | AUC-PR (Factual) | Balanced Accuracy |
|---------------|---------------------|---------|---------------------|------------------|-------------------|
| SelfCheckGPT  | Variant=w/NLI       | 2024.02 | 92.5                | 58.47            | 70.55             |
| CEG           | top-k=6             | 2024.02 | 92.31               | 70.24            | 77.59             |
| SelfCheckGPT  | Variant=w/Prompt    | 2024.02 | 91.16               | 68.37            | 72.64             |
| Focus         | Backbone=LLaMA-65B  | 2024.02 | 89.94               | 64.9             | 74.08             |
| Focus         | Backbone=LLaMA-30B  | 2024.02 | 89.79               | 65.69            | 73.64             |
| HalluDetector | CM=28, CFA=96       | 2024.02 | 86.45               | 61.96            | 74.82             |
| HalluDetector | CM=14, CFA=21       | 2024.02 | 82.42               | 57.01            | 70.54             |
| SelfCheckGPT  | Variant=w/BERTScore | 2024.02 | 81.96               | 44.23            | 59.31             |