Hallucination Detection on PubMedQA (AUROC)
Top AUROC: 94.8 (LatentAudit), Apr 7, 2026
Updated 1mo ago
Evaluation Results
Method        Details                     Links     AUROC
LatentAudit   Model Backbone=Qwen-3...    2026.04   94.8
LatentAudit   Model Backbone=Llama-3...   2026.04   94.2
LatentAudit   Model Backbone=Qwen-2....   2026.04   93.8
LatentAudit   Model Backbone=Llama-2...   2026.04   93.1
LatentAudit   Model Backbone=Mistral...   2026.04   92.5