Hallucination Detection on PubmedQA

88 F1 Score

LatentAudit

[Chart: scores over time, Nov 12, 2025 to Apr 7, 2026]
Updated 11d ago

Evaluation Results

Method  Date     F1 Score  (unnamed)  (unnamed)  Links
-       2026.04  88        -          94.8       -
-       2026.04  87        -          94.8       -
-       2026.04  87        -          94.8       -
-       2026.04  87        -          94.8       -
-       2026.04  87        -          94.8       -
-       2026.04  87        -          94.8       -
-       2026.04  87        -          94.2       -
-       2026.04  86        -          93.1       -
-       2026.04  86        -          93.8       -
-       2026.04  85        -          92.5       -
-       2026.04  84        -          90.8       -
-       2026.04  83        -          90.3       -
-       2026.04  83        -          90.5       -
-       2026.04  82        -          89.9       -
-       2026.04  82        -          89.5       -
-       2026.04  82        -          88.2       -
-       2025.11  81.7      81         -          -
-       2026.04  81        -          87.1       -
-       2026.04  81        -          87.8       -
-       2026.04  81        -          88         -
-       2026.04  80        -          86.2       -
-       2026.04  80        -          86.9       -
-       2026.04  80        -          87.2       -
-       2026.04  80        -          87         -
-       2026.04  79        -          85.5       -
-       2026.04  79        -          85.8       -
-       -        76.4      79         -          -
-       -        71        77.5       -          -
-       2026.04  70        -          72.2       -
-       2026.04  69        -          71.8       -
-       2026.04  69        -          72         -
-       -        68.7      74.5       -          -
-       2026.04  68        -          71.5       -
-       2026.04  68        -          71.2       -
-       -        16.5      54.5       -          -
-       -        7.7       52         -          -