
Prefill-stage hallucination risk detection on Benchmark-500 Strict Consensus Pvote = 1.0 vs. Clean

0.6939 AUROC (Mean)

Risk-Cos

Mar 20, 2026
Updated 27d ago

Evaluation Results

Date      AUROC (Mean)   (unlabeled)
2026.03   0.6939         0.58
2026.03   0.685          0.57
2026.03   0.5323         0.44
2026.03   0.3508         0.25