
Prefill-stage hallucination risk detection on Benchmark-500 Relaxed Consensus (Pvote ≥ 0.8)

0.6957 AUROC (Mean)

Risk-Cos

Mar 20, 2026
Updated 27d ago

Evaluation Results

Method    Links
2026.03   0.6957   0.61
2026.03   0.673    0.59
2026.03   0.502    0.42
2026.03   0.326    0.25