Prefill-stage hallucination risk detection on Benchmark-500 Relaxed Consensus (Pvote ≥ 0.8)
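The page does not define the "Relaxed Consensus (Pvote ≥ 0.8)" split. One plausible reading, which is an assumption here, is that each item carries several annotator votes on whether the response is hallucinated, and an item is kept only if at least 80% of the votes agree on its majority label. The sketch below implements that reading. The `Item` type, the 1/0 vote encoding, and the helper names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    # Hypothetical encoding: 1 = annotator judged the response hallucinated, 0 = not.
    votes: list[int]


def p_vote(votes: list[int]) -> float:
    """Fraction of votes that agree with the majority label (assumed meaning of Pvote)."""
    positives = sum(votes)
    return max(positives, len(votes) - positives) / len(votes)


def relaxed_consensus(items: list[Item], threshold: float = 0.8) -> list[tuple[str, int]]:
    """Keep items whose majority label reaches `threshold` agreement; return (id, label)."""
    kept = []
    for item in items:
        if not item.votes:
            continue  # no votes, no consensus
        if p_vote(item.votes) >= threshold:
            # With threshold > 0.5 the majority is strict, so the label is well defined.
            label = int(2 * sum(item.votes) > len(item.votes))
            kept.append((item.item_id, label))
    return kept
```

For example, an item with votes `[1, 1, 1, 1, 0]` (agreement 0.8) is kept with label 1, while `[1, 0, 1, 0, 1]` (agreement 0.6) is dropped.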
Chart: AUROC (Mean) and AUROC (95% CI Lower Bound) by date. Top result: Risk-Cos, AUROC (Mean) 0.6957 (Mar 20, 2026).
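Each method is scored by AUROC: the probability that a randomly chosen hallucinated example receives a higher risk score than a randomly chosen non-hallucinated one, so 0.5 corresponds to chance-level ranking. A minimal sketch using the rank-sum (Mann-Whitney U) formulation, with tied scores given average ranks, is shown below. It assumes label 1 means "hallucinated" and that a higher score means higher risk; the page does not state these conventions, and `auroc` is a hypothetical helper name.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC via the rank-sum statistic; tied scores share their average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores pairs[i..j] and give each the average 1-based rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative example")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    # U statistic for the positive class, normalised by the number of positive/negative pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For instance, `auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75. Negating every score turns an AUROC of x into 1 - x, which is why values below 0.5 indicate a ranking that is inverted relative to the labels.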
Evaluation Results
Method                                | Date    | AUROC (Mean) | AUROC (95% CI Lower Bound)
Risk-Cos (generation_mode=prefil...)     | 2026.03 | 0.6957       | 0.61
Risk-Margin (generation_mode=prefil...)  | 2026.03 | 0.673        | 0.59
Risk-Entropy (generation_mode=prefil...) | 2026.03 | 0.502        | 0.42
Risk-Loss (generation_mode=prefil...)    | 2026.03 | 0.326        | 0.25
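The results report a 95% CI lower bound next to each mean AUROC, but the page does not say how the interval was obtained. A common choice is a percentile bootstrap: resample the evaluation set with replacement, recompute AUROC on each resample, and take the 2.5th percentile of those estimates as the lower bound. The sketch below follows that approach and treats "mean" as the average over bootstrap resamples; both are assumptions, and `pairwise_auroc`, `bootstrap_auroc`, and their parameters are hypothetical names.

```python
import random


def pairwise_auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC by direct pair counting: positive outscores negative = 1, tie = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc(
    scores: list[float],
    labels: list[int],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float]:
    """Return (mean AUROC over resamples, approximate alpha/2 percentile lower bound)."""
    n = len(scores)
    n_pos = sum(labels)
    if n_pos == 0 or n_pos == n:
        raise ValueError("need both positive and negative examples")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the two classes
            estimates.append(pairwise_auroc([scores[i] for i in idx], ys))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    return sum(estimates) / n_resamples, lower
```

Pair counting is quadratic in the number of examples, which is acceptable for a set of a few hundred items; the rank-based formulation scales better for larger sets.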