| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Prefill-stage hallucination risk detection | Benchmark-500 Relaxed Consensus (Pvote ≥ 0.8) | AUROC (Mean) 0.6957 | 4 |
| Prefill-stage hallucination risk detection | Benchmark-500 Strict Consensus (Pvote = 1.0) vs. Clean | AUROC (Mean) 0.6939 | 4 |