Step-level hallucination detection on PRM800K
Chart: AUROC vs. date (May 13, 2026); best result 99.8 AUROC (Student).
Evaluation Results
| Method | Tags | Date | AUROC (×100) |
|---|---|---|---|
| Student | Deployability=deployable | 2026.05 | 99.8 |
| Teacher | Deployability=non-depl... | 2026.05 | 98.5 |
| Linear Probe | | 2026.05 | 91.3 |
| TL-Entropy | | 2026.05 | 54.5 |
| LLM-Check | Mechanism=attention | 2026.05 | 48.0 |
| TL-Perplexity | | 2026.05 | 45.8 |
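The leaderboard ranks detectors by AUROC. As a rough illustration of how a step-level AUROC could be computed, the sketch below assumes each reasoning step carries a binary label (1 = the step is erroneous, 0 = it is not) and that each detector emits one scalar score per step, with higher meaning "more likely erroneous". How this benchmark maps PRM800K step ratings to binary labels is not stated here, and the helper name `auroc` is hypothetical. The sketch uses the Mann-Whitney formulation: the fraction of (erroneous, correct) step pairs in which the erroneous step scores higher, with ties counted as one half.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUROC via the Mann-Whitney U statistic.

    Assumed conventions (not taken from the benchmark):
      labels[i] == 1 -> step i is erroneous (positive class), 0 otherwise
      scores[i]      -> detector score for step i; higher = more suspicious
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative step")

    # Count pairs where the erroneous step outranks the correct one;
    # a tie contributes half a win.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 4 steps, 2 erroneous. 3 of the 4 (erroneous, correct)
# pairs are ranked correctly, so AUROC = 0.75.
labels = [1, 0, 1, 0]
scores = [0.9, 0.1, 0.4, 0.6]
print(round(100 * auroc(labels, scores), 1))  # prints 75.0
```

Multiplying by 100 gives the scale used in the table. A value of 50 corresponds to chance-level ranking, so entries below 50 rank correct steps above erroneous ones slightly more often than the reverse. For large evaluation sets, a library routine such as scikit-learn's `roc_auc_score` avoids the quadratic pairwise loop.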