Token-level Hallucination Detection on AIME 2024 (AUROC/AUPRC)
[Chart: best result 87.39 AUROC, TOKENHD-8B, May 12, 2026. Metrics shown: AUROC, AUPRC.]
Updated 21d ago
Evaluation Results

Method        Date      AUROC   AUPRC
TOKENHD-8B    2026.05   87.39   73.59
o4-mini       2026.05   78.8    59.08
GPT-4.1       2026.05   72.06   44.26
QwQ-32B       2026.05   69.04   49.19
R1-Qwen3-8B   2026.05   64.84   45.62
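
The page does not say how these scores are computed. As a rough sketch of the standard definitions only, assuming each token carries a binary label (1 = hallucinated) and a detector score (higher = more likely hallucinated), AUROC and AUPRC could be computed as below. The function names and the toy data are hypothetical, and the scaling to a 0-100 range is an assumption based on how the table reports values.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random positive token outscores a random negative one.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative token")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auprc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean precision at the rank of each positive token.

    Assumes no tied scores; ties are broken by sort order here.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive token")
    return precision_sum / hits


if __name__ == "__main__":
    # Toy data, not taken from the benchmark.
    toy_labels = [0, 0, 1, 0, 1, 1]
    toy_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
    # Scaled by 100 on the assumption that the table reports percentages.
    print(f"AUROC: {100 * auroc(toy_labels, toy_scores):.2f}")
    print(f"AUPRC: {100 * auprc(toy_labels, toy_scores):.2f}")
```

The pairwise AUROC loop is quadratic in the number of tokens, which keeps the sketch readable; a rank-based formulation or a library routine would be the usual choice at benchmark scale.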