Token-level Hallucination Detection on AIME 2025 (AUROC/AUPRC)
[Chart: AUROC and AUPRC by method. Top result: TOKENHD-8B, 89.47 AUROC, May 12, 2026]
Updated 21d ago
Evaluation Results
Method        Date      AUROC   AUPRC
TOKENHD-8B    2026.05   89.47   80.14
o4-mini       2026.05   82.26   60.88
GPT-4.1       2026.05   75.43   49.87
QwQ-32B       2026.05   70.82   52.12
R1-Qwen3-8B   2026.05   64.35   46.63