Token-level Hallucination Detection on AIME 2025

S_incor: 6.46

Qwen3-0.6B

3.202825.188947.17569.1611
May 12, 2026
Updated 21d ago

Evaluation Results

Date       S_incor   Other metric
2026.05       6.46   95.84
2026.05      18.76   100
2026.05      40.14   100
2026.05      44.75   100
2026.05      48.92   99.72
2026.05      56.18   99.31
2026.05      56.24   100
2026.05      64.34   89.31
2026.05      67.02   96.41
2026.05      67.32   92.66
2026.05      68.88   95.84
2026.05      76.09   100
2026.05      85.88   100
2026.05      87.89   100