
Token-level Hallucination Detection on AIME 2024

9.29 S_incor Score

Qwen3-0.6B

[Chart: S_incor Score, May 12, 2026]
Updated 21d ago

Evaluation Results

Date      S_incor   (unlabeled)
2026.05   9.29      100
2026.05   18.99     100
2026.05   34.51     100
2026.05   36.78     100
2026.05   47.03     100
2026.05   50.2      99.37
2026.05   50.59     100
2026.05   59.57     96.79
2026.05   61.41     90.12
2026.05   63.62     98.44
2026.05   63.7      96.62
2026.05   67.67     100
2026.05   80.44     100
2026.05   84.01     100