
Faithfulness Detection on LatentAudit Qwen-2.5-7B (evaluation)

94.5 AUROC

GPT-4o Judge

Updated 3mo ago

Evaluation Results

Method	AUROC	Second metric (unlabeled)
GPT-4o Judge 2026.04	94.5	87.6
LatentAudit 2026.04	93.8	86.2
INSIDE 2026.04	90.1	83.2
SAPLMA 2026.04	87.6	80.8
SelfCheckGPT 2026.04	86.5	79.8
Min-Perplexity 2026.04	71.8	65