Faithfulness Detection on LatentAudit Llama-3-8B (evaluation set)

0.948 AUROC

GPT-4o Judge

Updated 3mo ago

Evaluation Results

Method	Links	AUROC	(unlabeled)
GPT-4o Judge 2026.04		0.948	0.881
LatentAudit 2026.04		0.942	0.869
INSIDE 2026.04		0.908	0.841
SAPLMA 2026.04		0.882	0.815
SelfCheckGPT 2026.04		0.871	0.804
Min-Perplexity 2026.04		0.722	0.655