LatentAudit

Benchmarks

Task Name               Dataset Name                               SOTA Result    Trend
Faithfulness Detection  LatentAudit Mistral-7B (evaluation)        AUROC 94       6
Faithfulness Detection  LatentAudit Qwen-2.5-7B (evaluation)       AUROC 94.5     6
Faithfulness Detection  LatentAudit Llama-3-8B (evaluation set)    AUROC 0.948    6