Faithfulness Detection on LatentAudit Llama-3-8B (evaluation set)
[Chart: AUROC over time, with an F1 Score view. Top result: GPT-4o Judge, AUROC 0.948, Apr 7, 2026.]
Evaluation Results
| Method | Latency (ms) | Date | AUROC | F1 Score |
|---|---|---|---|---|
| GPT-4o Judge | ~5,300 | 2026.04 | 0.948 | 0.881 |
| LatentAudit | 0.77 (+11... | 2026.04 | 0.942 | 0.869 |
| INSIDE | ~3.8 | 2026.04 | 0.908 | 0.841 |
| SAPLMA | ~1.5 | 2026.04 | 0.882 | 0.815 |
| SelfCheckGPT | ~28,500 | 2026.04 | 0.871 | 0.804 |
| Min-Perplexity | 0.0 | 2026.04 | 0.722 | 0.655 |