Faithfulness Detection on LatentAudit Qwen-2.5-7B (evaluation)
[Leaderboard chart. Top result: GPT-4o Judge, 94.5 AUROC. Date shown: Apr 7, 2026. Metrics: AUROC, F1 Score.]
Updated 1mo ago
Evaluation Results
Method          Latency (ms)   Date      AUROC   F1 Score
GPT-4o Judge    ∼5,300         2026.04   94.5    87.6
LatentAudit     0.77 (+11...   2026.04   93.8    86.2
INSIDE          ∼3.8           2026.04   90.1    83.2
SAPLMA          ∼1.5           2026.04   87.6    80.8
SelfCheckGPT    ∼28,500        2026.04   86.5    79.8
Min-Perplexity  0.0            2026.04   71.8    65
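
The page does not say how the AUROC and F1 Score columns were computed. As a rough illustration only, the sketch below uses the standard definitions of both metrics for a binary faithfulness detector. The function names, the label convention (1 = unfaithful), the 0.5 threshold, and the toy labels and scores are all assumptions made for this example; none of them come from the benchmark.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as half a win (standard rank-based definition)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold):
    """F1 of the positive class after thresholding the detector scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = unfaithful answer, 0 = faithful answer.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

# The leaderboard appears to report both metrics on a 0-100 scale.
print(f"AUROC: {100 * auroc(labels, scores):.1f}")
print(f"F1:    {100 * f1_score(labels, scores, threshold=0.5):.1f}")
```

AUROC is threshold-free, while F1 depends on the chosen decision threshold, so two methods with similar AUROC can still differ in F1 if their thresholds are tuned differently.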