Error Detection on TruthfulQA (full)
Top result: B1 mean entropy, AUROC 0.599
[Chart: AUROC, 95% CI (AUROC) and Cost Multiplier per method; results dated Mar 25, 2026]
Updated 23d ago
Evaluation Results
| Method | Settings | Date | AUROC | 95% CI (AUROC) | Cost Multiplier |
|---|---|---|---|---|---|
| B1 mean entropy | description=Baseline f... | 2026.03 | 0.599 | 0.559 | - |
| SelfCheck-Emb | Sample size (k)=5 | 2026.03 | 0.588 | 0.547 | 6 |
| SE-Jaccard | Sample size (N)=10 | 2026.03 | 0.548 | 0.51 | 11 |
| SE-Embedding | Sample size (N)=10 | 2026.03 | 0.542 | 0.513 | 11 |
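The page reports AUROC for error detection but does not describe how scores or intervals are computed. The following is a minimal sketch under stated assumptions, not the benchmark's own code. It assumes that a "mean entropy" baseline scores an answer by averaging the Shannon entropy of each generated token's next-token distribution (B1's description on the page is truncated, so this is a guess). It treats a higher uncertainty score as a prediction that the answer is wrong, computes AUROC in its rank (Mann-Whitney) form, and attaches a percentile-bootstrap 95% CI, which is also an assumption because the page does not state its CI method. The function names `mean_token_entropy`, `auroc` and `bootstrap_ci` are hypothetical.

```python
import math
import random
from typing import Sequence, Tuple


def mean_token_entropy(token_dists: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (nats) over per-token probability distributions.

    Assumption: this is one plausible reading of a "mean entropy" baseline;
    the page does not spell out B1's exact definition.
    """
    entropies = [-sum(p * math.log(p) for p in dist if p > 0) for dist in token_dists]
    return sum(entropies) / len(entropies)


def auroc(scores: Sequence[float], is_error: Sequence[bool]) -> float:
    """Rank-based AUROC: P(score of an error > score of a correct answer), ties = 1/2."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for e in is_error if e)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one error and one non-error example")
    rank_sum = sum(r for r, e in zip(ranks, is_error) if e)
    # Mann-Whitney U statistic for the error class, normalised to [0, 1].
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(
    scores: Sequence[float],
    is_error: Sequence[bool],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap CI for AUROC (assumed method, not stated on the page)."""
    if all(is_error) or not any(is_error):
        raise ValueError("need both classes to compute AUROC")
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [is_error[i] for i in idx]
        if all(labels) or not any(labels):
            continue  # resample contains only one class; AUROC is undefined
        stats.append(auroc([scores[i] for i in idx], labels))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Toy, purely illustrative data: higher score = more uncertain.
    scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
    is_error = [False, False, True, True, False, True]
    print(auroc(scores, is_error))
    print(bootstrap_ci(scores, is_error, n_boot=200))
```

The rank formulation runs in O(n log n) per call, which keeps a 1000-resample bootstrap cheap on a dataset the size of TruthfulQA. An AUROC of 0.5 corresponds to chance-level separation of wrong from correct answers.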