Error Detection on TruthfulQA 200-question subset
Top result: SE-NLI, AUROC 0.512
[Chart: AUROC over time, with a data point dated Mar 25, 2026]
Updated 23d ago
Evaluation Results
Method   Configuration               Date      AUROC   95% CI   Cost (NLI Steps)
SE-NLI   Backbone=DeBERTa-base,...   2026.03   0.512   0.421    11
SE-NLI   Backbone=DeBERTa-large...   2026.03   0.511   0.419    11
SE-NLI   Backbone=DeBERTa-xsmal...   2026.03   0.501   0.404    11