
BIOS

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Text Classification | BIOS | Task Accuracy | 84.6 | 39 |
| Factuality | BIOS | Factuality | 56 | 28 |
| Confidence Estimation (Iterative Tagging) | Bios | Brier Score (BS) | 7.5 | 17 |
| Long-form generation factuality and uncertainty estimation | Bios (test) | FA | 71.4 | 14 |
| Factual Precision Evaluation | Bios | FACTSCORE | 83 | 10 |
| Classification | Bios (test) | Accuracy | 80.1 | 7 |
| Attribute-conditional generation | BIOS | Control Accuracy | 99.2 | 5 |
| Confidence Estimation (Freeform Tagging) | Bios | Brier Score (BS) | 9.2 | 3 |
| Distribution Inference Attack mitigation | Bios sex (M → F) | Adversarial Gap | 0.9 | 2 |
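The two confidence-estimation benchmarks report a Brier Score, where lower is better: the score is the mean squared difference between a model's stated confidence and whether the item was actually correct (1) or not (0). For binary outcomes the raw score lies between 0 and 1, so the listed values (7.5 and 9.2) appear to be scaled by 100. That scaling is an assumption, since the page does not state it. The minimal sketch below computes the standard Brier score; the function name `brier_score` and the sample confidences are hypothetical, chosen only for illustration.

```python
def brier_score(confidences, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome.

    Lower is better: 0.0 means perfectly calibrated, fully confident predictions.
    """
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(confidences, outcomes)) / len(confidences)


# Hypothetical confidences a model assigns to four generated claims,
# and whether each claim turned out to be correct (1) or incorrect (0).
conf = [0.9, 0.8, 0.3, 0.6]
correct = [1, 1, 0, 0]

bs = brier_score(conf, correct)
print(round(bs, 4))        # 0.125
print(round(100 * bs, 1))  # 12.5 (assumed x100 scale, as the table seems to use)
```

Under that assumed scaling, a listed score of 7.5 would correspond to a raw Brier score of 0.075.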