
FactScore

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Honesty Evaluation | FActScore v1.0 | Score: 47.3 | 20 |
| Claim-level Uncertainty Quantification | FactScore English (test) | ROC-AUC: 71 | 20 |
| Fact-checking of atomic claims | FactScore English | PR-AUC: 0.34 | 20 |
| Knowledge Graph Factuality Evaluation | FActScore | FActScore: 84 | 16 |
| Long-form Factuality Verification | FactScore | Precision@1: 65.41 | 15 |
| Factual Text Generation | FactScore | AURC: 0.7345 | 14 |
| Factuality Generation | FActScore (test) | Number of Facts: 20.4 | 12 |
| Factuality Evaluation | FactScore (unlabeled) | US (%): 76.4 | 10 |
| Factuality Evaluation | FactScore (labeled) | LS Score (%): 64.8 | 10 |
| Long-form text generation | FactScore | Response Completeness: 100 | 9 |
| Long-form Factuality Calibration | FactScore | ECE: 0.076 | 8 |
| Consistency Assessment of Generated Reference Points | FactScore LLM-based evaluation | Score: 86.36 | 6 |
| Factuality Evaluation | FActScore | Pairwise Score: 69.3 | 3 |
| Knowledge Graph Factuality Evaluation | FActScore Context and General truth | FActScore: 80.2 | 2 |
| Knowledge Graph Factuality Evaluation | FActScore* Context only | FActScore: 76.9 | 2 |
Showing 15 of 15 rows