LongFact

Benchmarks

Task Name | Dataset Name | Metric | SOTA Result | Trend
--- | --- | --- | --- | ---
Uncertainty Quantification | LongFact | PCC | -0.017 | 32
Factuality Hallucination Evaluation | LongFact (test) | Response Score | 100 | 30
Factuality Hallucination | LongFact | Facts Score | 23.5 | 30
Long-form Factuality | LongFact | R@64 | 78.4 | 18
Long-form Factual Generation | LongFact | Fact Recall (FR) - Science | 84.2 | 14
Long-form Retrieval-Augmented Generation | LongFact | Information Density (Sci.) | 247.3 | 14
Factual Text Generation | LongFact Objects | AURC | 0.426 | 14
Long-form generation factuality and uncertainty estimation | LongFact (test) | Factuality Score | 91.5 | 14
Long-form Question Answering | LongFact | VeriScore F1 | 75.9 | 14
Long-form factuality evaluation | LongFact | Accuracy | 90.2 | 7
Claim-level specificity control | LongFact full | Claims Emitted | 11,705 | 6
Factuality Evaluation | LongFact | Precision | 38.6 | 6
Hallucination Detection | LongFact-Aug (test) | AUC | 0.9404 | 4
Claim-level specificity control | LongFact pilot | Claims Emitted | 724 | 3
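The R@64 entry for the Long-form Factuality task is a recall-at-K score. The sketch below assumes it follows the definition used with LongFact and its SAFE evaluator. Under that definition, a response with S supported facts and N not-supported facts has precision S / (S + N), recall R@K = min(S / K, 1), and F1@K as the harmonic mean of the two, which is 0 when S = 0. The function name `long_form_factuality` and its parameters are hypothetical, chosen for illustration. The function scores a single response. A leaderboard number such as 78.4 would presumably be an average over prompts, expressed as a percentage; that aggregation is also an assumption.

```python
def long_form_factuality(supported: int, not_supported: int, k: int = 64) -> dict:
    """Hypothetical helper: per-response precision, recall@K and F1@K.

    supported / not_supported: counts of facts in one response that an
    evaluator rated as supported / not supported (irrelevant facts are
    assumed to be excluded before counting).
    k: number of supported facts assumed to count as full recall.
    """
    if supported < 0 or not_supported < 0 or k <= 0:
        raise ValueError("counts must be non-negative and k must be positive")

    total = supported + not_supported
    # Precision: fraction of rated facts that are supported.
    precision = supported / total if total else 0.0
    # Recall@K saturates at 1 once the response has K supported facts.
    recall_at_k = min(supported / k, 1.0)
    # F1@K is defined as 0 when no facts are supported.
    if supported == 0:
        f1_at_k = 0.0
    else:
        f1_at_k = 2 * precision * recall_at_k / (precision + recall_at_k)

    return {"precision": precision, f"recall@{k}": recall_at_k, f"f1@{k}": f1_at_k}


if __name__ == "__main__":
    # A response with 48 supported and 2 unsupported facts.
    print(long_form_factuality(48, 2))
```

With K = 64, a response with 48 supported facts reaches a recall of 0.75 even though its precision is high. Raising K therefore rewards longer, fact-dense answers, and precision penalizes unsupported claims.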