
LongFact

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Uncertainty Quantification | LongFact | PCC | -0.017 | 32 |
| Factuality Hallucination Evaluation | LongFact (test) | Response Score | 100 | 30 |
| Factuality Hallucination | LongFact | Facts Score | 23.5 | 30 |
| Factual Text Generation | LongFact Objects | AURC | 0.426 | 14 |
| Long-form generation factuality and uncertainty estimation | LongFact (test) | Factuality Score | 91.5 | 14 |
| Long-form Question Answering | LongFact | VeriScore F1 | 75.9 | 14 |
| Long-form factuality evaluation | LongFact | Accuracy | 90.2 | 7 |
| Factuality Evaluation | LongFact | Precision | 38.6 | 6 |
| Hallucination Detection | LongFact-Aug (test) | AUC | 0.9404 | 4 |