
LongFact

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Factuality Hallucination Evaluation | LongFact (test) | Response Score 100 | 30 |
| Factuality Hallucination | LongFact | Facts Score 23.5 | 30 |
| Factual Text Generation | LongFact Objects | AURC 0.426 | 14 |
| Long-form generation factuality and uncertainty estimation | LongFact (test) | Factuality Score 91.5 | 14 |
| Long-form Question Answering | LongFact | VeriScore F1 75.9 | 14 |
| Hallucination Detection | LongFact-Aug (test) | AUC 0.9404 | 4 |