Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

TRUE

Benchmarks

Task NameDataset NameSOTA ResultTrend
Factual Consistency EvaluationTRUE benchmark
PAWS (AUC-ROC)98.4
37
Factual Consistency EvaluationTRUE 1.0 (test)
Frank AUC91.5
20
Factual Grounding EvaluationTRUE (sampled 100 entries from each of 11 datasets)
ROC-AUC0.86
3
Showing 3 of 3 rows