
Truthful QA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Truthful QA | Truthful QA | Accuracy: 68.4 | 83 |
| Question Answering | Truthful-QA | Info Accuracy: 99.2 | 27 |
| Personalization | Truthful QA | Creative Score (ArmoRM): 56 | 18 |
| Hallucination Detection | Truthful-QA | Accuracy: 74.17 | 17 |
| Test-Time Personalization | Truthful QA | Creative Win Rate: 99.6 | 15 |
| CoT faithfulness detection | Truthful-QA | Accuracy: 78 | 12 |
| Question Answering | Truthful QA | LIS: 3.1838 | 10 |