
Short-form QA

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Factuality Detection | Short-form QA (Average of NQ, PopQA, TriviaQA, SimpleQA) (test) | PR-AUC 71.1 | 60
Short-form Question Answering | Short-form QA Aggregate (Avg.) (test) | EM 35.93 | 5
Faithfulness Evaluation | Short-Form QA | Faithfulness Correlation 0.82 | 2