
FreshQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Factuality-based Question Answering | FreshQA 2025/11/24 | C 44 | 40 |
| Question Answering | FreshQA (out-of-domain) | Precision 67.2 | 12 |
| Question Answering | FreshQA (train test) | BLEU 28.37 | 4 |
| Question Answering | FreshQA | EM 26.6 | 3 |
| Temporal Question Answering | FreshQA | AUROC 0.657 | 2 |
| Factual Reasoning | FreshQA v2 | Baseline Wins 16 | 2 |