PopQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Question Answering | PopQA | Accuracy 68.4 | 186 |
| Hallucination Detection | PopQA | AUC 96.18 | 88 |
| Question Answering | PopQA | EM 51.6 | 80 |
| Uncertainty Quantification | PopQA 500 randomly sampled queries (test) | AUROC 0.8709 | 70 |
| Single-Hop Question Answering | PopQA | EM 61.6 | 55 |
| Question Answering | PopQA | Score 43.93 | 50 |
| Question Answering | PopQA (test) | Accuracy 65.4 | 39 |
| General Question Answering | PopQA | EM 44.8 | 36 |
| Factual Knowledge Evaluation | PopQA | Accuracy 18 | 32 |
| Abstention | PopQA (test) | AUARC 66.06 | 25 |
| Abstention | PopQA | Abstain Accuracy 81.6 | 25 |
| Question Answering | PopQA longtail | EM 45.96 | 23 |
| Single-hop Question Answering | PopQA (test) | Accuracy 44.2 | 21 |
| General Question Answering | PopQA | Accuracy 48.8 | 18 |
| Question Answering | PopQA | EM 34.2 | 17 |
| Question Answering | PopQA (Frequent) | Exact Match (EM) 52.7 | 16 |
| Question Answering | PopQA Infrequent | Exact Match Accuracy 42.9 | 16 |
| Question Answering | PopQA | Accuracy 41.3 | 16 |
| General QA Verification | PopQA | P@1 90.14 | 16 |
| Uncertainty Estimation | PopQA | A1 83.09 | 16 |
| Question Answering | PopQA v1.0 (test) | A1 83.07 | 16 |
| General Question Answering | PopQA out-of-domain (val test) | Exact Match (EM) 50.1 | 15 |
| Retrieval | POPQA Long-tail | Recall@10 74.5 | 14 |
| Question Answering | PopQA | F1 Score 59.9 | 14 |
| Open-domain Question Answering | PopQA | Accuracy 65.7 | 11 |
Showing 25 of 57 rows