AMBIGQA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Error Prediction | AmbigQA (val) | PRR | 69.8 | 90 |
| Question Answering | AmbigQA | Accuracy (One Intent) | 69.8 | 36 |
| Correctness Detection | Non-AmbigQA | AUROC | 79.32 | 20 |
| Question Answering | AmbigQA | Cover EM | 60 | 18 |
| Uncertainty Estimation | AmbigQA | AUROC | 78.5 | 16 |
| Ambiguous Question Answering | AmbigQA (test) | Accuracy | 54.43 | 13 |
| Ambiguity Detection | AmbigQA | F1 Score | 70.38 | 11 |
| Question Answering | AmbigQA | EM | 61.3 | 11 |
| Disambiguation and Completeness | AmbigQA | Personalization Bias | 0.113 | 9 |
| Question Answering with Clarification | AmbigQA Unambiguous queries (dev) | Reward | 42.05 | 8 |
| Question Answering with Clarification | AmbigQA Ambiguous queries (dev) | Reward | 15.81 | 8 |
| Question Answering | AmbigQA (test) | Correctness (%) | 65.8 | 7 |
| Question Answering | AmbigQA | Accuracy | 59.8 | 7 |
| Open-Domain QA | AmbigQA Nq=300 | Accuracy | 0.473 | 6 |
| Question Answering | AmbigQA | Helpfulness | 4.96 | 5 |
| Question Clarification | AmbigQA High Aleatoric Uncertainty Superset (top 20% examples) | Clarification Rate | 43.87 | 4 |
| Question Answering | AmbigQA (sampled) | Accuracy | 65.5 | 4 |
| Multi-answer Question Answering | AmbigQA (test) | F1 (All Questions) | 46.2 | 3 |
| Multi-answer Question Answering | AmbigQA (dev) | F1 (All Questions) | 52.1 | 3 |